Young Man Shouts at Computer

In which I ramble about large language models.

In recent months, many promises of a great future have been made. Better yet, these seeds have started to germinate, and fresh sprouts poke up through the soil, as spring arrives. Many of us are watching in anticipation, already imagining how these plants might flourish, and what fruit they might bear.

And yet, I feel a lingering sense of fear.

There are of course fabulous stories about how the world might end in spectacular ways as hyper-intelligent beings find ways of destroying us so fantastical that we can’t even comprehend them; I don’t really buy into that.

No, instead my concern is a more mundane one. I fear that we’re creating a world that we understand less, in which mystical powers are given to us, but in a way that makes us helpless. I fear that we’re letting a world we can’t understand be fashioned haphazardly, and making ourselves obsolete in the process.

More than anything, I fear that I’m worth less than I once was.

There was a time in my life when I was much more literary. In high school, I focused a lot more on literature and the arts than on math and science. As with many things in life, I wanted to imitate the influences around me. I tried my hand at poetry for a time, but after a while I came away with the conclusion that I wasn’t much more than a poor imitator of the writers I admired: a hack.

Many people have said, of these LLMs, that while they can produce poetry, it’s “merely imitation”, and they have no creativity of their own. Nonetheless, if ChatGPT is merely a hack, it’s certainly a better hack than I ever was.

In that time, I had to write a lot more essays than I do now (at least one). Ironically, I think I might have gotten better at writing essays by not worrying about it too much, but maybe talking about how good your writing is will just bring attention to how bad it is instead. Regardless, I’m certainly making an effort. But is there really a point? I can try and wax lyrical about how LLMs aren’t capable of writing a creative essay, but I don’t even believe that to be true; who knows what they’ll be writing in years, months, days?

I’ve come to wonder what the point is in writing at all anymore. I guess it’s worth it while you still can.

Writing is, before anything else, a tool. People write because they have to. They have a message to communicate to other people. Beyond that, writing is also a way to signal other things about that message. The way you express yourself can be much more important than the surface-level message you’re trying to convey. Vibes matter.

Because most writing is functional, automating this process can naturally be seen as a great thing. You save people time, and enable them to produce text with great economy of effort. Most likely, the quality of their output can be improved as well, or at least its variety. It’s quite easy to ask ChatGPT to output text in a particular style, and it’s not that bad at doing it either.

Similarly, reading is, before anything else, a tool. People read because they have to. They have a message they need to understand, transmitted from other people. In this domain, you can also use LLMs to summarize text, making the task of reading easier.

This brings us to the somewhat funny proposition that one person might communicate an idea to another by asking ChatGPT to expand the idea into a large amount of text, while the recipient asks ChatGPT to compress that text back into a short idea. Presented this way, it seems very silly, because you could just communicate the initial idea you had, but this future already seems to be here, as seen in a product announcement for a GPT-powered personal assistant: you can have it expand ideas into e-mails, and summarize e-mails back into ideas.

In an optimistic future, these models are the bridge between the fuzzy world of human communication, and the cold and calculating world of computers. Humans communicate in all sorts of fuzzy ways, and these models have the novel ability to trim through the fuzz. You can distill vague text messages about when to meet into a specific date and time that can be parsed by calendar software. Conversely, a raw set of numbers can be expanded into a warm invitation to meet at a certain time.
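
To make this concrete, here is a minimal sketch of that fuzzy-to-structured direction. Everything here is hypothetical: `ask_llm` stands in for whatever model API you have access to, and the prompt simply asks the model to reply with a machine-readable date.

```python
import json
from datetime import datetime

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to some LLM API."""
    raise NotImplementedError("plug in your model of choice here")

def extract_meeting_time(message: str) -> datetime:
    # Ask the model to cut through the fuzz and emit structured output
    # that ordinary calendar software can parse.
    prompt = (
        "Extract the proposed meeting time from this message and reply "
        'with only a JSON object like {"start": "2023-04-01T15:00"}.\n\n'
        + message
    )
    reply = json.loads(ask_llm(prompt))
    return datetime.fromisoformat(reply["start"])

# e.g. extract_meeting_time("how about coffee next tuesday around 3?")
```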

In this future, LLMs are well-understood tools for saving humans precious time, a welcome copilot automating away the drudgery of the business world and of life writ large.

At this point, I had started writing a section about my main critiques of this vision; it was too objective. I don’t think this is the right piece for that, not because I think objective critique is unimportant, but because I’d be dishonest in pretending this writing is objective. I’m writing about my feelings, in a stream-of-thought fashion, and trying to lay out a series of technical points is not befitting of either this work or those points.

My fear is that our use of these models will create a world in which humans are stripped of large amounts of agency. I think we’re letting a world be created in which we have both less control over and less understanding of how the systems around us operate, and in which we ossify the worst aspects of our bureaucratic systems.

In some sense, the fact that this is how people see automation going reflects a failure of two things: first, the vision of classical AI, and second, the vision of classical software as a whole.

For my ramblings to make sense, I need to lay out a bit of what I see as “computing”, or “classical software”. At face value, computing is about instructing computers to perform certain tasks, except that these “computers” can be entirely fictional. You can imagine a hypothetical model of a computer, then write hypothetical programs and imagine what they do. Written down, it sounds like computer scientists are insane; maybe they are. Beyond that, the most important aspect of computing is that it requires precisely understanding what it is you’re doing, before describing in minute detail how to do it, in a way that your hypothetical computer can process.

Now, the beauty of computing is that meta-programming is inherent to it. You don’t have to restrict yourself to the most basic language of your imaginary computer. You can instead describe a higher-level language, simplifying or adding shorthand for various aspects of the low-level language. You can then bridge between the two by providing a precise semantics for the new language. In other words, you give meaning to the various new constructs and shorthand in terms of the basic low-level language.
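
As a toy illustration of that bridging, here is a sketch in Python (the mini-language and its two primitives are entirely made up): each higher-level construct is given meaning purely by the sequence of low-level instructions it expands into.

```python
# A made-up "machine": its state is a single number, and it understands
# only two primitive instructions.
PRIMITIVES = {
    "inc": lambda state: state + 1,
    "dec": lambda state: state - 1,
}

# A higher-level language, defined by expansion into primitives; this
# mapping is the "precise semantics" bridging the two languages.
MACROS = {
    "add3": ["inc", "inc", "inc"],
    "nop": ["inc", "dec"],
}

def run(program, state=0):
    for op in program:
        if op in MACROS:  # expand the shorthand...
            state = run(MACROS[op], state)
        else:             # ...or execute a primitive directly
            state = PRIMITIVES[op](state)
    return state

print(run(["add3", "nop", "inc"]))  # prints 4
```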

In this way, you create a growing set of abstractions, letting you build on your own work and recursively improve at the task of programming.

Now, I can pretend that all of this business can be efficiently conducted from the rocking chair, but that would be lying to myself. The development of computer science, past a certain point, necessitates the invention of actual computing machines. The reason for this is that it very quickly becomes intractable for a single human to follow and remember all of the various instructions they’ve laid out. It’s very easy for humans to write programs so complicated that they can’t predict what they’ll do until they actually run them. This is the case for most programs, and basically every computer bug arises from a mismatch between the programmer’s mental model of what a program does, and the reality, as demonstrated by a physical computer, of what the program actually does.

In this way, programming is like a melding of minds between man and machine. The programmer encodes their knowledge of the world into a program. They represent objects and relationships, as they understand them, as types and functions in their programming language of choice. Over time, their program grows far beyond the point that they can keep it in their own head, and so they develop a restricted mental model of how the code is organized, and what it does. Thankfully, a very powerful computer (relatively speaking) is able to check their work, and actually run and even verify the correctness of this program.

In theory, if you correctly understand what needs to be done, anything can be automated using software, provided the computer running this software has enough tools connecting it to the outside world. If you develop a precise understanding of how to cook eggs, a computer connected to a frying pan can reify this understanding.

This is also the fundamental flaw of classical software: you can’t do what you don’t understand. Now, you can rely on the understanding of others. If someone else has figured out how to make toast, you can connect your computer to a toaster and prepare a complete breakfast. We struggle to apply software in many domains because we simply don’t understand how to even solve problems in these domains ourselves.

The irony is that there are many tasks that we don’t know how to solve, but are nonetheless capable of doing. I know how to type, but I would find it very difficult to program my own muscles to fire in the precise order required for me to type. I just think “I want to type this sentence”, and it happens. In fact, the interesting thing about muscle memory is that you very quickly develop abstractions over your own use of muscles. You can move your limbs at will, but when you’re walking, you don’t really think about the precise order you need to move the muscles in your legs: you just walk.

This leads to the second major flaw in classical software, which is perhaps even more of a death sentence: our world is heavily reliant on things we know how to do but don’t understand. This isn’t even necessarily very pernicious. Before computers, a lot of work got done by passing around documents with things written on them, and we don’t quite understand how to go from a picture of text describing something to a symbolic model of that thing. Because so much of the world depends on fuzzy processes, requiring both the innate processing capacities humans are endowed with, and often functioning without a precise or consistent set of rules, we’ve struggled to automate many sectors of society which in theory would be perfectly amenable to computers.

In fact, most automation I see proposed with novel AI techniques could instead be done, at least to 90% of the quality, by combining classical software with a slight change in human behavior. This is analogous to how manufacturing was automated: we haven’t created bipedal automata which know how to fashion objects by hand like humans do; instead, we’ve created molds and conveyor belts, and otherwise reorganized the factory around predictable capabilities we can design machines to have.

This limitation of classical software, that it requires understanding the problem being solved, was also the main limitation of our initial approach to AI. Basically, we failed because we tried to develop an understanding of how intelligence works, and then use that to develop intelligent programs. Actually, we didn’t quite fail, but insofar as we succeeded, all of our successes were rebranded to not be “AI”. A compiler was basically an AI application at its inception, but isn’t quite considered that way anymore. Logic programming was a big aspect of the first AI summer, with languages like Prolog taking the spotlight. The way we thought the future would work back in the 80s was that you would encode a bunch of facts about the world into a knowledge database, and then use Prolog to come to logical conclusions about the world. For example, you could encode facts about medicine, and then make queries for medical diagnoses, matching symptoms to diseases. This didn’t quite pan out, as the distinct lack of Prolog and expert systems 40 years later can attest.
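
The flavor of that vision is easy to sketch even without Prolog (everything below, facts and all, is invented purely for illustration): encode facts, then answer queries by matching them against those facts.

```python
# A toy "knowledge base" of diseases and the symptoms they present with.
# Entirely made up for illustration; not medical advice.
KNOWLEDGE_BASE = {
    "flu": {"fever", "cough", "fatigue"},
    "common_cold": {"cough", "sneezing", "sore_throat"},
    "allergy": {"sneezing", "itchy_eyes"},
}

def diagnose(symptoms):
    """Return candidate diseases, best match first, whose known symptoms overlap the query."""
    scored = [
        (len(symptoms & known), disease)
        for disease, known in KNOWLEDGE_BASE.items()
    ]
    return [disease for score, disease in sorted(scored, reverse=True) if score > 0]

print(diagnose({"cough", "fever"}))  # ['flu', 'common_cold']
```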

Our current approach to AI is very different. Basically, in 2023, the most promising approach we have is to take a statistical model which has learned to predict text, make it very, very large so it’s really good at doing that, and then add extra tricks like fine-tuning it on a particular data set, giving it some feedback on how to respond to certain queries, and augmenting queries with extra context, in order to give us a nice black box which responds to questions with answers. In the early 2010s we kind of laughed at our 80s selves, realizing that all of our fancy classical AI techniques would be made obsolete by throwing more compute at the problem, as the rise of deep learning showed. Ten years later, we can laugh at ourselves once again, as all the fancy architectures for various neural networks and hyperparameter optimization seem to be basically moot, because scale wins at the end of the day. Yeah, sure, you can argue about ReLU or tanh or whatever, but if you just get more data, a bigger model, and more training, it doesn’t matter. A fad a few years ago was “geometric deep learning”, which tried to reflect the symmetries of the data in the structure of the model itself. It seems that large language models are slowly developing an understanding of images, without the need for special symmetry considerations in their architecture. Once again, it seems that scale is all that matters.

Somewhat ironically for people who think that expert systems weren’t that bad of an idea (I count myself among them), many people seem to treat LLM query boxes, like ChatGPT, as a kind of expert system. They expect ChatGPT to know various facts, or even be able to conduct accurate calculations, and one complaint I’ve seen a few times is that the training data is unfortunately dated, so recent events don’t show up in its answers. Now, ChatGPT is better than expert systems in that it’s capable of both parsing queries written in English (or other languages, it would seem), and generating replies in that same language (or even other languages). This is a huge step beyond what expert systems could ever do.

There is however a problem, which is that there’s no guarantee of it actually giving you the right answer. Now, the answer is usually plausible, or at least, it looks the same, in a fuzzy way, as a valid answer would. That is, in essence, the benchmark the model is being trained for: can it generate plausible answers to queries? In many areas, plausibility is more important than “correctness”: often, this is because “correctness” is not even definable for that domain. What is the “correct” fictional Seinfeld episode about Kramer getting a pet zebra?

The great irony here is that we’ve managed to create a system capable of great things, but in a way where we don’t understand how it actually does those things, or what its limitations are. On one hand, this should make us hopeful. If we had the ability to put a human mind in a box—ethical considerations aside—then being able to make millions of copies of that mind, and ask it to perform various tasks, would be incredibly useful. It would be like having a personal assistant, but one that never complains or gets tired, and should, in theory, be able to automate many tasks you’d otherwise have to do yourself. The issue is that we don’t quite know if the models we’ve created are “minds”, let alone human ones. In fact, the failure modes of these models, like all large ML models, are very esoteric, and quite different from the failure modes that humans exhibit. Because of this, I’m very skeptical of the application of these models in domains where correctness matters a lot. That said, it might actually be more efficient to generate a bunch of bad output, as long as there’s a human in the loop able to check the output of these models.

To digress a bit, in such a world, you basically have the death of the “junior position”. All junior positions are instead replaced with automated models churning out usually OK material that sometimes needs correcting (which is identical to most senior staff’s mental model of what junior staff are capable of). Someone actually knowledgeable about how things work is able to separate the wheat from the chaff and keep the whole ship chugging along. Crucially, you need someone actually capable in the loop, but the end result may be a lot more efficient, with simple tasks being carried out effectively. It also leaves open the question of how to actually create senior staff, if there are no employment prospects for humans who need experience. Well, that’s somebody else’s problem, isn’t it?

A solution to these shortcomings that would satisfy me would be the relegation of these models to domains where fuzziness is required, but also bounded. Basically, you would use these models to do hard things like understanding natural language queries, but you would want them to produce structured output as soon as possible, so that actual programs you understand are capable of processing it. It seems silly to expect large language models to be good at doing math, but you might be able to make them good at translating language problems into symbolic math problems, at which point we understand very well how to solve them. I see a lot of benefit in the synthesis of traditional software and LLMs, as a way to bridge the gap between the fuzzy and the sharp. Naturally, many of the domains in which this could be useful are also amenable to the computerized compromise where humans slightly modify their behavior to allow traditional software to work correctly, or are even simply amenable to just using software, if we could get our act together and actually improve the status quo.
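
As a sketch of that division of labor, with the same hypothetical `ask_llm` as before doing only the fuzzy translation, and SymPy doing the solving we already understand:

```python
import sympy

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call; assume it returns just an equation such as '3*x + 5 = 20'."""
    raise NotImplementedError

def solve_word_problem(problem: str):
    # The model handles the natural language; it never has to "do math" itself.
    equation_text = ask_llm(
        "Translate this problem into a single equation in x, "
        "and output only the equation:\n" + problem
    )
    lhs, rhs = equation_text.split("=")
    equation = sympy.Eq(sympy.sympify(lhs), sympy.sympify(rhs))
    # The actual solving is done by software whose behavior we understand.
    return sympy.solve(equation, sympy.Symbol("x"))

# e.g. solve_word_problem("Three apples and a 5 dollar melon cost 20 dollars; what does an apple cost?")
```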

Nonetheless, it seems that this approach isn’t all that popular, or at least, it’s a bit of an uphill battle, because any kind of structured composition of LLMs and software seems to get usurped by people expecting the black box to just be able to do everything on its own, and building products that try to function in that way. Some of OpenAI’s recent work on adding so-called “plugins” to ChatGPT is interesting in this respect, but also illustrates the weird new world we find ourselves in.

To digress again, reading the documentation, the way plugins are integrated is that you provide a structured description of an HTTP API, along with natural-language descriptions of the various endpoints. Under the hood, I presume ChatGPT is outputting descriptions of HTTP requests based on this information (as always, OpenAI is rather tight-lipped). This is, well, not the first way I would have thought of doing this kind of fuzzy bridging.
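
For what it’s worth, the general pattern (not OpenAI’s actual manifest format, which I’m only paraphrasing from their docs) might look something like this, with the made-up `ask_llm` and `get_weather` endpoint purely for illustration:

```python
import json

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call, as in the earlier sketches."""
    raise NotImplementedError

# A made-up endpoint description: machine-readable structure plus
# natural-language hints for the model.
TOOL_SPEC = {
    "name": "get_weather",
    "method": "GET",
    "path": "/weather",
    "params": {"city": "name of the city the user is asking about"},
    "description": "Look up the current weather for a city.",
}

def plan_request(user_message: str) -> dict:
    # The model's only job is to emit a structured request description,
    # which plain software can then validate and actually execute.
    prompt = (
        "Given this API description:\n" + json.dumps(TOOL_SPEC)
        + "\nand this user message:\n" + user_message
        + '\nreply with only a JSON object like {"path": "/weather", "params": {"city": "..."}}'
    )
    return json.loads(ask_llm(prompt))
```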

In some sense, I feel very disempowered by things like this, in that I’m losing a sense of how to actually build upon great advancements in the world. In software, you develop a kind of innate sense over time of what should and shouldn’t work, and this sense seems to have been upended by mystical powers some systems have gained overnight. Yet these powers are quite unpredictable, so it always feels like you’re walking on a tightrope, unsure whether your next move will make you fall to the ground.

I feel a general sense of malaise as we move towards a future in which large opaque models govern so many systems around us. With traditional software, we were at least trying to develop an understanding of the world; with effort, one could develop predictable models of what one was doing and how systems behaved. It seems that we’re moving towards a model of programming in which we conjure up new programs, which may or may not do what we want, and programming becomes a matter of validating the side effects of a conjuration.

Maybe this is just sour grapes on my part, and this is the feeling “normal” people have had towards technology this whole time. Maybe technology has always felt out of reach, opaque, bizarre, and something that “other people do”. I think we’ve done a bad job at making “personal” computing happen, and have slowly alienated people from the computing devices they use every day.

Most computers are now merely portals into discrete application “experiences”, with little ability to modify or compose the applications in novel ways. There’s a kind of meme about zoomers not even knowing what a file system is; what’s worse is that they don’t even know why they might want one. There used to be a time in which interoperability was a key feature of many apps. Documents could be edited in various pieces of software, each fit for a different purpose. At some point in time, we thought people might even learn some kind of programming themselves, in order to model their world and automate their own tasks. In this way, computing was about extending one’s own personal capabilities.

Part of this model lives on and, according to some, is even revived by the advent of large language models. Now that software is almost a commodity, the era of personal applications is here. The qualm I have with this is that a conjured piece of software does not effectively function as an extension of one’s own capabilities. Rather, it is a magical piece of technology bestowed upon you by someone else, which might work, but works without your agency or your understanding. It provides you with more capabilities, but it does not extend your own.

To illustrate, when you learn to ski, eventually you forget about the big planks of wood attached to your feet, and the movements you learned become natural extensions of your own. Sliding down the slopes feels like a movement as natural to you as walking. This is what I mean by an extension of your capabilities.

By contrast, the conjuration of software is more like the use of a chair lift. The chair lift carries you, enabling you to get up the mountain efficiently, but using it is basically like being carried up by someone else, and most people have little effective understanding of how a chair lift operates. (Maybe a chair lift is a bad example because it’s actually relatively simple, mechanically. Pretend that it’s a chair lift powered by fusion or something.)

Anyhow, this post is more technical than I wanted it to be anyway, so I’ll conclude. Basically, I’m afraid that our current approach to AI is much more disempowering than most people think, and that rather than shifting the balance back towards personal computing, we’re moving even further towards a form of alienated computing, wherein computing is something other people do that impacts you, rather than something you can fully participate in.

Graver still, I fear that we’re entrenching ourselves in so much automated muck that we’ll have no choice but to use even more automation to get anything done, creating more muck in the process. Basically, rather than organizing the world and trying to understand it, we give up and let brute force navigate the fuzz instead. In the short run, this yields great results, even better than I would have guessed, but in the long run, I can’t help but feel a profound sense of malaise about what’s to come.