Large language models’ ability to generate text also lets them plan and reason
What will come next?
Quantum physics as a Shakespearean sonnet. Trade theory explained by a pirate. A children’s story about a space-faring dinosaur. People have had fun asking modern chatbots to produce all sorts of unusual text. Some requests have been useful in the real world—think travel itineraries, school essays or computer code. Modern large language models (LLMs) can generate them all, though homework-shirkers should beware: the models may get some facts wrong, and are prone to flights of fancy that their creators call “hallucinations”.
Occasional hiccups aside, all this represents tremendous progress. Even a few years ago, such programs would have been science fiction. But churning out writing on demand may not prove to be LLMs’ most significant ability. Their text-generating prowess allows them to act as general-purpose reasoning engines. They can follow instructions, generate plans, and issue commands for other systems to carry out.
After all, language is not just words, but “a representation of the underlying complexity” of the world, observes Percy Liang, a professor at the Institute for Human-Centred Artificial Intelligence at Stanford University. That means a model of how language works also contains, in some sense, a model of how the world works. An LLM trained on large amounts of text, says Nathan Benaich of Air Street Capital, an AI investment fund, “basically learns to reason on the basis of text completion”.
Systems that use LLMs to control other components are proliferating. For example, HuggingGPT, created at Zhejiang University and Microsoft Research, uses ChatGPT as a task planner, farming out user requests to AI models selected from Hugging Face, a library of models trained for text, image and audio tasks. TaskMatrix.AI, created by researchers at Microsoft, features a chatbot that can interact with music services, e-commerce sites, online games and other online resources.
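To make the pattern concrete, the sketch below shows the controller idea in miniature: a chat model is prompted to break a request into tasks, each routed to a specialist model. It is an illustration of the general recipe, not HuggingGPT's actual code; the prompt wording, the call_llm helper and the skill-to-model roster are all hypothetical stand-ins.

```python
# Minimal sketch of the LLM-as-controller pattern, not HuggingGPT's
# actual code. The prompt, the call_llm helper and the skill-to-model
# roster are hypothetical stand-ins.
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-model API call (e.g. to ChatGPT)."""
    raise NotImplementedError("wire up an LLM provider here")

SPECIALISTS = {  # illustrative specialist-model names
    "caption-image": "vit-gpt2-image-captioning",
    "translate":     "opus-mt-en-fr",
    "speak":         "fastspeech2-en-ljspeech",
}

def plan_tasks(request: str) -> list[dict]:
    """Ask the LLM to decompose a request into routable tasks."""
    prompt = (
        "Break the request into a JSON list of tasks, each with a "
        f"'skill' drawn from {sorted(SPECIALISTS)} and an 'input'.\n"
        f"Request: {request}\nJSON:"
    )
    return json.loads(call_llm(prompt))

def run(request: str) -> None:
    for task in plan_tasks(request):
        model = SPECIALISTS[task["skill"]]
        print(f"routing {task['input']!r} to {model}")  # real systems invoke the model here
```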
PaLM-E, created by researchers at Google, uses an “embodied” LLM, trained using sensor data as well as text, to control a robot. It can understand and carry out tasks such as “bring me the rice chips from the drawer” or “push the red blocks to the coffee cup.” Auto-GPT, created by Toran Bruce Richards of Significant Gravitas, a startup, uses GPT-4 to generate and develop business ideas by knitting together a range of online resources. And so on.
The prospect of connecting LLMs to real-world contraptions has “the safety people freaking out”, Mr Benaich says. But making such systems safer is the focus of much research. One hope is that LLMs will have fewer hallucinations if they are trained on datasets combining text, images and video to provide a richer sense of how the world works. Another approach augments LLMs with formal reasoning capabilities, or with external modules such as task lists and long-term memory.
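One way to picture that scaffolding is a toy agent that wraps an LLM with an explicit task list and a retrievable memory. The class below is purely illustrative; no particular product works exactly this way, and real systems replace the keyword lookup with embedding search.

```python
# Toy scaffolding: an LLM wrapped with a task list and a long-term
# memory. Every name here is illustrative, not any shipping system.
from collections import deque

class ScaffoldedAgent:
    def __init__(self, llm):
        self.llm = llm               # callable: prompt string -> completion string
        self.tasks = deque()         # explicit to-do list, visible outside the model
        self.memory: list[str] = []  # long-term store of past results

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Naive keyword retrieval; real systems use embedding search.
        return [note for note in self.memory if query.lower() in note.lower()][:k]

    def step(self) -> str:
        task = self.tasks.popleft()
        context = "\n".join(self.recall(task))
        result = self.llm(f"Context:\n{context}\n\nTask: {task}\nAnswer:")
        self.memory.append(f"{task} -> {result}")  # remember for later steps
        return result
```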
Observers agree that building systems around LLMs will drive progress for the next few years. “The field is very much moving in that direction,” says Oren Etzioni of the Allen Institute for AI.
But in academia, researchers are trying to refine and improve LLMs themselves, as well as experimenting with entirely new approaches. Dr Liang’s team recently developed a model called Alpaca, with a view to making it easier for academic researchers to probe the capabilities and limits of LLMs. That is not always easy with models developed by private firms.
Dr Liang notes that today’s LLMs, which are based on the so-called “transformer” architecture developed by Google, have a limited “context window”—akin to short-term memory. Doubling the length of the window increases the computational load fourfold, because every token must be compared with every other. That quadratic cost limits how quickly context windows can grow. Many researchers are working on post-transformer architectures that can support far bigger context windows—an approach that has been dubbed “long learning” (as opposed to “deep learning”).
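The arithmetic behind that fourfold figure is easy to check: self-attention scores every token against every other, so the work grows with the square of the window length. A few illustrative lines:

```python
# Self-attention compares every token with every other, so the number
# of pairwise scores grows with the square of the window length.
def attention_pairs(n_tokens: int) -> int:
    return n_tokens * n_tokens  # one score per (query, key) pair

for n in (2_048, 4_096, 8_192):
    print(f"{n:>6,} tokens -> {attention_pairs(n):>11,} scores")
#  2,048 tokens ->   4,194,304 scores
#  4,096 tokens ->  16,777,216 scores (double the window, 4x the work)
#  8,192 tokens ->  67,108,864 scores
```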
Hello Dave, you’re looking well today
Meanwhile, other researchers are looking to extend the capabilities of “diffusion” models. These power generative-AI models, such as Stable Diffusion, that can produce high-quality images from short text prompts (such as “An Economist cover on banking in the style of Dali”). Images are continuous, whereas text consists of discrete words. But it is possible to apply diffusion to text, says Dr Liang, which might provide another way to improve LLMs.
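For the curious, the snippet below gives a rough sketch of how the trick can work, in the spirit of published text-diffusion recipes such as Diffusion-LM: tokens are embedded as continuous vectors, noised step by step as an image would be, and (in a trained model) denoised back toward word embeddings. All sizes and the noise schedule are illustrative.

```python
# Rough sketch of diffusion applied to text: embed discrete tokens as
# continuous vectors, then add Gaussian noise step by step, as image
# diffusion does to pixels. Sizes and schedule are illustrative.
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, seq_len, steps = 1_000, 16, 8, 50
embed = rng.normal(size=(vocab, dim))        # a learned table in practice

tokens = rng.integers(0, vocab, size=seq_len)
x = embed[tokens]                            # x_0: continuous view of the text

for beta in np.linspace(1e-4, 0.05, steps):  # forward (noising) process
    x = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.normal(size=x.shape)

# A trained model would learn to reverse these steps, then round the
# denoised vectors to their nearest word embeddings to emit text.
```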
Amid the excitement Yann LeCun, one of the leading lights of modern AI, has sounded a sceptical note. In a recent debate at New York University, he argued that LLMs in their current form are “doomed” and that efforts to control their output, or prevent them making factual errors, will fail. “It’s not a popular opinion among my colleagues, but I don’t think it’s fixable,” he said. The field, he fears, has taken the wrong turn; LLMs are “an off-ramp” away from the road towards more powerful AI.
Such “artificial general intelligence” (AGI) is, for some researchers, a kind of holy grail. Some think AGI is within reach, and can be achieved simply by building ever-bigger LLMs; others, like Dr LeCun, disagree. Whether or not they eventually prove a dead end, LLMs have gone much further than anyone might have believed a few years ago, notes Mr Benaich. However you define AGI, AI researchers seem closer to it than they were a couple of years ago. ■
This article appeared in the Science & technology section of the print edition under the headline "What comes next?"