AI models will become smaller and faster
They will improve in plenty of other ways, too
By Abby Bertics
Interest in artificial intelligence (AI) reached fever pitch in 2023. In the six months after OpenAI launched ChatGPT, the internet’s most famed and effective chatbot, in November 2022, the topic “artificial intelligence” nearly quadrupled in popularity on Google’s search engine. By August 2023, one third of respondents to the latest McKinsey Global Survey said their organisations were using generative AI in at least one capacity.
How will the technology develop in 2024? There are three main dimensions on which researchers are improving AI models: size, data and applications.
Start with size. For the past few years, the accepted dogma of AI research has been that bigger means better. Although computers have got smaller even as they have become more powerful, that is not true of large language models (LLMs), the size of which is measured in billions or trillions of “parameters”. According to SemiAnalysis, a research firm, GPT-4, the LLM which powers the deluxe version of ChatGPT, required more than 16,000 specialised GPU chips and took multiple weeks to train, at a cost of more than $100m. According to Nvidia, a chipmaker, inference costs—getting the trained models to respond to users’ queries—now exceed training costs when deploying an LLM at any reasonable scale.
As AI models become commercial commodities, there is a growing focus on maintaining performance while making them smaller and faster. One way to do so is to train a smaller model on more data. “Chinchilla”, an LLM developed in 2022 by Google DeepMind, outperforms OpenAI’s GPT-3 despite being a quarter of the size, because it was trained on four times as much data. Another approach is to reduce the numerical precision of a model’s parameters. A team at the University of Washington has shown that it is possible to squeeze a model the size of Chinchilla onto a single GPU chip without a marked dip in performance. Smaller models are, crucially, far cheaper to run once trained. Some can even run on a laptop or smartphone.
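To see what reducing precision involves, consider a toy sketch in Python. It stores weights as 8-bit integers plus a scaling factor, quartering their memory footprint; the Washington team’s actual technique is a more sophisticated low-bit scheme, so this is illustrative only.

```python
import numpy as np

def quantise_int8(weights: np.ndarray):
    """Map float32 weights onto 8-bit integers with a per-tensor scale.
    A toy illustration of reduced precision, not a production quantisation scheme."""
    scale = np.abs(weights).max() / 127.0          # the largest weight maps to 127
    q = np.round(weights / scale).astype(np.int8)  # 4 bytes per parameter -> 1 byte
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale            # approximate reconstruction

# Example: a small random weight matrix shrinks fourfold in memory,
# yet the reconstruction error stays tiny.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantise_int8(w)
print(np.abs(w - dequantise(q, s)).max())
```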
Next, data. AI models are prediction machines that become more effective when trained on more data. But the focus is shifting from “how much” to “how good”. That matters because it is getting harder to find new training data: an analysis in 2022 suggested that stocks of new, high-quality text could dry up within a few years. And using models’ outputs to train future models tends to produce less capable ones, so the adoption of LLMs itself makes the internet less valuable as a source of training data. Quantity is not everything, though. Figuring out the right mix of training data remains much more of an art than a science. And models are increasingly being trained on combinations of data types, including natural language, computer code, images and even video, which gives them new capabilities.
What new applications might emerge? There is some “overhang” when it comes to AI, meaning that the technology has advanced more quickly than people’s ability to take advantage of it. The emphasis is shifting from showing what is possible to figuring out what is practical. The most consequential advances will come not from improvements in the models themselves, but from learning how to use them more effectively.
At present, there are three main ways to use models. The first, “prompt engineering”, takes them as they are and crafts input phrases or questions to guide them towards the desired output. The second is to “fine-tune” a model to improve its performance at a specific task, by giving a pre-existing model an extra round of training on a narrow dataset tailored to that task. An LLM could, for instance, be fine-tuned on papers from medical journals to make it better at answering health-related questions. The third approach is to embed an LLM in a larger, more powerful architecture: an LLM is like an engine, and to make use of it for a particular application you need to build the car around it.
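The first approach can be illustrated in a few lines of Python. Nothing about the model changes; only the input is crafted. The call_llm function below is a hypothetical stand-in for whichever LLM API is being used.

```python
# A minimal sketch of "prompt engineering": the model itself is untouched.

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. a hosted chat-completion API)."""
    return "<model output would appear here>"

def build_prompt(question: str) -> str:
    # The instructions, a worked example and the output format all live in the
    # prompt; none of the model's parameters change.
    return (
        "You are a careful assistant. Answer in one sentence and say "
        "'I don't know' if unsure.\n"
        "Example:\nQ: What is the boiling point of water at sea level?\n"
        "A: 100 degrees Celsius.\n"
        f"Q: {question}\nA:"
    )

print(call_llm(build_prompt("Why are smaller AI models cheaper to run?")))
```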
One example of this is “retrieval augmented generation”, a technique that combines an LLM with extra software and a database of knowledge on a particular topic to make it less likely to spit out falsehoods. When asked a question, the system first searches through its database. If it finds something relevant, it then passes the question, along with the factual information, to the LLM, requesting that the answer be generated from the information supplied. Providing sources in this way means users can be more confident of the accuracy of responses. It also allows the LLM to be personalised, like Google’s NotebookLM, which lets users supply their own databases of knowledge.
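In code, the idea looks roughly like this. The sketch below assumes a toy in-memory list of facts and a crude keyword search in place of the vector database a production system would use, with call_llm again standing in for a real model.

```python
# A stripped-down sketch of retrieval augmented generation.

FACTS = [
    "Chinchilla is a 70-billion-parameter model trained by Google DeepMind in 2022.",
    "GPT-4 powers the paid version of ChatGPT.",
    "NotebookLM lets users ground a model in their own documents.",
]

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    return "<answer generated from the supplied facts>"

def retrieve(question: str, k: int = 2) -> list[str]:
    # Score each stored fact by how many of the question's words it shares.
    words = set(question.lower().split())
    ranked = sorted(FACTS, key=lambda f: -len(words & set(f.lower().split())))
    return ranked[:k]

def answer(question: str) -> str:
    # Pass the retrieved facts to the model along with the question,
    # asking it to answer only from the information supplied.
    context = "\n".join(retrieve(question))
    prompt = (
        f"Answer using only the facts below, and cite them.\nFacts:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )
    return call_llm(prompt)

print(answer("Which model powers the paid version of ChatGPT?"))
```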
Amid all the focus on AI’s commercial potential, the hunt for artificial general intelligence continues. LLMs and other forms of generative AI may be a piece of the puzzle, or a step along the way, but they are probably not the final answer. As Chris Manning of Stanford University puts it: there is “no reason to believe…that this is the ultimate neural architecture, and we will never find anything better.” ■
Abby Bertics, Science correspondent, The Economist
This article appeared in the Science and technology section of the print edition of The World Ahead 2024 under the headline “What’s next for AI research?”