
When It Comes to AI Models, Bigger Isn’t Always Better

Artificial intelligence models are getting bigger, along with the datasets used to train them. But scaling down could solve some big AI problems


Artificial intelligence has been growing in size. The large language models (LLMs) that power prominent chatbots, such as OpenAI’s ChatGPT and Google’s Bard, are composed of well over 100 billion parameters—the weights and variables that determine how an AI responds to an input. That’s orders of magnitude more parameters than were common among even the most advanced AI models just a few years ago.

In broad strokes, bigger AI tends to be more capable AI. Ever larger LLMs and increasingly massive training datasets have resulted in chatbots that can pass university exams and even entrance tests for medical schools. Yet there are drawbacks to all this growth: As models have gotten bigger, they’ve also become more unwieldy, energy-hungry and difficult to run and build. Smaller models and datasets could help solve this issue. That’s why AI developers, even at some of the largest tech companies, have begun to revisit and reassess miniaturized AI models.

In September, for instance, a team of Microsoft researchers released a technical report on a new language model named phi-1.5. Phi-1.5 is made up of 1.3 billion parameters, which is about one one-hundredth the size of GPT-3.5, the model that underlies the free version of ChatGPT. GPT-3.5 and phi-1.5 also share the same general architecture: they are both transformer-based neural networks, meaning they work by mapping the context and relationships of language.
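To make the notion of a parameter count concrete, here is a minimal sketch using the Hugging Face transformers library; the checkpoint name microsoft/phi-1_5 is an assumption about how the released weights are published, and any small open causal language model could stand in for it.

```python
# Minimal sketch: load a small language model, count its parameters and
# run a short prompt. The checkpoint name below is an assumption; swap in
# any small causal language model available on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # assumed published checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Each parameter is one learned weight; summing every weight tensor gives
# the headline "billions of parameters" figure.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f} billion parameters")

# Ask the model for a short completion.
inputs = tokenizer("Small language models can", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Counting parameters this way is also what makes comparisons such as “about one one-hundredth the size of GPT-3.5” meaningful: it is the same sum, taken over a far larger set of weight tensors.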


But despite its relatively diminutive size, phi-1.5 “exhibits many of the traits of much larger LLMs,” the authors wrote in their report, which was released as a preprint paper that has not yet been peer-reviewed. In benchmarking tests, the model performed better than many similarly sized models. It also demonstrated abilities that were comparable to those of other AIs that are five to 10 times larger. And recent updates made in October even allow phi-1.5 to display multimodality—an ability to interpret images as well as text. Last week Microsoft announced the release of phi-2, a 2.7-billion-parameter follow-up to phi-1.5, which demonstrates even more ability in a still relatively compact package, the company claims.

Make no mistake, massive LLMs such as Bard, GPT-3.5 and GPT-4 are still more capable than the phi models. “I would say that comparing phi-1.5 to GPT-4 is like comparing a middle school student and an undergraduate student,” says Ronen Eldan, a principal AI researcher at Microsoft Research and one of the authors of the September report. But phi-1.5 and phi-2 are just the latest evidence that small AI models can still be mighty—which means they could solve some of the problems posed by monster AI models such as GPT-4.

For one, training and running an AI model with more than 100 billion parameters takes a lot of energy. A standard day of global ChatGPT usage can consume as much electricity as about 33,000 U.S. households do in the same time period, according to one estimate from University of Washington computer engineer Sajjad Moazeni. If Google were to replace all of its users’ search engine interactions with queries to Bard, running that search engine would use as much power as Ireland does, according to an analysis published last month in Joule. That electricity consumption comes, in large part, from all the computing power required to send a query through such a dense network of parameters, as well as from the masses of data used to train mega models. Smaller AI needs far less computing power and energy to run, says Matthew Stewart, a computer engineer at Harvard University. That lower energy demand translates directly into a sustainability benefit.
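To get a feel for the scale of that household comparison, here is a rough back-of-the-envelope sketch. The figure of roughly 29 kilowatt-hours of electricity per day for an average U.S. household is an assumed round number, not part of the estimates cited above.

```python
# Back-of-the-envelope scale check for the household comparison above.
# The per-household figure (~29 kWh/day) is an assumed U.S. average;
# the 33,000-household estimate is the one reported in the article.
households = 33_000
kwh_per_household_per_day = 29
daily_kwh = households * kwh_per_household_per_day
print(f"~{daily_kwh / 1e6:.1f} GWh per day")  # about one gigawatt-hour daily
```

Under those assumptions, a single day of ChatGPT use works out to roughly a gigawatt-hour of electricity, which is why the per-query cost of a smaller model matters at scale.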

Plus, less resource-intensive AI is more accessible AI. As it stands now, just a handful of private companies have the funds and server space to build, store, train and modify the biggest LLMs. Smaller models can be developed and studied by more people. Thinking small “can in some sense democratize AI,” says Eva Portelance, a computational and cognitive linguistics researcher at the Mila-Quebec Artificial Intelligence Institute. “In not requiring as much data and not requiring the models to be as big..., you’re making it possible for people outside of these large institutions” to innovate. This is one of multiple ways that scaled-down AI enables new possibilities.

For one thing, smaller AI can fit into smaller devices. Currently, the size of most LLMs means they have to run on the cloud—they’re too big to store locally on an unconnected smartphone or laptop. Smaller models could run on personal devices alone, however. For example, Stewart researches so-called edge computing, in which the goal is to stuff computation and data storage into local machines such as “Internet of Things” gadgets. He has worked on machine-learning-powered sensor systems compact enough to run on individual drones—he calls this “tiny machine learning.” Such devices, Stewart explains, can enable things like much more advanced environmental sensing in remote areas. If competent language models were to become similarly small, they would have myriad applications. In modern appliances such as smart fridges or wearables such as Apple Watches, a smaller language model could enable a chatbotesque interface without the need to transmit raw data across a cloud connection. That would be a massive boon for data security. “Privacy is one of the major benefits,” Stewart says.
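One standard way to squeeze a model onto a watch, fridge or drone is quantization: storing its weights at lower numerical precision. The sketch below applies PyTorch’s dynamic 8-bit quantization to a toy network; it illustrates the general idea behind on-device models, not Stewart’s specific systems.

```python
# Illustrative sketch: shrink a toy model with 8-bit dynamic quantization
# so it is cheaper to store and run on a local device.
import io
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Replace the linear layers' float32 weights with int8 versions.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Approximate size of a model's saved weights in megabytes."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"float32: {size_mb(model):.2f} MB, int8: {size_mb(quantized):.2f} MB")
```

Even this toy example shrinks by roughly a factor of four; the same idea, applied to a compact language model, is part of what makes a local, cloud-free assistant plausible.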

And although the general rule is that larger AI models are more capable, not every AI has to be able to do everything. A chatbot inside a smart fridge might need to understand common food terms and compose lists but not need to write code or perform complex calculations. Past analyses have shown that massive language models can be pared down, even by as much as 60 percent, without sacrificing performance in all areas. In Stewart’s view, smaller and more specialized AI models could be the next big wave for companies looking to cash in on the AI boom.
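The “paring down” those analyses describe is usually done by pruning, that is, deleting a large fraction of a network’s least important weights. Here is a minimal sketch using PyTorch’s built-in pruning utilities to zero out 60 percent of each linear layer’s weights in a toy network; pruning a production LLM takes considerably more care, so treat this as an illustration of the technique rather than a recipe.

```python
# Minimal illustration of magnitude pruning: zero out the 60 percent of
# weights with the smallest absolute value in each linear layer.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")  # bake the zeros into the weights

zeros = sum((p == 0).sum().item() for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"{zeros / total:.0%} of parameters are now zero")
```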

Then there’s the more fundamental issue of interpretability: the extent to which a machine-learning model can be understood by its developers. For larger AI models, it is essentially impossible to parse the role of each parameter, explains Brenden Lake, a computational cognitive scientist researching artificial intelligence at New York University. This is the “black box” of AI: developers build and run models without any true knowledge of what each weight within an algorithm accomplishes. In smaller models, it is easier, though often still difficult, to determine cause and effect and adjust accordingly. “I’d rather try to understand a million parameters than a billion parameters,” Lake says.
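The contrast Lake draws is easiest to see with a model tiny enough to inspect by hand. The sketch below builds a linear layer with just eight parameters and prints every one of them, something that does not scale to billions of weights; it is a generic illustration, not a model from the research discussed here.

```python
# A model small enough to inspect: every single weight can be printed
# and reasoned about by a person.
import torch.nn as nn

tiny = nn.Linear(3, 2)  # 3 inputs, 2 outputs: 6 weights plus 2 biases

for name, tensor in tiny.named_parameters():
    print(name, tuple(tensor.shape))
    print(tensor.data)

# A model with billions of parameters would produce billions of numbers
# here, which is one reason large models are treated as "black boxes."
```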

For both Lake and Portelance, artificial intelligence isn’t just about building the most capable language model possible but also about gaining insight into how humans learn and how we can better mimic that through machines. Size and interpretability are key factors in creating models that help illuminate things about our own mind. With mega AI models—generally trained on much bigger datasets—the breadth of that training information can conceal limitations and make it seem like an algorithm understands something it doesn’t. Conversely, with smaller, more interpretable AI, it is far easier to parse why an algorithm is producing an output. In turn, scientists can use that understanding to create “more cognitively plausible” and possibly better overall AI models, Portelance says. Humans, they point out, are the gold standard for cognition and learning: we can absorb so much and infer patterns from very small amounts of information. There are good reasons to try to study that phenomenon and replicate it through AI.

At the same time, “there are diminishing returns for training large models on big datasets,” Lake says. Eventually, it becomes a challenge to find high-quality data, the energy costs rack up and model performance improves less quickly. Instead, as his own past research has demonstrated, big strides in machine learning can come from focusing on slimmer neural networks and testing out alternate training strategies.

Sébastien Bubeck, a senior principal AI researcher at Microsoft Research, agrees. Bubeck was one of the developers behind phi-1.5. For him, the purpose of studying scaled-down AI is “about finding the minimal ingredients for the sparks of intelligence to emerge” from an algorithm. Once you understand those minimal components, you can build on them. By approaching these big questions with smaller models, Bubeck hopes to improve AI in as economical a way as possible.

“With this strategy, we’re being much more careful with how we build models,” he says. “We’re taking a slower and more deliberate approach.” Sometimes slow and steady wins the race—and sometimes smaller can be smarter.