Small Language Models Are Gaining Popularity Among Researchers
While larger models offer versatility, smaller models are becoming attractive due to their reduced footprint and efficiency.
Large language models (LLMs) like those from OpenAI, Meta, and DeepSeek are powerful but come with significant costs. These models, which use hundreds of billions of parameters, require massive computational resources for training and operation. For example, Google reportedly spent $191 million to train its Gemini 1.0 Ultra model. Additionally, a single ChatGPT query consumes about 10 times more energy than a Google search, according to the Electric Power Research Institute.
In response, researchers are turning to small language models (SLMs). Companies like IBM, Google, Microsoft, and OpenAI have released SLMs with just a few billion parameters. While these models are not as versatile as LLMs, they excel at specific tasks such as summarizing conversations, serving as health care chatbots, and gathering data in smart devices. As Zico Kolter, a computer scientist at Carnegie Mellon University, put it, "For a lot of tasks, an 8 billion–parameter model is actually pretty good." SLMs can also run on devices like laptops or cell phones, reducing reliance on large data centers.
Training and Optimizing Small Models
To make SLMs effective, researchers use techniques like knowledge distillation, in which a large model generates high-quality training data for a smaller one. Instead of learning from raw, messy text scraped from the internet, the small model learns from the cleaner, better-organized output of the large model. Another approach is pruning, which removes unnecessary or inefficient parts of a neural network. The idea, inspired by the way the human brain trims connections between neurons, traces back to a 1989 paper by Yann LeCun. Pruning lets researchers fine-tune an SLM for a specific task or environment.
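To make these two ideas concrete, here is a minimal, illustrative sketch that is not from the original story: a distillation-style loss in which a small "student" model is trained to match a large "teacher" model's softened output distribution, and magnitude pruning using PyTorch's `torch.nn.utils.prune`. The layer sizes, temperature value, and random logits are stand-in assumptions chosen only for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

# --- Knowledge distillation: the small "student" learns to reproduce the
# --- large "teacher" model's output distribution rather than raw labels.
def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soften both output distributions with a temperature, then measure how
    far the student's predictions are from the teacher's."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence nudges the student toward the teacher's cleaner signal.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: random logits stand in for real teacher/student model outputs.
teacher_logits = torch.randn(4, 10)  # e.g. a large model's scores for 10 classes
student_logits = torch.randn(4, 10)  # the small model's scores for the same inputs
print("distillation loss:", distillation_loss(student_logits, teacher_logits).item())

# --- Pruning: zero out the smallest-magnitude weights in a layer, keeping
# --- only the connections that contribute most.
layer = nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.3)  # remove 30% of weights
sparsity = (layer.weight == 0).float().mean().item()
print(f"fraction of weights now zero: {sparsity:.2f}")
```

In practice the teacher and student logits would come from full language models and the pruned network would be retrained afterward, but the core mechanics are the two calls shown here.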
Advantages of Small Models
SLMs offer several benefits:
- Cost-Effectiveness: They require fewer computational resources, making them cheaper to train and deploy.
- Transparency: With fewer parameters, their reasoning processes are easier to understand.
- Experimentation: Researchers can test new ideas with lower stakes, as noted by Leshem Choshen of the MIT-IBM Watson AI Lab.
While LLMs will remain useful for tasks like generalized chatbots and drug discovery, SLMs provide a practical alternative for targeted applications. As Choshen put it, "These efficient models can save money, time, and compute."
Original story reprinted with permission from Quanta Magazine.