[AARR] From Few to Many Shots: Exploring the Depths of In-Context Learning

Author

Align AI R&D Team

💡 The Align AI Research Review is a weekly dive into the most relevant research papers, AI industry news and updates!


From Few to Many Shots: Exploring the Depths of In-Context Learning with Breakthroughs, Novelties, and Future Directions

Large language models (LLMs) are steadily making their way into our daily lives and a wide range of applications. Traditionally, researchers have approached the customization of LLMs from two main perspectives, each with its own benefits and drawbacks (Vaswani et al., 2017).

The first is fine-tuning (additional training of) a pre-trained base model. Although not as computationally intensive as pre-training, fine-tuning still requires compute, a reasonable volume of data, and a framework to run the training and then host the fine-tuned models. Moreover, for complex algorithmic tasks, it is difficult to ensure that the model actually learns the underlying algorithm.

The “Many-Shot In-Context Learning” paper from Google DeepMind focuses on in-context learning (ICL), a topic of significant interest in the LLM space. Its central observation is remarkable: a model can learn from examples provided in the prompt without any weight updates.

The main advantage of ICL is the significant performance boost it brings to LLMs across a variety of tasks, including algorithmic reasoning, translation, summarization, planning, reward modeling, and mathematical problem-solving.

When discussing LLMs, few-shot and zero-shot learning often come up. Simply conditioning a model on a few examples (few-shot) or on a task description alone (zero-shot) is enough to resolve a wide range of tasks.
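To make the distinction concrete, here is a minimal sketch of how zero-shot and few-shot prompts are typically assembled. The sentiment task, labels, and example strings are illustrative assumptions, not taken from the paper.

```python
# Sketch of zero-shot vs. few-shot prompt construction.
# The task and examples below are illustrative, not from the paper.

def zero_shot_prompt(task_instruction: str, query: str) -> str:
    """Zero-shot: the model sees only a task description and the query."""
    return f"{task_instruction}\n\nInput: {query}\nOutput:"

def few_shot_prompt(task_instruction: str,
                    examples: list[tuple[str, str]],
                    query: str) -> str:
    """Few-shot: a handful of input/output demonstrations precede the query."""
    demos = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{task_instruction}\n\n{demos}\n\nInput: {query}\nOutput:"

examples = [("I loved this film!", "positive"),
            ("Terrible pacing and acting.", "negative")]
prompt_zs = zero_shot_prompt("Classify the sentiment.", "Not bad at all.")
prompt_fs = few_shot_prompt("Classify the sentiment.", examples, "Not bad at all.")
```

The only difference between the two regimes is whether demonstrations are prepended; the model's weights are untouched in both cases.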

Additionally, while LLMs have made significant progress in handling long sequences exceeding 32K tokens, their performance has primarily been evaluated with metrics like perplexity and on synthetic tasks. These metrics may not fully capture their abilities in more complex, real-world situations. At the time of writing, the largest public context windows are 200K tokens (Anthropic's Claude) versus 128K tokens (GPT-4 Turbo).

Compared to fine-tuning, ICL adapts the model to a task at inference time and requires no additional parameter optimization. However, inference cost and execution time can remain an issue, since the demonstrations must be processed on every call.

On the other hand, as the context windows of LLMs grow to millions of tokens, the models acquire additional capabilities. This motivated researchers from Google DeepMind to conduct the study “Many-Shot In-Context Learning.”

Today's research paper analyzes many-shot in-context learning, where LLMs are prompted with hundreds or thousands of examples at inference time in order to learn new tasks.


Many-shot vs Few-Shot in Context Learning (ICL)

The study by Agarwal et al. (2024) revealed that performance on tasks that are typically costly and challenging to solve with few-shot ICL can be further enhanced by fitting hundreds or thousands of ICL examples into the prompt.

Until recently, the small number of examples that fit in the context window hindered the performance of ICL. With today's long-context models, ICL can now be studied with thousands of examples, a setting referred to as the “many-shot regime.”
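The mechanic of the many-shot regime is simply packing as many demonstrations as the context budget allows. The sketch below illustrates this with a greedy packer; the 4-characters-per-token heuristic and the toy arithmetic dataset are assumptions for illustration, not the paper's tokenizer or benchmarks.

```python
# Sketch of many-shot prompt packing: greedily add demonstrations until
# a token budget is exhausted. The 4-chars-per-token estimate is a rough
# illustrative heuristic, not a real tokenizer.

def build_many_shot_prompt(examples, query, max_tokens):
    """Return (prompt, number_of_shots) under an approximate token budget."""
    est_tokens = lambda s: len(s) // 4  # crude length-based estimate
    parts, used = [], 0
    for problem, solution in examples:
        demo = f"Problem: {problem}\nSolution: {solution}\n"
        cost = est_tokens(demo)
        if used + cost > max_tokens:
            break
        parts.append(demo)
        used += cost
    parts.append(f"Problem: {query}\nSolution:")
    return "\n".join(parts), len(parts) - 1

# Toy dataset: thousands of candidate demonstrations are available,
# but only as many as the budget allows are actually included.
examples = [(f"2 + {i} = ?", str(2 + i)) for i in range(2000)]
prompt, shots = build_many_shot_prompt(examples, "2 + 9999 = ?", max_tokens=2_000)
```

Raising `max_tokens` from a few thousand to a million is, in essence, the shift from the few-shot to the many-shot regime.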

Now, let’s discuss the transformative approaches the research describes, particularly in the context of Align AI's expertise and focus.


Transformative approaches to minimize Human-Generated Data

Although many-shot ICL shows considerable potential, its performance may be limited by the availability of high-quality human-generated outputs. This drawback becomes more acute in advanced reasoning tasks like GPQA.

To address this, the authors introduce Reinforced ICL and Unsupervised ICL, inspired by the work of Singh et al. (2023) on the effectiveness of model-generated solutions for fine-tuning.

  • Reinforced ICL: replaces human-written rationales with model-generated reasoning chains, enabling the construction of ICL examples for problems where human-annotated data is scarce.

  • Unsupervised ICL: prompts the model with problems only, rather than problem-solution pairs. Both methods prove effective in the many-shot regime, particularly on complex reasoning tasks, opening new avenues for exploiting the capabilities of many-shot ICL.
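The Reinforced ICL recipe above can be sketched as a simple sample-and-filter loop: sample rationales from a model and keep only those whose final answer matches the ground truth, then use the kept pairs as demonstrations. `sample_rationale` below is a stand-in for a real LLM call, and the dataset is synthetic; both are assumptions for illustration, not the paper's actual setup.

```python
# Sketch of the Reinforced ICL sample-and-filter loop.
# `sample_rationale` is a placeholder for an LLM call (an assumption).
import random

random.seed(0)  # make the toy sampler deterministic

def sample_rationale(problem: str, gold: str) -> tuple[str, str]:
    """Placeholder LLM: returns a rationale and a final answer,
    correct only some of the time (here, with probability 0.6)."""
    final = gold if random.random() < 0.6 else "wrong"
    return f"Step-by-step reasoning for {problem!r}...", final

def reinforced_icl_examples(dataset, samples_per_problem=4):
    """Keep one model-generated rationale per problem whose final
    answer matches the ground truth; discard the rest."""
    kept = []
    for problem, gold in dataset:
        for _ in range(samples_per_problem):
            rationale, final = sample_rationale(problem, gold)
            if final == gold:  # filter by final-answer correctness
                kept.append((problem, f"{rationale} Answer: {final}"))
                break          # one verified rationale is enough
    return kept

dataset = [(f"problem-{i}", f"ans-{i}") for i in range(50)]
demos = reinforced_icl_examples(dataset)
```

The filtered `demos` list can then be packed into a many-shot prompt in place of human-written solutions; Unsupervised ICL would instead include only the problem strings.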

On MATH and GSM8K, Reinforced ICL with model-generated rationales and Unsupervised ICL with problems alone both outperform prompting with human-written solutions.

The research highlights both pros and cons of model behavior:

Pros:

  • One contribution of many-shot ICL is its capacity to modify the model's learned biases.

  • The findings show that many-shot ICL can overcome pre-training biases, which is particularly beneficial for handling "non-natural" NLP prediction tasks.

Cons:

  • The research conducts all its investigations on a single model, Gemini 1.5 Pro.

  • Moreover, the authors do not fully understand why performance can sometimes deteriorate as more examples are added to the prompt.


Final Thoughts

Overall, this post has discussed what many-shot in-context learning is, what problem it solves, and how Reinforced ICL and Unsupervised ICL address the scarcity of human-generated data.

In a nutshell, the study's findings promise substantial performance improvements for the future of ICL and LLMs. Many-shot learning's ability to overcome pre-training biases and to learn from high-dimensional data with numerical inputs may have far-reaching consequences. On the other hand, the insufficiency of next-token prediction loss as a metric for assessing downstream ICL performance underscores the need for further investigation in this domain.


Learn More!

To dig deeper into the material in this post, we share a few resources below:

  • Detailed overview of the blog post (Link)

  • Learn more about Gemini 1.5 (Link)

  • Get started with Few-shot Learning with Language Models (Link)


Interested in building the future of interaction analytics?

View open positions →

Want to ensure that your AI chatbot is performing the way it is supposed to?

United States

3003 N. 1st Street, San Jose, California

South Korea

23 Yeouidaebang-ro 69-gil, Yeongdeungpo-gu, Seoul

India

Eldeco Centre, Malviya Nagar, New Delhi
