[AARR] Lamini - Memory Tuning

Author

Align AI R&D Team

💡 The Align AI Research Review is a weekly dive into the most relevant research papers, AI industry news and updates


Banishing LLM Hallucinations Requires Rethinking Generalization

A Novel Approach to Reducing Hallucinations in Large Language Models Through Dynamic Memory Expert Selection.

Li et al. (2024) introduced a new model architecture named Lamini-1, which uses a system of memory experts to store facts precisely and retrieve them dynamically as needed.

The main goal of this research is to minimize hallucinations, which the authors claim can be reduced from 50% to 5%. Hallucination is a failure mode in which LLMs produce information that is not supported by their training data.

Compared to conventional methods, this architecture achieves better factual recall while significantly reducing training time.


But what is Lamini Memory Tuning?

  • Lamini Memory Tuning is a novel method for incorporating facts into LLMs that enhances factual accuracy and reduces hallucinations to previously unattainable levels.

  • It optimizes for zero error on the specific facts you direct it to remember, rather than minimizing the average error across all training examples. Consequently, it recalls those facts nearly perfectly.

  • The method still optimizes for average error on everything else, so the LLM keeps its capacity to generalize and can generate fluent prose around those facts. At the same time, Lamini Memory Tuning systematically eliminates hallucinations on the information you care about.

  • In addition, the proposed method aims to achieve a training loss of nearly zero on the critical facts that should not be hallucinated (a minimal sketch of this objective follows this list).
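
To make the objective concrete, here is a minimal sketch (not the authors' code) of driving the loss on one specific fact toward zero with a small LoRA adapter while the backbone stays frozen. The model name, the example fact, and the stopping threshold are illustrative assumptions.

```python
# Minimal sketch of the memory-tuning objective: train a LoRA adapter on one
# specific fact until its cross-entropy loss is driven close to zero, with
# the pretrained backbone frozen. Model, fact, and threshold are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # any open-source causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
base = AutoModelForCausalLM.from_pretrained(model_name)

# The LoRA adapter acts as the trainable "memory"; the backbone stays frozen.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)

fact = "The orders table is keyed by order_id (INTEGER, primary key)."  # hypothetical fact
batch = tokenizer(fact, return_tensors="pt")
batch["labels"] = batch["input_ids"].clone()        # teach the exact token sequence

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

for step in range(1000):                            # keep stepping until the fact sticks
    loss = model(**batch).loss                      # causal-LM cross-entropy on the fact
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if loss.item() < 1e-3:                          # near-zero loss = fact memorized
        break
```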

What's Wrong with the Right Data?
Let’s discuss below.


The Paradox of Hallucinations with the Right Data

  • The model likely conflates the correct response with similar but incorrect options in its internal representation.

  • Appropriate context raises the probability of the correct answer, but also of the nearby incorrect options.

  • The model cannot differentiate between exactly correct and nearly correct answers, because general-purpose training never drives the loss on nearly correct answers to zero. Prompting and RAG have no impact on this.

  • Lamini Memory Tuning directly addresses this issue by combining methods from information retrieval and AI to teach the model that getting an answer nearly right is the same as getting it totally wrong (see Fig. 3 and the toy example below).
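
As a toy illustration (with made-up numbers, not measurements from the paper), the snippet below contrasts a generalist model, which keeps near-miss answers almost as probable as the right one, with a memory-tuned model that collapses essentially all probability mass onto the exact answer:

```python
# Toy illustration of the failure mode described above: a generalist model
# leaves near-miss candidates plausible; memory tuning concentrates the
# probability mass on the exact answer. All numbers are made up.
import torch

candidates = ["order_id", "orderId", "order_key", "id"]   # hypothetical column names

generalist_logits = torch.tensor([2.1, 1.9, 1.7, 1.2])     # near-misses stay likely
memory_tuned_logits = torch.tensor([9.0, 1.9, 1.7, 1.2])   # mass collapsed onto the fact

for name, logits in [("generalist", generalist_logits),
                     ("memory-tuned", memory_tuned_logits)]:
    probs = torch.softmax(logits, dim=0)
    print(name, {c: round(p.item(), 3) for c, p in zip(candidates, probs)})
```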


How Does Lamini Memory Tuning Actually Work?

When the model is expected to recall a particular fact, Lamini Memory Tuning redirects essentially the entire probability mass to that fact (i.e., to specific tokens within a specific context), such as the precise SQL schema of your database.

This leads to output probabilities that are not only closer to the correct result but also precisely accurate (see example in Tab. 1).

To achieve this, Lamini Memory Tuning tunes a vast array of memory experts on top of any open-source LLM. Each memory expert is a LoRA adapter that functions as memory for the model.

Together, the memory experts specialize in countless ways to ensure faithful and factually accurate recall of the data they are tuned on.

Figure 4 displays a wireframe of the Mixture of Memory Experts (MoME) architecture.

At its core, the system is a pretrained transformer backbone augmented with adapters (akin to specialists) that are dynamically selected from an index using cross-attention, similar to the index in RETRO. The network is trained end-to-end with the backbone frozen, allowing specific facts to be encoded precisely in the selected experts.
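
The following is a simplified sketch of how such a MoME layer could look, based on our reading of the description above rather than the paper's released code; the dimensions, the number of experts, the top-k rule, and the dot-product selection are illustrative assumptions standing in for the cross-attention index.

```python
# Simplified MoME-style layer: a hidden state from the frozen backbone queries
# an index of expert keys, the top-k experts (LoRA-style low-rank adapters)
# are selected, and their weighted outputs are added to the hidden state.
import torch
import torch.nn as nn

class MemoryExpert(nn.Module):
    """A LoRA-style low-rank adapter acting as one memory slot."""
    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(h))

class MoMELayer(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 1024, k: int = 32):
        super().__init__()
        self.k = k
        self.expert_keys = nn.Parameter(torch.randn(n_experts, d_model))  # the index
        self.experts = nn.ModuleList(MemoryExpert(d_model) for _ in range(n_experts))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, d_model) hidden state coming from the frozen backbone.
        scores = h @ self.expert_keys.T              # cross-attention-style relevance scores
        top_scores, top_idx = scores.topk(self.k, dim=-1)
        weights = torch.softmax(top_scores, dim=-1)
        outputs = []
        for b in range(h.size(0)):                   # add each selected expert's contribution
            delta = sum(w * self.experts[int(i)](h[b])
                        for w, i in zip(weights[b], top_idx[b]))
            outputs.append(h[b] + delta)
        return torch.stack(outputs)

layer = MoMELayer(d_model=512)
hidden = torch.randn(2, 512)                         # stand-in for the backbone's output
print(layer(hidden).shape)                           # torch.Size([2, 512])
```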

A key aim of MoME is to reduce the amount of computation required to memorize facts. This is achieved with the following training algorithm:


Training Algorithm

  • Select a subset of experts, for example 32, from an array of one million to address a specific query.

  • Freeze the weights of the backbone network and of the cross-attention used to select the experts.

  • Perform gradient descent steps until the loss is low enough for the information to be retained (see the sketch after this list).
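
A minimal, self-contained sketch of this recipe is shown below; the stand-in backbone, the synthetic target, and all hyperparameters are assumptions for illustration only, not the authors' implementation.

```python
# Sketch of the training recipe: freeze the backbone and the expert-selection
# weights, then run gradient steps on only the chosen experts until the loss
# on the target fact is small enough.
import torch
import torch.nn as nn

d_model, n_experts, k = 64, 1_000, 32           # the paper uses ~1M experts; 1k here

backbone = nn.Linear(d_model, d_model)          # stand-in for the pretrained backbone
selector_keys = nn.Parameter(torch.randn(n_experts, d_model))  # expert index
experts = nn.ModuleList(nn.Linear(d_model, d_model, bias=False)
                        for _ in range(n_experts))

x = torch.randn(1, d_model)                     # stand-in for the query's hidden state
target = torch.randn(1, d_model)                # stand-in for the fact to memorize

# Step 1: pick the k experts most relevant to this query (selection is frozen).
with torch.no_grad():
    h = backbone(x)
    top_idx = (h @ selector_keys.T).topk(k, dim=-1).indices[0]

# Step 2: freeze backbone and selector; only the selected experts stay trainable.
for p in backbone.parameters():
    p.requires_grad_(False)
selector_keys.requires_grad_(False)
trainable = [p for i in top_idx for p in experts[int(i)].parameters()]

# Step 3: gradient descent until the loss is small enough to retain the fact.
optimizer = torch.optim.AdamW(trainable, lr=1e-2)
loss_fn = nn.MSELoss()
for step in range(5_000):
    h = backbone(x)
    out = h + sum(experts[int(i)](h) for i in top_idx)
    loss = loss_fn(out, target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if loss.item() < 1e-4:                      # fact is effectively memorized
        break
```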


Final Words

  • This paper challenges the current consensus on how LLMs generalize and on their capacity to generalize without hallucinating.

  • The research emphasizes the importance of developing novel metrics and methodologies to assess how precisely LLMs can memorize and recall facts.

  • It also implies that LLMs can accurately store large sets of facts, even when the training data is noisy or random.


Read More!

  • How a Fortune 500 company used Lamini Memory Tuning to build a 95% accurate text-to-SQL agent (Link).

  • An Observation on the Memorization Across Neural Language Models (Link)

Want to ensure that your AI chatbot is performing the way it is supposed to?

United States

3003 N. 1st Street, San Jose, California

South Korea

23 Yeouidaebang-ro 69-gil, Yeongdeungpo-gu, Seoul

India

Eldeco Centre, Malviya Nagar, New Delhi