[AARR] Lamini - Memory Tuning

Author

Align AI R&D Team

💡 The Align AI Research Review is a weekly dive into the most relevant research papers, AI industry news and updates


Banishing LLM Hallucinations Requires Rethinking Generalization

A Novel Approach to Reducing Hallucinations in Large Language Models Through Dynamic Memory Expert Selection.

Li et al. (2024) introduced a new model architecture named Lamini-1, which uses a system of memory experts to store facts precisely and retrieve them dynamically as needed.

The main goal of this research is to minimize hallucinations, which the authors claim can be reduced from 50% to 5%. Hallucination is a failure mode in which LLMs produce information that is not supported by their training data.

Compared to conventional methods, this architecture achieves better factual recall while significantly reducing training time.


But what is Lamini Memory Tuning?

  • Lamini Memory Tuning is a novel method for incorporating facts into LLMs that enhances factual accuracy and reduces hallucinations to previously unattainable levels.

  • It optimizes for zero error on the specific facts you direct it to remember, rather than minimizing the average error across all training examples. Consequently, it recalls those facts nearly perfectly.

  • The method still optimizes for average error on everything else, so the LLM keeps its capacity to generalize and can generate fluent prose around those facts. At the same time, Lamini Memory Tuning systematically eliminates hallucinations on the information you care about.

  • In addition, the proposed method aims to achieve a training loss of nearly zero on the critical facts that should not be hallucinated (a minimal sketch of this objective follows this list).
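
To make the objective concrete, here is a minimal sketch (not the authors' code) of driving the loss on one specific fact toward zero with a small LoRA adapter while the backbone stays frozen. The model name, the example fact, and the stopping threshold are illustrative assumptions.

```python
# Minimal sketch of the memory-tuning objective: train a LoRA adapter on one
# specific fact until its cross-entropy loss is driven close to zero, with
# the pretrained backbone frozen. Model, fact, and threshold are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # any open-source causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
base = AutoModelForCausalLM.from_pretrained(model_name)

# The LoRA adapter acts as the trainable "memory"; the backbone stays frozen.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)

fact = "The orders table is keyed by order_id (INTEGER, primary key)."  # hypothetical fact
batch = tokenizer(fact, return_tensors="pt")
batch["labels"] = batch["input_ids"].clone()        # teach the exact token sequence

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

for step in range(1000):                            # keep stepping until the fact sticks
    loss = model(**batch).loss                      # causal-LM cross-entropy on the fact
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if loss.item() < 1e-3:                          # near-zero loss = fact memorized
        break
```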

What's Wrong with the Right Data?
Let’s discuss below.


The Paradox of Hallucinations with the Right Data

  • The model likely conflates the correct response with similar but incorrect options in its internal representation.

  • Appropriate context raises the probability of the correct answer, but also of the nearby incorrect options.

  • The model cannot differentiate between exactly correct and nearly correct answers, because general-purpose training never drives the loss on nearly correct answers to zero. Prompting and RAG have no impact on this.

  • Lamini Memory Tuning directly addresses this issue by combining methods from information retrieval and AI to teach the model that getting an answer nearly right is the same as getting it totally wrong (see Fig. 3 and the toy example below).
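
As a toy illustration (with made-up numbers, not measurements from the paper), the snippet below contrasts a generalist model, which keeps near-miss answers almost as probable as the right one, with a memory-tuned model that collapses essentially all probability mass onto the exact answer:

```python
# Toy illustration of the failure mode described above: a generalist model
# leaves near-miss candidates plausible; memory tuning concentrates the
# probability mass on the exact answer. All numbers are made up.
import torch

candidates = ["order_id", "orderId", "order_key", "id"]   # hypothetical column names

generalist_logits = torch.tensor([2.1, 1.9, 1.7, 1.2])     # near-misses stay likely
memory_tuned_logits = torch.tensor([9.0, 1.9, 1.7, 1.2])   # mass collapsed onto the fact

for name, logits in [("generalist", generalist_logits),
                     ("memory-tuned", memory_tuned_logits)]:
    probs = torch.softmax(logits, dim=0)
    print(name, {c: round(p.item(), 3) for c, p in zip(candidates, probs)})
```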


How Does Lamini Memory Tuning Actually Work?

When the model is expected to recall a particular fact, Lamini Memory Tuning redirects essentially the entire probability mass to that fact (i.e., to specific tokens within a specific context), such as the precise SQL schema of your database.

This leads to output probabilities that are not only closer to the correct result but also precisely accurate (see example in Tab. 1).

To achieve this, Lamini Memory Tuning tunes a vast array of memory experts on top of any open-source LLM. Each memory expert is a LoRA adapter that functions as memory for the model.

Together, the memory experts specialize in countless ways to ensure faithful and factually accurate recall of the data they are tuned on.

Figure 4 displays a wireframe of the Mixture of Memory Experts (MoME) architecture.

At its core, the system is a pretrained transformer backbone augmented with adapters (akin to specialists) that are dynamically selected from an index using cross-attention, similar to the index in RETRO. The network is trained end-to-end with the backbone frozen, allowing specific facts to be encoded precisely in the selected experts.
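
The following is a simplified sketch of how such a MoME layer could look, based on our reading of the description above rather than the paper's released code; the dimensions, the number of experts, the top-k rule, and the dot-product selection are illustrative assumptions standing in for the cross-attention index.

```python
# Simplified MoME-style layer: a hidden state from the frozen backbone queries
# an index of expert keys, the top-k experts (LoRA-style low-rank adapters)
# are selected, and their weighted outputs are added to the hidden state.
import torch
import torch.nn as nn

class MemoryExpert(nn.Module):
    """A LoRA-style low-rank adapter acting as one memory slot."""
    def __init__(self, d_model: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(h))

class MoMELayer(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 1024, k: int = 32):
        super().__init__()
        self.k = k
        self.expert_keys = nn.Parameter(torch.randn(n_experts, d_model))  # the index
        self.experts = nn.ModuleList(MemoryExpert(d_model) for _ in range(n_experts))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, d_model) hidden state coming from the frozen backbone.
        scores = h @ self.expert_keys.T              # cross-attention-style relevance scores
        top_scores, top_idx = scores.topk(self.k, dim=-1)
        weights = torch.softmax(top_scores, dim=-1)
        outputs = []
        for b in range(h.size(0)):                   # add each selected expert's contribution
            delta = sum(w * self.experts[int(i)](h[b])
                        for w, i in zip(weights[b], top_idx[b]))
            outputs.append(h[b] + delta)
        return torch.stack(outputs)

layer = MoMELayer(d_model=512)
hidden = torch.randn(2, 512)                         # stand-in for the backbone's output
print(layer(hidden).shape)                           # torch.Size([2, 512])
```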

A key aim of MoME is to reduce the amount of computation required to memorize facts. This is achieved with the following training algorithm:


Training Algorithm

  • Select a subset of experts, for example 32, from an array of one million to address a specific query.

  • Freeze the weights of the backbone network and of the cross-attention used to select the experts.

  • Perform gradient descent steps until the loss is low enough for the information to be retained (see the sketch after this list).
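
A minimal, self-contained sketch of this recipe is shown below; the stand-in backbone, the synthetic target, and all hyperparameters are assumptions for illustration only, not the authors' implementation.

```python
# Sketch of the training recipe: freeze the backbone and the expert-selection
# weights, then run gradient steps on only the chosen experts until the loss
# on the target fact is small enough.
import torch
import torch.nn as nn

d_model, n_experts, k = 64, 1_000, 32           # the paper uses ~1M experts; 1k here

backbone = nn.Linear(d_model, d_model)          # stand-in for the pretrained backbone
selector_keys = nn.Parameter(torch.randn(n_experts, d_model))  # expert index
experts = nn.ModuleList(nn.Linear(d_model, d_model, bias=False)
                        for _ in range(n_experts))

x = torch.randn(1, d_model)                     # stand-in for the query's hidden state
target = torch.randn(1, d_model)                # stand-in for the fact to memorize

# Step 1: pick the k experts most relevant to this query (selection is frozen).
with torch.no_grad():
    h = backbone(x)
    top_idx = (h @ selector_keys.T).topk(k, dim=-1).indices[0]

# Step 2: freeze backbone and selector; only the selected experts stay trainable.
for p in backbone.parameters():
    p.requires_grad_(False)
selector_keys.requires_grad_(False)
trainable = [p for i in top_idx for p in experts[int(i)].parameters()]

# Step 3: gradient descent until the loss is small enough to retain the fact.
optimizer = torch.optim.AdamW(trainable, lr=1e-2)
loss_fn = nn.MSELoss()
for step in range(5_000):
    h = backbone(x)
    out = h + sum(experts[int(i)](h) for i in top_idx)
    loss = loss_fn(out, target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if loss.item() < 1e-4:                      # fact is effectively memorized
        break
```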


Final Words

  • This paper challenges the current consensus on how LLMs generalize and on their capacity to generalize without hallucinating.

  • The research emphasizes the importance of developing novel metrics and methodologies to assess how precisely LLMs can memorize and recall facts.

  • It also implies that LLMs can accurately store large sets of facts, even when the training data is noisy or random.


Read More!

  • How a Fortune 500 company used Lamini Memory Tuning to build a 95% accurate text-to-SQL agent (Link).

  • An Observation on the Memorization Across Neural Language Models (Link)

Want to ensure that your AI chatbot is performing the way it is supposed to?

United States

3003 N. 1st Street, San Jose, California

South Korea

23 Yeouidaebang-ro 69-gil, Yeongdeungpo-gu, Seoul

India

Eldeco Centre, Malviya Nagar, New Delhi