[AARR] LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding

Align AI R&D Team
💡 The Align AI Research Review is a weekly dive into the most relevant research papers, AI industry news and updates!


Generative AI technologies are powerful, but they are constrained by the information available to them. Even though an LLM such as ChatGPT is capable of executing a wide range of tasks, its foundational knowledge is limited by its training data.

Retrieval Augmented Generation (RAG) addresses these challenges by integrating external knowledge sources, such as databases, into LLMs. RAG proves particularly useful in knowledge-intensive settings and in domain-specific applications where the relevant information changes constantly.

The paper “LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding” from Meta proposes a novel model-agnostic framework called LLM-augmented retrieval, which enhances the performance of existing retriever models by enriching document embeddings through LLM augmentation (see Figure 1 of the paper).


Components of the LLM-Augmented Retrieval Framework

The LLM-augmented retrieval framework involves the following components:

  • Synthetic Relevant Queries

Promptagator and InPars use as few as eight task-specific examples to prompt the LLM to generate synthetic queries for new documents. These demonstrations are input-output pairs where the input is a document text and the output is a relevant query. However, this process only generates “relevant” synthetic queries.

In this paper, synthetic relevant queries are generated by an LLM and then processed into a doc-level embedding together with passages split from the original document. The final retrieval is based on the similarity between the user query and the doc-level embedding.

As a result, Meta adopted the generated synthetic queries (see Figure 1 of the paper) to introduce multiple perspectives on the semantics of the source document, which facilitates the matching of relevant queries.
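
A minimal sketch of this few-shot query generation is below. The `generate` helper and the prompt wording are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of Promptagator-style few-shot synthetic query generation.
# `generate` stands in for any LLM completion call and is hypothetical.

FEW_SHOT_EXAMPLES = [
    ("The Eiffel Tower is a wrought-iron lattice tower in Paris...",
     "how tall is the eiffel tower"),
    ("Photosynthesis is the process by which plants convert light...",
     "how does photosynthesis work"),
]

def build_prompt(document: str) -> str:
    """Assemble a few-shot prompt from document/query demonstration pairs."""
    parts = [
        f"Document: {doc}\nRelevant query: {query}"
        for doc, query in FEW_SHOT_EXAMPLES
    ]
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n\n".join(parts)

def synthetic_queries(document: str, generate, n: int = 5) -> list[str]:
    """Sample n relevant queries for one document from the LLM."""
    prompt = build_prompt(document)
    return [generate(prompt, temperature=0.8).strip() for _ in range(n)]
```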

  • Title

The title of a document plays a key role in assessing its relevance to a user’s input. A skillfully composed title provides crucial context and keywords, making it easier to grasp the substance and intent of the document. In a nutshell, the framework uses the title directly when the original document has one; otherwise, it employs an LLM to generate a synthetic title for the document.
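
A corresponding sketch for the title field, reusing the same hypothetical `generate` helper (the prompt wording is again our own assumption):

```python
def document_title(document: str, existing_title: str | None, generate) -> str:
    """Use the original title when the document has one; otherwise
    ask the LLM to write a synthetic title."""
    if existing_title:
        return existing_title
    prompt = (
        "Write a short, descriptive title for this document:\n\n"
        f"{document}\n\nTitle:"
    )
    return generate(prompt, temperature=0.3).strip()
```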

  • Passages

When dealing with long documents in information retrieval tasks, it is common practice to split them into smaller chunks so they fit within the context window limit of the model being used. These passages (or chunks) are extracted directly from the original documents and are not generated through LLM augmentation.

Meta’s empirical studies revealed that the optimal chunk size is 64 tokens. This size proved effective for models like Contriever and DRAGON. However, for token-level late-interaction models such as ColBERT and ColBERTv2, chunking the original documents is unnecessary unless they exceed the context window limit.
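
A minimal chunking sketch along these lines follows; a whitespace split stands in for the retriever's own tokenizer, which you would use in practice so chunk boundaries match the model's token accounting.

```python
def chunk_document(text: str, chunk_size: int = 64) -> list[str]:
    """Split a document into fixed-size passages of `chunk_size` tokens.
    Whitespace tokens approximate model tokens for illustration only."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]
```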


Adapting Retrieval Frameworks for Varied Model Architectures

The research also presents the high-level idea of document-level embedding for information retrieval and then shows how it can be adapted to different retriever model structures, namely bi-encoders and token-level late-interaction architectures.

  • For Bi-Encoders

Bi-encoders typically consist of “Two-Tower” model structures. To determine the relevance between a query and a document, the framework employs query and document encoders to generate embedding vectors. These vectors are compared using dot product or cosine similarity to calculate similarity scores. To enhance document embeddings, the framework integrates synthetic queries and titles.

The question arises: how should these embedding vectors be combined to represent the entire document?

Building on a simple yet novel concept introduced by Arora et al. (2017), Meta applied it to the document-embedding challenge. Specifically, the proposed system averages all chunk embedding vectors to form the chunk field embedding.

Similarly, for the synthetic query field, the system generates an embedding vector for each query using the query encoder, then computes the average of these vectors to derive the query field embedding.
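
Putting the fields together, the doc-level embedding for a bi-encoder can be sketched as follows. Here `encode_doc` and `encode_query` stand in for the two towers, and the equal weighting of the three fields is our simplification; how the fields are weighted is a tunable design choice rather than the paper's exact recipe.

```python
import numpy as np

def field_embedding(texts: list[str], encode) -> np.ndarray:
    """Average the embeddings of a field's members (chunks or queries)."""
    return np.mean([encode(t) for t in texts], axis=0)

def doc_level_embedding(chunks, queries, title, encode_doc, encode_query):
    """Combine chunk, synthetic-query, and title fields into one vector."""
    chunk_field = field_embedding(chunks, encode_doc)
    query_field = field_embedding(queries, encode_query)
    title_field = encode_doc(title)
    # Equal weights are an illustrative assumption.
    return (chunk_field + query_field + title_field) / 3.0

def score(query: str, doc_emb: np.ndarray, encode_query) -> float:
    """Cosine similarity between the user query and the doc-level embedding."""
    q = encode_query(query)
    return float(q @ doc_emb / (np.linalg.norm(q) * np.linalg.norm(doc_emb)))
```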

  • For Token-Level Late-Interaction Models

In late-interaction models like ColBERT and ColBERTv2, the approach differs from conventional methods. Instead of using single embedding vectors, they operate at the token level.

This means every token in the query and the document has its own embedding vector, enabling finer-grained matching. For instance, each query token identifies its closest match in the document, and their similarity contributes to the overall score. This level of granularity allows synthetic queries and titles to be included directly in the document text, providing richer context. If the document exceeds the token limit, it can be further chunked for processing.
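
For intuition, here is a sketch of ColBERT-style MaxSim scoring over token embeddings, with the title and synthetic queries simply prepended to the document text before encoding. The token-encoding step is omitted; both inputs to the scorer are assumed to be L2-normalized token embedding matrices.

```python
import numpy as np

def augmented_document(title: str, synthetic_queries: list[str], body: str) -> str:
    """Prepend the (possibly synthetic) title and queries to the document."""
    return " ".join([title, *synthetic_queries, body])

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Late interaction: each query token takes its best match among the
    document tokens, and the per-token maxima are summed.
    Inputs are (num_tokens, dim) arrays of L2-normalized embeddings."""
    sims = query_tokens @ doc_tokens.T    # (q_len, d_len) similarity matrix
    return float(sims.max(axis=1).sum())  # best document match per query token
```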


Pros and Cons

Now, let’s discuss the pros and cons the paper raises, viewed through Align AI’s expertise and focus. While the research presents valuable insights, it is important to critically evaluate its limitations.

  • The paper also shows improvements to other important retriever training components, such as negative sampling and loss functions.

  • Augmenting relevant queries and titles for original documents requires additional computational resources. This computational demand may limit the approach's usage in resource-constrained environments.

  • LLMs trained on enormous corpora are vulnerable to errors, biases, and false or misleading information, including fabricated content. As a result, they may produce text that merely appears to be true or rely on outdated information.


Wrap-up

RAG embodies a practical approach to enhancing the capabilities of LLMs. It tackles the limits of static training data by integrating external, real-time knowledge into LLM responses, ensuring that the information provided remains contextually meaningful and up to date. Building on this, the research introduces LLM-augmented retrieval, a novel framework that notably improves the effectiveness of existing retriever models by enriching document embeddings with large language models.


Read More!

To dig deeper into the material in this blog post, we share a few resources below:

  • Read more about RAG in NLP tasks (Article).

  • Detailed overview of ColBERT (Article)

Want to ensure that your AI chatbot is performing the way it is supposed to?

United States

3003 N. 1st Street, San Jose, California

South Korea

23 Yeouidaebang-ro 69-gil, Yeongdeungpo-gu, Seoul

India

Eldeco Centre, Malviya Nagar, New Delhi
