[AARR] To Believe or Not to Believe Your LLM

Author

Align AI R&D Team

💡 The Align AI Research Review is a weekly dive into the most relevant research papers, AI industry news and updates


The researchers devised a novel method to determine when an LLM's response to a query is uncertain. They distinguish between two categories of uncertainty: epistemic (lack of knowledge) and aleatoric (irreducible randomness).

By employing an information-theoretic metric, they can consistently identify occurrences where epistemic uncertainty is elevated, suggesting that the model's output may be unreliable or even a hallucination.

The main idea of this research is to exploit the different behavioral patterns an LLM exhibits when potential responses are repeated back to it in the prompt. If the model has low epistemic uncertainty about a query, repeating an incorrect response in the prompt will not substantially increase the probability the model assigns to that response.

If, however, epistemic uncertainty is high, repeating an incorrect response can significantly increase the probability the model assigns to it. A sketch of a prompt template for this process is given below:
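The paper's own template is not reproduced here; the following is a minimal, illustrative Python sketch of how such a prompt could be assembled, with the wording of the template being our assumption rather than the authors' exact phrasing:

```python
def build_repetition_prompt(query: str, prior_responses: list[str]) -> str:
    """Assemble a prompt that shows the query together with previously
    sampled (possibly incorrect) answers, then asks the model to answer
    the query again. Wording is illustrative, not the paper's verbatim
    template."""
    lines = [f"Question: {query}"]
    for response in prior_responses:
        lines.append(f"One possible answer to this question is: {response}")
    lines.append(f"Now answer the question: {query}")
    lines.append("Answer:")
    return "\n".join(lines)

# Example: repeat the incorrect answer "Paris" twice before re-asking.
print(build_repetition_prompt(
    "What is the capital of the United Kingdom?",
    ["Paris", "Paris"],
))
```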

The following example illustrates the process. Take the question "What is the capital of the United Kingdom?". Even when an incorrect alternative response to the query, Paris, is repeated in the prompt, the correct answer London maintains a high probability, as depicted below. The model therefore has low epistemic uncertainty: it is certain of the answer. The probabilities for this and several other examples are presented below:


Figure 1 shows single-label queries that exhibit minimal epistemic uncertainty: the normalised probability of the correct completion, conditioned on a prompt in which an incorrect response is repeated, remains high. Each panel displays the query and the two responses considered, along with their initial probabilities in brackets (the first response is the correct one).

On the other hand, if the probability of the answer changes substantially as incorrect responses are added, the model is subject to significant epistemic uncertainty. A few examples follow:


Figure 2 shows three more examples, all involving substantial epistemic uncertainty: the probability of the correct response rapidly diminishes to nearly zero when the prompt includes the incorrect response multiple times.
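Curves like those in Figures 1 and 2 can be traced with a loop of this shape. Here `answer_logprob(prompt, answer)` is a hypothetical helper standing in for whatever API returns the model's log-probability of a completion, and `build_repetition_prompt` is the illustrative template sketched earlier:

```python
import math

def probability_curve(query, correct, incorrect, answer_logprob, max_reps=10):
    """Probability of the correct answer, normalised over the two candidate
    answers, as the incorrect answer is repeated 0..max_reps times in the
    prompt. A flat curve suggests low epistemic uncertainty; a collapsing
    curve suggests high epistemic uncertainty."""
    probs = []
    for k in range(max_reps + 1):
        prompt = build_repetition_prompt(query, [incorrect] * k)
        p_correct = math.exp(answer_logprob(prompt, correct))
        p_incorrect = math.exp(answer_logprob(prompt, incorrect))
        probs.append(p_correct / (p_correct + p_incorrect))
    return probs
```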


Computing Epistemic Uncertainty in LLMs: An Information-Theoretic Approach

The authors develop an information-theoretic metric that measures epistemic uncertainty by assessing how sensitive the model's output distribution is to the iterative addition of previous (potentially incorrect) responses to the prompt.

More precisely, if the model's response to a prompt containing the query and previous responses is insensitive to those previous responses, the LLM-derived joint distribution can be arbitrarily close to the ground truth.

Contrary, the LLM's confidence in the knowledge encoded in its parameters is likely to be low if the responses within the context significantly influence new responses from the model. Consequently, the LLM-derived joint distribution cannot be as accurate as the ground truth. This observation distinguishes two cases of high uncertainty: one where aleatoric uncertainty is high, and one where only epistemic uncertainty is high.


Estimating Epistemic Uncertainty

The paper introduces a score-based hallucination detection algorithm. The authors construct a "pseudo joint distribution" over multiple responses by incorporating previous responses into the prompt and applying the chain rule.
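A hedged reconstruction of that construction (our notation, inferred from the description above rather than copied from the paper; F denotes the prompt that packs the query x together with earlier responses):

```latex
\hat{p}(y_1, \dots, y_n \mid x)
  = \prod_{i=1}^{n} p_{\mathrm{LLM}}\bigl(y_i \,\big|\, F(x,\, y_1, \dots, y_{i-1})\bigr)
```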


The above definition is a pseudo joint distribution, as the conventional conditioning in the chain rule is substituted with prompt functions of the conditioning variables.

  • The epistemic uncertainty is lower-bounded by the mutual information (MI) of this pseudo joint distribution. This MI estimate can be used as a score indicating how strongly the LLM is suspected of hallucinating on the given query (a toy sketch of the scoring follows this list).

  • For this method, the default answer is the sampled response with the highest probability under the marginal of the pseudo joint distribution.
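Below is a toy sketch of the MI score for the simplest case of two candidate answers and a two-step pseudo joint distribution. The estimator in the paper works with sampled responses and is more involved, so treat the shapes and names here as our assumptions:

```python
import numpy as np

def mi_hallucination_score(joint: np.ndarray) -> float:
    """Mutual information of a (pseudo) joint distribution over two answers.

    joint[i, j] is the pseudo joint probability of giving answer i first and
    answer j second (the second answer conditioned on a prompt that already
    contains answer i). The score is near zero when the second answer is
    insensitive to the first (low epistemic uncertainty) and grows when the
    model parrots whatever answer it has already seen."""
    p = joint.sum(axis=1)      # marginal of the first answer
    q = joint.sum(axis=0)      # marginal of the second answer
    indep = np.outer(p, q)     # what the joint would be under independence
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

# Two candidate answers, e.g. ("London", "Paris").
# Low epistemic uncertainty: the second answer ignores the first.
insensitive = np.outer([0.9, 0.1], [0.9, 0.1])
# High epistemic uncertainty: the model tends to repeat the answer it saw.
parroting = np.array([[0.55, 0.05],
                      [0.05, 0.35]])

print(mi_hallucination_score(insensitive))  # ~0.0 -> answer looks reliable
print(mi_hallucination_score(parroting))    # > 0  -> possible hallucination
```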


Numerical Results and Insights

The authors run extensive experiments on randomly selected subsets of the TriviaQA and AmbigQA benchmark datasets, as well as a newly synthesized WordNet dataset specifically designed to contain queries with multiple labels.


Main Findings

  • The MI-based method performs comparably to the semantic-entropy (S.E.) baseline on predominantly single-label datasets.

  • Both methods significantly outperform simpler metrics such as the probability of the greedy response (T0) and self-verification (S.V.).

  • On mixed datasets that combine single-label and multi-label queries (e.g., TriviaQA+WordNet and AmbigQA+WordNet), the MI-based method performs best, particularly on high-entropy multi-label queries (see Figures 5-6).


Limitations

It is important to acknowledge that these methods do not fully address the inherent limitations of LLMs.

  • Even with improved uncertainty reporting, users may still struggle to understand the model's biases and blind spots, particularly in high-stakes scenarios.

  • This paper does not address the potential ethical and societal consequences of utilizing LLMs with ambiguous outputs.


Wrap-Up

  • The research presents a mutual-information-based uncertainty estimator that provides a provable lower bound on the epistemic uncertainty of the LLM's response to a query.

  • The novel approach is an important addition to the field of AI and natural language processing, as it improves model dependability by decoupling aleatoric and epistemic uncertainties.

  • The combination of practical algorithmic implementations and precise mathematical formulations paves the way for language models that are more reliable and trustworthy.


Learn More!

  • To learn more about benchmarking LLMs (Link).

  • Uncertainty estimation and calibration for LLMs (Link)

Want to ensure that your AI chatbot is performing the way it is supposed to?

United States

3003 N. 1st Street, San Jose, California

South Korea

23 Yeouidaebang-ro 69-gil, Yeongdeungpo-gu, Seoul

India

Eldeco Centre, Malviya Nagar, New Delhi
