Abstract
Hallucinations in large language models represent a critical barrier to reliable use. However, existing research tends to categorize error types by their manifestations rather than by their underlying knowledge-related causes. We propose a novel framework for categorizing hallucinations along two dimensions critical for effective mitigation: knowledge and certainty. Along the knowledge axis, we distinguish between hallucinations caused by a lack of knowledge (HK−) and those occurring despite the model having the correct knowledge (HK+). Through model-specific dataset construction and comprehensive experiments across multiple models and datasets, we show that HK+ and HK− hallucinations can be distinguished. Furthermore, HK+ and HK− hallucinations exhibit different characteristics and respond differently to mitigation strategies, with activation steering proving effective only for HK+ hallucinations. We then turn to the certainty axis, identifying a particularly concerning subset of HK+ hallucinations that occur with high certainty, which we refer to as Certainty Misalignment (CC): cases where models hallucinate with certainty despite having the correct knowledge. To address this, we introduce a new evaluation metric, the CC-Score, which reveals significant blind spots in existing mitigation methods: they may perform well on average yet fail disproportionately on these critical cases. Our targeted probe-based mitigation approach, designed specifically for CC instances, outperforms existing methods, including internal probing-based and prompting-based approaches. These findings highlight the importance of considering both knowledge and certainty in hallucination analysis and call for more targeted detection and mitigation approaches that address the underlying causes of hallucinations.