Systematization, Analysis, and Mitigation of LLMs Hallucinations

Fazl Barez
Zorik Gekhman
Gabriel Stanovsky
Itay Itzhak
Roi Reichart
Yonatan Belinkov
Dana Arad
Adi Simhi
arXiv (2024)

Abstract

Hallucinations in large language models represent a critical barrier to reliable usage. However, existing research tends to categorize error types by their manifestations rather than by their underlying knowledge-related causes. We propose a novel framework for categorizing hallucinations along two dimensions that are critical for effective mitigation: knowledge and certainty. Along the knowledge axis, we distinguish between hallucinations caused by a lack of knowledge (HK−) and those occurring despite the model having the correct knowledge (HK+). Through model-specific dataset construction and comprehensive experiments across multiple models and datasets, we show that HK+ and HK− hallucinations can be reliably distinguished. Furthermore, HK+ and HK− hallucinations exhibit different characteristics and respond differently to mitigation strategies, with activation steering proving effective only for HK+ hallucinations. We then turn to the certainty axis, identifying a particularly concerning subset of HK+ hallucinations that occur with high certainty, which we refer to as Certainty Misalignment (CC): cases where models hallucinate with certainty despite having the correct knowledge. To address this, we introduce a new evaluation metric, the CC-Score, which reveals significant blind spots in existing mitigation methods that may perform well on average yet fail disproportionately on these critical cases. Our targeted probe-based mitigation approach, specifically designed for CC instances, outperforms existing methods such as internal probing-based and prompting-based approaches. These findings highlight the importance of considering both knowledge and certainty in hallucination analysis and call for more targeted detection and mitigation approaches that account for the underlying causes of hallucinations.
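
To make the two axes concrete, below is a minimal illustrative sketch (not the authors' implementation) of how a single question-answer pair might be labeled along them. The helpers generate_answer, answer_with_knowledge_prompt, and certainty are hypothetical placeholders for a model-specific generation call, a knowledge-eliciting setting, and a certainty estimate; the paper's model-specific dataset construction and CC-Score are more involved than this sketch suggests.

```python
# Illustrative sketch of the two-axis labeling described in the abstract.
# All helper callables are hypothetical stand-ins, injected by the caller.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    question: str
    gold_answer: str


def classify_hallucination(
    example: Example,
    generate_answer: Callable[[str], str],              # model's free-form answer
    answer_with_knowledge_prompt: Callable[[str], str],  # answer in a knowledge-eliciting setting
    certainty: Callable[[str], float],                   # certainty estimate in [0, 1]
    certainty_threshold: float = 0.9,                    # assumed cutoff for "high certainty"
) -> str:
    """Label one example as 'correct', 'HK-', 'HK+', or 'HK+ (high certainty)'."""
    prediction = generate_answer(example.question)
    if prediction.strip().lower() == example.gold_answer.strip().lower():
        return "correct"

    # Knowledge axis: does the model recover the correct answer when the
    # question is posed in a knowledge-eliciting setting?
    knows = (
        answer_with_knowledge_prompt(example.question).strip().lower()
        == example.gold_answer.strip().lower()
    )
    if not knows:
        return "HK-"  # hallucination caused by a lack of knowledge

    # Certainty axis: HK+ hallucinations produced with high certainty are the
    # critical cases (CC) that the abstract highlights.
    if certainty(example.question) >= certainty_threshold:
        return "HK+ (high certainty)"
    return "HK+"
```

Keeping the generation, knowledge check, and certainty estimate as pluggable callables reflects the point that the categorization is model-specific: the same question can be an HK− case for one model and an HK+ case for another.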