Skip to main content

Explore our many areas of focus

Explore all research areas

Applied AI & sciences

Sustainability & crisis resilience

Foundational ML & algorithms

Algorithms & theory

Information retrieval

Machine intelligence

Machine perception

Natural language processing

People, systems & quantum AI

Human-computer interaction and visualization

Software engineering

Software systems

Learn More

Building a collaborative ecosystem

Access high-quality datasets to accelerate your research.

Tools & services

Explore our latest AI models and products.

Discover open-source code and collaborate with the community.

Shaping the future together

See all programs

Faculty programs

Participating in the academic research community through meaningful engagement with university faculty.

Student programs

Supporting the next generation of researchers through a wide range of programming.

Find your place in our global offices and research labs.

Translating discovery into real-world impact

Our researchers drive advancements in computer science through both fundamental and applied research.

Collaborative groups tackling the world's most challenging AI problems.

Research

Explore our many areas of focus

Explore all research areas

Applied AI & sciences

Sustainability & crisis resilience

Foundational ML & algorithms

Algorithms & theory

Information retrieval

Machine intelligence

Machine perception

Natural language processing

People, systems & quantum AI

Human-computer interaction and visualization

Software engineering

Software systems

Learn More

Resources

Building a collaborative ecosystem

Access high-quality datasets to accelerate your research.

Tools & services

Explore our latest AI models and products.

Discover open-source code and collaborate with the community.

Conferences & events

Careers

Shaping the future together

See all programs

Faculty programs

Participating in the academic research community through meaningful engagement with university faculty.

Student programs

Supporting the next generation of researchers through a wide range of programming.

Find your place in our global offices and research labs.

Blog

About

Translating discovery into real-world impact

Our researchers drive advancements in computer science through both fundamental and applied research.

Collaborative groups tackling the world's most challenging AI problems.

Google Research

Learn about all our AI

Google DeepMind

Explore the frontier of AI

Try our AI experiments

Conferences & events

Blog

Idan Szpektor

Home
People

Idan Szpektor

Research Areas

Human-computer interaction and visualization
Machine intelligence
Natural language processing

Authored Publications

results

Filter by:

Publications

Google 34
Other 1

Years

2025 7
2024 6
2023 9
2022 4
2021 2
2020 3
2019 3
2018 1

Research Areas

Data Mining and Modeling 1
General Science 2
Human-Computer Interaction and Visualization 2
Information Retrieval and the Web 2
Machine Intelligence 4
Machine Perception 6
Natural Language Processing 21
Responsible AI 1

Teams

I-DRIM 13
Language 2

Sort By

Title
Title, descending
Year
Year, descending

chip template

Inside-Out: Hidden Factual Knowledge in LLMs

Jonathan Herzig

Eyal Ben David

Eran Ofek

Hadas Orgad

Zorik Gekhman

Idan Szpektor

Roi Reichart

Yonatan Belinkov

2025

Preview abstract This work presents a framework for assessing whether large language models (LLMs) encode more factual knowledge in their parameters than what they express in their outputs. While a few studies hint at this possibility, none has clearly defined or demonstrated this phenomenon. We first propose a formal definition of knowledge, quantifying it for a given question as the fraction of correct-incorrect answer pairs where the correct one is ranked higher. This gives rise to external and internal knowledge, depending on the information used to score individual answer candidates: either the model’s observable token-level probabilities or its intermediate computations. Hidden knowledge arises when internal knowledge exceeds external knowledge. We then present a case study, applying this framework to three popular open-weights LLMs in a closed-book QA setup. Our results indicate that: (1) LLMs consistently encode more factual knowledge internally than what they express externally, with an average gap of 40%. (2) Surprisingly, some knowledge is so deeply hidden that a model can internally know an answer perfectly, yet fail to generate it even once, despite large-scale repeated sampling of 1,000 answers. This reveals fundamental limitations in the generation capabilities of LLMs, which (3) puts a practical constraint on scaling test-time compute via repeated answer sampling in closed-book QA: significant performance improvements remain inaccessible because some answers are practically never sampled, yet if they were, we would be guaranteed to rank them first. View details

Beneath the surface of consistency: Exploring cross-lingual knowledge representation sharing in llms

Maxim Ifergan

Leshem Choshen

Roee Aharoni

Idan Szpektor

Omri Abend

2025

Preview abstract The veracity of a factoid is largely independent of the language it is written in. However, language models are inconsistent in their ability to answer the same factual question across languages. This raises questions about how they represent a given fact across languages. We explore multilingual factual knowledge through two parallel aspects: the model's ability to answer a query consistently across languages ({\it consistency}), and to represent these answers for several languages ({\it shared representation}). We propose a methodology to measure the extent to which LLMs share factual knowledge across languages, repurposing knowledge editing methods to assess cross-lingual generalization. We examine LLMs of different types, including monolingual, bilingual, multilingual, and language-extended models. Our analysis on a new multilingual dataset reveals that high consistency does not necessarily imply shared representation, particularly for languages with different scripts. We find that script similarity is a dominant factor in knowledge sharing, even in monolingual models. We observe that if LLMs could fully share knowledge across languages, their accuracy in their best-performing language could benefit an increase of up to 150\% on average. These findings highlight the need for improved multilingual knowledge representation in LLMs and suggest a path for the development of more robust and consistent multilingual LLMs. View details

(D)RAGged Into a Conflict: Detecting and Addressing Conflicting Sources in Retrieval-Augmented LLMs

Arie Cattan

Alon Jacovi

Ori Ram

Jonathan Herzig

Roee Aharoni

Sasha Goldshtein

Eran Ofek

Idan Szpektor

Avi Caciularu

2025

Preview abstract Retrieval Augmented Generation (RAG) is a commonly used approach for enhancing LLMs with relevant and up-to-date information. However, the retrieved sources can often bring conflicting information and it is not clear how models address such discrepancies. In this work, we first point out that knowledge conflicts stem from various reasons and thus require tailored solutions in order to better align model responses to human preferences. To that end, we introduce a novel taxonomy of knowledge conflicts in RAG and define the desired model’s behavior for each category. Additionally, we construct a high-quality benchmark by asking two expert annotators to identify the conflict type within realistic RAG instances, each comprising a query and its associated search results. Finally, we conduct extensive experiments and show that explicitly informing LLMs about the potential conflict category significantly improves the quality and appropriateness of the responses. Yet, there is still a vast room for improvement. Taken together, our work highlights the importance of evaluating RAG systems not only on factual accuracy but also on their ability to manage and resolve knowledge conflicts effectively. View details

Inside-Out: Hidden Factual Knowledge in LLMs

Jonathan Herzig

Eran Ofek

Hadas Orgad

Zorik Gekhman

Idan Szpektor

Roi Reichart

Yonatan Belinkov

Eyal Ben-David

2025

Preview abstract This work presents a framework for assessing whether large language models (LLMs) encode more factual knowledge in their parameters than what they express in their outputs. While a few studies hint at this possibility, none has clearly defined or demonstrated this phenomenon. We first propose a formal definition of knowledge, quantifying it for a given question as the fraction of correct-incorrect answer pairs where the correct one is ranked higher. This gives rise to external and internal knowledge, depending on the information used to score individual answer candidates: either the model’s observable token-level probabilities or its intermediate computations. Hidden knowledge arises when internal knowledge exceeds external knowledge. We then present a case study, applying this framework to three popular open-weights LLMs in a closed-book QA setup. Our results indicate that: (1) LLMs consistently encode more factual knowledge internally than what they express externally, with an average gap of 40%. (2) Surprisingly, some knowledge is so deeply hidden that a model can internally know an answer perfectly, yet fail to generate it even once, despite large-scale repeated sampling of 1,000 answers. This reveals fundamental limitations in the generation capabilities of LLMs, which (3) puts a practical constraint on scaling test-time compute via repeated answer sampling in closed-book QA: significant performance improvements remain inaccessible because some answers are practically never sampled, yet if they were, we would be guaranteed to rank them first. View details

RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation

Aviv Slobodkin

Hagai Taitelbaum

Yonatan Bitton

Brian Gordon

Michal Sokolik

Almog Gueta

Royi Rassin

Dani Lischinski

Idan Szpektor

2025

Preview abstract Subject-driven text-to-image (T2I) generation aims to produce images that align with a given textual description, while preserving the visual identity from a referenced subject image. Despite its broad downstream applicability - ranging from enhanced personalization in image generation to consistent character representation in video rendering - progress in this field is limited by the lack of reliable automatic evaluation. Existing methods either assess only one aspect of the task (i.e., textual alignment or subject preservation), misalign with human judgments, or rely on costly API-based evaluation. To address this gap, we introduce RefVNLI, a cost-effective metric that evaluates both textual alignment and subject preservation in a single run. Trained on a large-scale dataset derived from video-reasoning benchmarks and image perturbations, RefVNLI outperforms or statistically matches existing baselines across multiple benchmarks and subject categories (e.g., Animal, Object), achieving up to 6.4-point gains in textual alignment and 5.9-point gains in subject preservation. View details

Language Models Know More Than They Show: Exploring Hallucinations From the Model’s Viewpoint

Hasas Orgad

Michael Toker

Zorik Gekhman

Roi Reichart

Idan Szpektor

Hadas Kotek

Yonatan Belinkov

2025

Preview abstract We introduce a model-centric approach to investigate hallucinations and other errors generated by large language models (LLMs). We begin by developing an enhanced error detection method, using a linear classifier that leverages intermediate representations of exact answer tokens and outperform existing techniques. Our findings confirm that LLMs encode information on the truthfulness of their outputs, yet they also challenge the existence of universal truthfulness features by showing that generalization is skill-specific. Next, we propose a new error categorization by analyzing the distribution their responses, which uncovers distinct patterns in error types. We discover that these types are also predictable from internal model states, revealing that internal representations encode more than truthfulness. Finally, we find that a trained probe can effectively identify correct answers from multiple generated samples, outperforming other baselines for returning an answer. This exposes a critical \textit{disconnect} between the external behavior of LLMs and their internal state. Our results suggest new directions for understanding and mitigating errors in LLMs. View details

MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs

Gabrielle Kaili-May Liu

Gal Yona

Avi Caciularu

Idan Szpektor

Tim G. J. Rudner

Arman Cohan

2025

Preview abstract A critical component in the trustworthiness of LLMs is reliable uncertainty communication, yet LLMs often use assertive language when conveying false claims, leading to over-reliance and eroded trust. We present the first systematic study of faithful confidence calibration of LLMs, benchmarking models' ability to use linguistic expressions of uncertainty that faithfully reflect their intrinsic uncertainty, across a comprehensive array of models, datasets, and prompting strategies. Our results demonstrate that LLMs largely fail at this task, and that existing interventions are insufficient: standard prompt approaches provide only marginal gains, and existing, factuality-based calibration techniques can even harm faithful calibration. To address this critical gap, we introduce MetaFaith, a novel prompt-based calibration approach inspired by human metacognition. We show that MetaFaith robustly improves faithful calibration across diverse models and task domains, enabling up to 61% improvement in faithfulness and achieving an 83% win rate over original generations as judged by humans. View details

Multilingual Instruction Tuning With Just a Pinch of Multilinguality

Uri Shaham

Jonathan Herzig

Roee Aharoni

Idan Szpektor

Reut Tsarfaty

Matan Eyal

arXiv (2024)

Preview abstract As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-following capabilities to other languages from even monolingual tuning. Furthermore, we find that only 40 multilingual examples integrated in an English tuning set substantially improve multilingual instruction-following, both in seen and unseen languages during tuning. In general, we observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages compared to monolingually tuned models, despite training on 10x fewer examples in those languages. Finally, we find that diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization. Our results suggest that building massively multilingual instruction-tuned models can be done with only a very small set of multilingual instruction-responses. View details

Multi-turn Reinforcement Learning with Preference Human Feedback

Oran Lang

Remi Munos

Bilal Piot

Yossi Matias

Lior Shani

Asaf Cassel

Idan Szpektor

Avital Zipori

Aviv Rosenberg

Hila Noga

Orgad Keller

Daniele Calandriello

Avinatan Hassidim

2024

Preview abstract In this paper, we discuss the multi-turn preference based RL problem. We start by extending the regularized self-play Nash-MD formulation of the preference based RL to the general multi-turn case and show it converges to a Nash equilibrium in the online setting, where the transition and preferences model are known. We empirically test our algorithm on two environments: one where there is an explicit reward, and another in which only preference data is available without assuming any reward. Our experiment show that our algorithm is able to recover the same performance as as a direct reward-based RL algorithm, when a reward-signal is available, even when using the weaker preference signal. When only direct preference is available, our algorithm improves upon both the supervised and reward-based RLHF baselines. View details

Systematization, Analysis, and Mitigation of LLMs Hallucinations

Jonathan Herzig

Fazl Barez

Zorik Gekhman

Gabriel Stanovsky

Itay Itzhak

Idan Szpektor

Roi Reichart

Yonatan Belinkov

Dana Arad

Adi Simhi

Arxiv (2024)

Preview abstract Hallucinations in large language models represent a critical barrier to reliable usage. However, existing research tends to focus on categorizing error types by their manifestations rather than by their underlying knowledge-related causes. We propose a novel framework for categorizing hallucinations along two critical dimensions for effective mitigation: knowledge and certainty. Along the knowledge axis, we distinguish between hallucinations caused by a lack of knowledge (HK− ) and those occurring despite the model having the correct knowledge (HK+). Through model-specific dataset construction and comprehensive experiments across multiple models and datasets we show that we can distinguish HK+ and HK− hallucinations. Furthermore, HK+ and HK− hallucinations exhibit different characteristics, and respond differently to mitigation strategies, with activation steering proving effective only for HK+ hallucinations. We then turn to the certainty axis, identifying a particularly concerning subset of HK+ hallucinations that occur with high certainty, which we refer to as Certainty Misalignment (CC), where models hallucinate with certainty despite having the correct knowledge. To address this, we introduce a new evaluation metric (CC-Score). This reveals significant blind spots in existing mitigation methods, which may perform well on average but fail disproportionately on these critical cases. Our targeted probe-based mitigation approach, specifically designed for CC instances, demonstrates superior performance compared to existing methods (such as internal probing-based and prompting-based). These findings highlight the importance of considering both knowledge and certainty in hallucination analysis and call for more targeted approaches to detection and mitigation that consider their underlying causes. View details

1
2
3
…

of 4

of 4 pages

Search on Google Scholar

Join us

We're always looking for more talented, passionate people.

See opportunities

Follow us

Explore our other initiatives

Google AI

Discover how Google AI is committed to enriching knowledge and solving complex challenges

Products
Build
Research
Responsibility
Societal Impact
About

Google Cloud

High-performance infrastructure for cloud computing, data analytics & machine learning

Overview
Solutions
Products
Pricing
Resources

Google DeepMind

Our mission is to build AI responsibly to benefit humanity

Models
Research
Science
About

Google Labs

Explore the future of AI responsibly with Google Labs

About
Experiments
Stay connected

Google Products

×