Tomas Pfister

Tomas Pfister

Tomas Pfister is the Head of Cloud AI Research. He came to Google from Apple where he cofounded Apple's central AI research group and published Apple’s first research paper that won the Best Paper Award at CVPR’17. Tomas’ key scientific achievements have been proposing a method to improve the realism of synthetic images; developing the first automated method to detect facial micro-expressions; and inventing a new way for neural networks to exploit spatiotemporal structure. He is currently exploring learning from small amount of labeled data (using techniques such as generative models, few-shot learning, transfer learning) and explainability/interpretability of deep learning models, and is particularly excited about the potential of AI in healthcare & education. His research has laid the foundation for several applications such as Face ID in iPhone X, autonomous driving, human pose estimation, detecting facial micro-expressions & translating sign language. Tomas did his PhD in deep learning with Prof Andrew Zisserman at Oxford University and bachelor’s degree in computer science at Cambridge University. He is the recipient of the Forbes 30 Under 30 award, and has received over 40 research awards, including 3 best paper awards, with numerous publications in top AI research venues. His work has been frequently featured in mainstream media, including Forbes, BusinessInsider & Wired.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract We propose Heterogeneous Swarms, an algorithm to discover and adapt multi-LLM systems by jointly optimizing model roles and weights. Given a pool of LLM experts and a utility function, Heterogeneous Swarms employs two iterative steps: role-step and weight-step. For role-step, we interpret model roles as input-output relationships and optimize the directed acyclic graph (DAG) of LLMs representing a multi-LLM system. Starting from a swarm of randomly initialized continuous adjacency matrices, we decode them into discrete DAGs, call the LLMs in topological order with message passing, evaluate on the utility function, and optimize the adjacency matrices with swarm intelligence based on the utility score. For weight-step, we define JFK-score to evaluate the contribution of individual LLMs in the best-found DAG of the role-step, then optimize model weights with swarm intelligence based on the JFK-score. Extensive experiments demonstrate that Heterogeneous Swarms outperforms 15 baselines spanning role-based and weight-based approaches by 18.5% on average across 12 tasks and contexts. Further analysis reveals that Heterogeneous Swarms discovers multi-LLM systems with heterogeneous model roles and substantial collaborative gains, and benefits from the diversity of initial LLMs. View details
    Preview abstract Large language models (LLMs), optimized through human feedback, have rapidly emerged as a leading paradigm for developing intelligent conversational assistants. However, despite their strong performance across many benchmarks, LLM-based agents might still lack conversational skills such as disambiguation -- when they are faced with ambiguity, they often overhedge or implicitly guess users' true intents rather than asking clarification questions. Under task-specific settings, high-quality conversation samples are often limited, constituting a bottleneck for LLMs' ability to learn optimal dialogue action policies. We propose Action-Based Contrastive Self-Training (ACT), a quasi-online preference optimization algorithm based on Direct Preference Optimization (DPO), that enables data-efficient dialogue policy learning in multi-turn conversation modeling. We demonstrate ACT's efficacy under data-efficient tuning scenarios, even when there is no action label available, using multiple real-world conversational tasks: tabular-grounded question-answering, machine reading comprehension, and AmbigSQL, a novel task for disambiguating information-seeking requests for complex SQL generation towards data analysis agents. Additionally, we propose evaluating LLMs' ability to function as conversational agents by examining whether they can implicitly recognize and reason about ambiguity in conversation. ACT demonstrates substantial conversation modeling improvements over standard tuning approaches like supervised fine-tuning and DPO. View details
    Preview abstract Recent advances in knowledge distillation (KD) have enabled smaller student models to approach the performance of larger teacher models. However, popular methods such as supervised KD and on-policy KD, are adversely impacted by the knowledge gaps between teacher-student in practical scenarios. Supervised KD suffers from a distribution mismatch between training with a static dataset and inference over final student-generated outputs. Conversely, on-policy KD, which uses student-generated samples for training, can suffer from low-quality training examples with which teacher models are not familiar, resulting in inaccurate teacher feedback. To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student’s inference-time distribution. In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution, transferring high-quality knowledge adaptively. We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following, and show that SKD consistently outperforms existing KD methods across different domains, data sizes, and model initialization strategies View details
    Deep Researcher with Test-time Diffusion
    Guan Sun
    Zoey CuiZhu
    Yuanjun (Sophia) Bi
    Weiming Wen
    Hui Wan
    Chunfeng Wen
    Solène Maître
    George Lee
    Vishy Tirumalashetty
    Emily Xue
    Burak Gokturk
    2025
    Preview abstract Deep research agents, powered by Large Language Models (LLMs), are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time scaling algorithms. Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TTD-DR). This novel framework conceptualizes research report generation as a diffusion process. TTD-DR initiates this process with a preliminary draft, an updatable skeleton that serves as an evolving foundation to guide the research direction. The draft is then iteratively refined through a "denoising" process, which is dynamically informed by a retrieval mechanism that incorporates external information at each step. The core process is further enhanced by a self-evolutionary algorithm applied to each component of the agentic workflow, ensuring the generation of high-quality context for the diffusion process. This draft-centric design guides the report writing process to be more timely and coherent while reducing information loss during the iterative search process. We demonstrate that our TTD-DR achieves state-of-the-art results on a wide array of benchmarks that require intensive search and multi-hop reasoning, significantly outperforming existing deep research agents. View details
    Preview abstract Recent knowledge distillation (KD) research made significant progress on improving smaller student models to match larger teachers' performances. Two noticeable methods, supervised KD and on-policy KD emerged as the state-of-the-art approaches. However, supervised KD for auto-regressive models suffers from distribution mismatch between training over fixed dataset and inference over student generated outputs. Conversely, on-policy KD, which uses student-generated samples for training, can suffer from low-quality training examples and the teacher's potential inaccuracies in assessing these samples. To address these limitations, we introduce Speculative Knowledge Distillation (SKD). Instead of solely training on teacher- or student-proposed samples, SKD leverages the student model to initially propose tokens following its own generation distribution. Subsequently, the teacher model is employed to replace tokens that are deemed out-of-distribution. Compared with supervised KD, the samples generated by SKD are more likely to align with the student's inference-time distribution, and 2) SKD can mitigate the generation of low-quality sequences by incorporating the teacher's feedback at each token. Furthermore, we demonstrate that SKD is a generic framework capable of implementing both supervised and on-policy knowledge distillation as specific instances. To validate SKD's effectiveness, we apply it to distill autoregressive large language models for various tasks, including translation, summarization, math, and instruction following. Our experiments consistently demonstrate SKD's superior performance compared to existing methods across different domains, tasks, data sizes, and model initialization strategies. View details
    Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
    Zilong Wang
    Steven Zheng
    Swaroop Mishra
    Yuwei Zhang
    Anush Mattapalli
    Ankur Taly
    Jingbo Shang
    ICLR 2025
    Preview abstract Retrieval augmented generation (RAG) has attracted a lot of attention across both academia and industry due to its capability in inserting timely and accurate evidence to the generation by large language models. However, the introduction of retrieved evidence largely makes the input prompt longer, which would harm the understanding quality of large language models and make it slower in actual usage scenarios. To solve these issues, we propose SpeculativeRAG, which leverages a smaller LLM to conduct the retrieval augmented generation for a larger LLM. The smaller LLM can digest a few pieces of evidence and generate multiple pieces of drafts in parallel rapidly, and these drafts will be verified by a large LLM to guarantee the quality. We achieve a higher speed as well as a better quality in the RAG results. View details
    Preview abstract We propose Model Swarms, a collaborative search algorithm to adapt LLM experts via swarm intelligence. Specifically, Model Swarms starts with a pool of LLM experts and a utility function. Guided by the best-found checkpoints across models, diverse LLM experts collaboratively move in the weight space and search for adapted models that optimize the utility function. Compared to existing model composition approaches, Model Swarms offers modularity, works in low-data regimes, and doesn't need assumptions about existing experts and how they should be composed. Extensive experiments demonstrate that Model Swarms could flexibly adapt LLM experts to a single dataset, multi-dataset domains, reward models, as well as diverse human preferences. Further analysis reveals that LLM experts discover previously unseen capabilities in the search process and that Model Swarms enable the weak-to-strong transition of experts through the collaborative search process. View details
    Preview abstract Agents based on large language models (LLMs) for machine learning engineering (MLE) can automatically implement ML models via code generation. However, existing approaches to build such agents often rely heavily on inherent LLM knowledge and employ coarse exploration strategies that modify the entire code structure at once. This limits their ability to select effective task-specific models and perform deep exploration within specific components, such as experimenting extensively with feature engineering options. To overcome these, we propose MLE-STAR, a novel approach to build MLE agents. MLESTAR first leverages external knowledge by using a search engine to retrieve effective models from the web, forming an initial solution, then iteratively refines it by exploring various strategies targeting specific ML components. This exploration is guided by ablation studies analyzing the impact of individual code blocks. Furthermore, we introduce a novel ensembling method using an effective strategy suggested by MLE-STAR. Our experimental results show that MLE-STAR achieves medals in 64% of the Kaggle competitions on the MLE-bench Lite, significantly outperforming the best alternative. View details
    Preview abstract Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization. External memory mechanisms have been proposed to address this limitation, enabling LLMs to maintain conversational continuity. However, existing approaches struggle with two key challenges. First, rigid memory granularity fails to capture the natural semantic structure of conversations, leading to fragmented and incomplete representations. Second, fixed retrieval mechanisms cannot adapt to diverse dialogue contexts and user interaction patterns. In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities—utterances, turns, and sessions—into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs’ cited evidence. Experiments show that RMM demonstrates consistent improvement across various metrics and benchmarks. For example, RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset. View details
    Preview abstract Large language models (LLMs) have achieved remarkable advancements in natural language understanding, generation, and manipulation of text-based data. However, one major issue towards their widespread deployment in the real world is that they can generate "hallucinated" answers that are not factual. Towards this end, this paper focuses on improving grounding from a holistic perspective with a novel framework, AGREE. We start with the design of a test time adaptation capability that takes into account the support information generated in self-grounded responses. To effectively enable this capability, we propose that the model tuning needs to be redesigned with a novel tuning objective mimicking the test time adaptation setup for grounding. This tuning on top of the pre-trained LLMs requires small amount of data that need to be constructed in a particular way to learn the grounding information, for which we introduce a data construction method. Our results show that AGREE pushes the state-of-the-art in grounding, demonstrated across many datasets. View details