 
Zifeng Wang
Zifeng Wang is a research scientist at Google, working on machine learning algorithms and their applications. His research interests include efficient model adaptation, continual learning, and large language models. He received his PhD in machine learning from Northeastern University, advised by Prof. Jennifer Dy.
          
        
Authored Publications
    
        
          
            
Multi-turn Function-calling via Graph-based Execution and Translation
Kai-Wei Chang, Ke Jiang, Jindong Gu, Fan Yin
2025
          
          
        
        
        
          
          
          
We propose a principled method to synthesize high-quality multi-turn function-calling trajectories to align large language model (LLM)-based agents. We start by iteratively building a function-calling graph and defining node operations that increase its complexity, which lets us construct a reliable reference. Based on the synthesized function-calling graph, we then adopt back-and-forth translation: we first construct multi-turn user queries and then fill in the function arguments with information from the queries. We sample positive trajectories that distill the function-graph reference and negative trajectories that contrast with the positive ones in targeted multi-turn loss patterns. Training on the positive trajectories with supervised fine-tuning, and applying preference optimization against the negative trajectories, we obtain 67.42 on BFCL and 71.7 on ToolQuery with an open-source 14B-parameter model, surpassing strong proprietary models such as o1.
              
  
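To make the "reliable reference" step concrete, here is a minimal sketch of how a synthesized function-calling graph could be turned into an executable reference order; the graph representation, function names, and the `topological_reference` helper are hypothetical illustrations, not the paper's actual pipeline.

```python
# Hypothetical sketch: derive a reliable execution reference from a
# function-calling graph via topological ordering (Kahn's algorithm).
from collections import defaultdict, deque

def topological_reference(graph: dict[str, list[str]]) -> list[str]:
    """Return a valid execution order for a function-calling graph.

    `graph` maps each function name to the functions whose outputs it
    consumes (its dependencies).
    """
    indegree = {f: len(deps) for f, deps in graph.items()}
    dependents = defaultdict(list)
    for f, deps in graph.items():
        for d in deps:
            dependents[d].append(f)
    ready = deque(f for f, n in indegree.items() if n == 0)
    order = []
    while ready:
        f = ready.popleft()
        order.append(f)
        for g in dependents[f]:
            indegree[g] -= 1
            if indegree[g] == 0:
                ready.append(g)
    if len(order) != len(graph):
        raise ValueError("cycle detected: graph is not executable")
    return order

# Example: book_flight depends on search_flights; get_weather is independent.
calls = {"search_flights": [], "get_weather": [],
         "book_flight": ["search_flights"]}
print(topological_reference(calls))
# ['search_flights', 'get_weather', 'book_flight']
```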
          
        
      
    
        
          
            
HEART: Emotionally-Driven Test-Time Scaling of Language Models
Souradip Chakraborty, Gabriela Pinto
2025
          
          
        
        
        
          
          
          
Test-time scaling has shown considerable success in improving the performance of language models on complex reasoning tasks without requiring fine-tuning. However, current strategies such as self-reflection or ensembling focus primarily on logical or structural refinement; they do not leverage the guiding potential of affective feedback. Inspired by psychological research showing that emotions can modulate cognitive performance, we introduce HEART, a novel framework that uses emotionally driven prompts for iterative self-correction. HEART provides feedback on a model's incorrect responses using a curated set of concise, emotionally charged phrases based on Paul Ekman's six basic emotions. By systematically varying the emotional tone of the feedback across iterations, our method guides the model to escape flawed reasoning paths and explore more promising alternatives. We evaluate our framework on challenging reasoning benchmarks including OlympiadBench, Humanity's Last Exam, and SimpleQA. Across these benchmarks, our approach elicits significantly deeper reasoning, which leads to consistent and significant gains in accuracy compared to existing prompting methods. Crucially, these gains hold across a diverse range of model architectures, demonstrating the broad applicability of our technique. Overall, our findings suggest that the next frontier in machine reasoning may lie not just in refining logic, but also in understanding and leveraging the 'HEART' of the models.
              
  
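As an illustration of the iterative loop the abstract describes, the sketch below cycles emotionally charged feedback through Ekman's six basic emotions; the feedback phrasing and the `call_model`/`is_correct` hooks are placeholders of mine, not the paper's curated phrase set.

```python
# Minimal sketch of emotionally-driven iterative self-correction.
# The phrases below are invented stand-ins for the paper's curated set.
EKMAN_FEEDBACK = {
    "anger":    "This answer is simply wrong. Fix it now.",
    "disgust":  "This reasoning is sloppy. Redo it properly.",
    "fear":     "A mistake here would be costly. Re-check every step.",
    "joy":      "You are close! One more careful pass will get it.",
    "sadness":  "It is disappointing to see this error. Please try again.",
    "surprise": "Unexpected! How did this step go wrong? Reconsider it.",
}

def heart_correct(question, call_model, is_correct, max_iters=6):
    """Iterate on a model answer, varying the emotional tone each round."""
    answer = call_model(question)
    for emotion in list(EKMAN_FEEDBACK)[:max_iters]:
        if is_correct(answer):
            break
        prompt = (f"{question}\n\nYour previous answer:\n{answer}\n\n"
                  f"{EKMAN_FEEDBACK[emotion]}")
        answer = call_model(prompt)
    return answer
```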
          
        
      
    
        
          
            
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Lei Li, Wenda Xu, Rishabh Agarwal, William Wang, Dhruv Madeka
ICLR 2025
          
          
        
        
        
          
          
          
Recent knowledge distillation (KD) research has made significant progress toward smaller student models that match larger teachers' performance. Two notable methods, supervised KD and on-policy KD, have emerged as state-of-the-art approaches. However, supervised KD for auto-regressive models suffers from a distribution mismatch between training on a fixed dataset and inference over student-generated outputs. Conversely, on-policy KD, which uses student-generated samples for training, can suffer from low-quality training examples and the teacher's potential inaccuracies in assessing them. To address these limitations, we introduce Speculative Knowledge Distillation (SKD). Instead of training solely on teacher- or student-proposed samples, SKD has the student model first propose tokens following its own generation distribution; the teacher model then replaces tokens that are deemed out-of-distribution. Compared with supervised KD, 1) the samples generated by SKD are more likely to align with the student's inference-time distribution, and 2) SKD can mitigate the generation of low-quality sequences by incorporating the teacher's feedback at each token. Furthermore, we demonstrate that SKD is a generic framework that realizes both supervised and on-policy knowledge distillation as special cases. To validate SKD's effectiveness, we apply it to distilling autoregressive large language models on various tasks, including translation, summarization, math, and instruction following. Our experiments consistently demonstrate SKD's superior performance compared to existing methods across different domains, tasks, data sizes, and model initialization strategies.
              
  
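A minimal sketch of the interleaved sampling idea, assuming placeholder `student_logits`/`teacher_logits` callables that return next-token logits for a prefix; the top-K acceptance rule shown here is one plausible instantiation of "deemed out-of-distribution", not necessarily the paper's exact criterion.

```python
# Sketch: student proposes each token; the teacher replaces proposals
# that fall outside its top-K.
import torch

def skd_generate(prefix, student_logits, teacher_logits, K=25,
                 max_len=128, eos_id=2):
    tokens = list(prefix)
    for _ in range(max_len):
        s_probs = torch.softmax(student_logits(tokens), dim=-1)
        proposal = torch.multinomial(s_probs, 1).item()
        t_logits = teacher_logits(tokens)
        # Rank of the student's proposal under the teacher's distribution.
        rank = (t_logits > t_logits[proposal]).sum().item()
        if rank >= K:  # out-of-distribution for the teacher: resample
            t_probs = torch.softmax(t_logits, dim=-1)
            proposal = torch.multinomial(t_probs, 1).item()
        tokens.append(proposal)
        if proposal == eos_id:
            break
    return tokens
```

Note how the threshold interpolates between the two baselines the abstract names: accepting every proposal (K equal to the vocabulary size) recovers on-policy KD, while rejecting nearly all of them pushes training toward teacher-generated, supervised-KD-style samples.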
          
        
      
    
        
          
            
PlanGEN: A Framework Utilizing Inference-Time Algorithms with LLM Agents for Planning and Reasoning
Hootan Nakhost, Mihir Parmar, Swaroop Mishra, Chitta Baral, Jindong Gu
2025
          
          
        
        
        
          
          
          
Scaling inference-time computation in large language models (LLMs) dramatically improves their ability to solve complex problems. While test-time scaling has shown promise in tasks such as code generation and mathematical reasoning, the integration of inference-time algorithms into multi-agent frameworks for planning and reasoning remains under-explored. To this end, we explore popular inference-time algorithms, Best of N, Tree of Thought (ToT), and REward BAlanced SEarch (REBASE), combined with a proposed feedback-driven refinement. Our feedback-driven refinement employs specialized agents: a constraint agent to enforce task-instance-specific constraints, and a verifier agent to evaluate plan quality. Furthermore, we hypothesize that test-time scaling should be proportional to instance-level complexity, and we therefore propose an additional selection agent that dynamically optimizes the choice of algorithm. We evaluate our proposed approaches on four benchmarks: NATURAL PLAN, GPQA, OlympiadBench, and DocFinQA. Experimental results show that our methods outperform strong baselines, achieving state-of-the-art results on NATURAL PLAN, OlympiadBench, and DocFinQA. Our key findings demonstrate that constraint-guided iterative refinement and algorithm selection improve both planning and downstream reasoning in LLMs.
              
  
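A highly simplified sketch of how the selection, constraint, and verifier agents might fit together; all prompts and the `llm`/`algorithms` interfaces are illustrative assumptions, not the paper's implementation.

```python
# Sketch: per-instance algorithm selection plus constraint/verifier
# feedback refinement. `algorithms` maps a name (e.g. "best_of_n",
# "tot", "rebase") to a callable that produces a plan.
def plangen_solve(task, llm, algorithms, max_rounds=3):
    # Selection agent: match algorithm choice to instance complexity.
    choice = llm(f"Given this task, pick one of {list(algorithms)} "
                 f"best suited to its complexity:\n{task}").strip()
    solve = algorithms.get(choice, algorithms["best_of_n"])  # fallback
    plan = solve(task)
    for _ in range(max_rounds):
        # Constraint agent: enforce task-instance-specific constraints.
        issues = llm(f"List any violated constraints in this plan, "
                     f"or say OK:\n{task}\n{plan}")
        # Verifier agent: evaluate overall plan quality.
        verdict = llm(f"Is this plan correct and complete? yes/no:\n{plan}")
        if "OK" in issues and verdict.strip().lower().startswith("yes"):
            break
        plan = solve(f"{task}\nFix these issues: {issues}")
    return plan
```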
          
        
      
    
        
          
            
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong Wang, Steven Zheng, Swaroop Mishra, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang
ICLR 2025
          
          
        
        
        
          
          
          
Retrieval augmented generation (RAG) has attracted considerable attention across academia and industry for its ability to insert timely and accurate evidence into generation by large language models. However, the retrieved evidence substantially lengthens the input prompt, which degrades the comprehension of large language models and slows them down in practice. To address these issues, we propose Speculative RAG, which leverages a smaller LLM to conduct retrieval augmented generation on behalf of a larger LLM. The smaller LLM digests a few pieces of evidence at a time and rapidly generates multiple drafts in parallel; the larger LLM then verifies these drafts to guarantee quality. We achieve both higher speed and better quality in the RAG results.
              
  
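The draft-then-verify pattern can be sketched as follows, with `draft_lm` and `verify_lm` as placeholder calls to a small drafter and a large verifier; the scoring prompt is an invented stand-in for the paper's verification procedure.

```python
# Sketch: a small LM drafts answers in parallel from evidence subsets;
# a larger LM verifies and picks the best draft.
from concurrent.futures import ThreadPoolExecutor

def speculative_rag(question, evidence_subsets, draft_lm, verify_lm):
    def make_draft(evidence):
        docs = "\n".join(evidence)
        return draft_lm(f"Answer using only this evidence:\n{docs}\n"
                        f"Q: {question}")

    # Drafting is embarrassingly parallel across evidence subsets.
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(make_draft, evidence_subsets))

    def score(draft):
        reply = verify_lm(f"Q: {question}\nDraft answer: {draft}\n"
                          f"Rate correctness 0-10; reply with the "
                          f"number only.")
        try:
            return float(reply.strip())
        except ValueError:
            return 0.0

    return max(drafts, key=score)
```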
          
        
      
    
        
          
            
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenda Xu, Dhruv Madeka, Lei Li, William Wang, Rishabh Agarwal
2025
          
          
        
        
        
          
          
          
Recent advances in knowledge distillation (KD) have enabled smaller student models to approach the performance of larger teacher models. However, popular methods such as supervised KD and on-policy KD are adversely affected by the knowledge gap between teacher and student in practical scenarios. Supervised KD suffers from a distribution mismatch between training on a static dataset and inference over the student's own generated outputs. Conversely, on-policy KD, which uses student-generated samples for training, can suffer from low-quality training examples with which teacher models are unfamiliar, resulting in inaccurate teacher feedback. To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between the student and teacher models to generate high-quality training data on the fly while aligning with the student's inference-time distribution. In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution, transferring high-quality knowledge adaptively. We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following, and show that SKD consistently outperforms existing KD methods across different domains, data sizes, and model initialization strategies.
              
  
          
        
      
    
        
          
            
CodecLM: Aligning Language Models with Tailored Synthetic Data
Chun-Liang Li, Jin Miao
NAACL 2024
          
          
        
        
        
          
          
          
Instruction tuning has emerged as key to aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token-prediction objective and users' actual goals. To reduce the labor and time cost of collecting or annotating data by hand, researchers have started to explore the use of LLMs to generate instruction-aligned synthetic data. Recent works focus on generating diverse instructions and applying LLMs to increase instruction complexity, often neglecting downstream use cases. It remains unclear how to tailor high-quality data that elicits better instruction-following abilities for different target instruction distributions and LLMs. To this end, we introduce CodecLM, a general framework for adaptively generating high-quality synthetic data for LLM alignment across different downstream instruction distributions and LLMs. Drawing on encode-decode principles, we use LLMs as codecs to guide the data generation process. We first encode seed instructions into metadata, concise keywords generated on the fly to capture the target instruction distribution, and then decode the metadata to create tailored instructions. We also introduce Self-Rubrics and Contrastive Filtering during decoding to select data-efficient samples. Extensive experiments on four open-domain instruction-following benchmarks validate the effectiveness of CodecLM over the current state of the art.
              
  
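A minimal sketch of the encode-decode loop with contrastive filtering, assuming placeholder `strong_llm`, `target_llm`, and `judge` callables; the Self-Rubrics step is omitted for brevity, and the prompts are illustrative rather than the paper's.

```python
# Sketch: encode seed instructions into keyword metadata, decode into
# tailored instructions, and keep only data-efficient samples.
def codec_generate(seed_instructions, strong_llm, target_llm, judge):
    tailored = []
    for seed in seed_instructions:
        # Encode: compress the seed into on-the-fly metadata keywords.
        metadata = strong_llm(
            f"Summarize the use case and required skills of this "
            f"instruction as a few keywords:\n{seed}")
        # Decode: expand metadata into a new, tailored instruction.
        instruction = strong_llm(
            f"Write one challenging instruction matching these "
            f"keywords:\n{metadata}")
        # Contrastive filtering: keep samples where the strong model
        # clearly outperforms the target model, i.e. the instruction
        # still teaches the target something.
        strong_ans = strong_llm(instruction)
        target_ans = target_llm(instruction)
        if judge(instruction, strong_ans) > judge(instruction, target_ans):
            tailored.append((instruction, strong_ans))
    return tailored
```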
          
        
      
    
        
          
            
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL
Satya Gundabathula, Hanjun Dai, Hootan Nakhost
TMLR (2024)
          
          
        
        
        
          
          
          
Text-to-SQL, the process of translating natural language into Structured Query Language (SQL), represents a transformative application of large language models (LLMs), potentially revolutionizing how humans interact with data. This paper introduces the SQL-PaLM framework, a comprehensive solution for understanding and enhancing Text-to-SQL with LLMs in the learning regimes of few-shot prompting and instruction fine-tuning. With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error filtering. With instruction fine-tuning, we delve deeply into the critical paradigms that influence the performance of tuned LLMs. In particular, we investigate how performance can be improved through expanded training-data coverage and diversity, synthetic data augmentation, and the integration of query-specific database content. We propose a test-time selection method to further refine accuracy by integrating SQL outputs from multiple paradigms with execution feedback as guidance. Additionally, we tackle the practical challenge of navigating intricate databases with a significant number of tables and columns, proposing efficient techniques for accurately selecting relevant database elements to enhance Text-to-SQL performance. Our holistic approach yields substantial advancements in Text-to-SQL, as demonstrated on two key public benchmarks, Spider and BIRD. Through comprehensive ablations and error analyses, we shed light on the strengths and weaknesses of our framework, offering valuable insights into future work on Text-to-SQL.
              
  
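To illustrate consistency decoding with execution-based error filtering, here is a sketch against a SQLite database; `generate_sql` stands in for the few-shot-prompted LLM, and the details (voting over execution results rather than SQL strings, so that semantically equivalent queries pool their votes) are one plausible instantiation, not the paper's exact procedure.

```python
# Sketch: sample several SQL candidates, drop any that fail to execute,
# and majority-vote on execution results.
import sqlite3
from collections import Counter

def consistency_decode(question, db_path, generate_sql, n_samples=8):
    conn = sqlite3.connect(db_path)
    candidates = []  # (sql, frozen execution result) per successful sample
    for _ in range(n_samples):
        sql = generate_sql(question)  # sampled with temperature > 0
        try:
            rows = frozenset(map(tuple, conn.execute(sql).fetchall()))
        except sqlite3.Error:
            continue  # execution-based filtering: discard failing SQL
        candidates.append((sql, rows))
    conn.close()
    if not candidates:
        return None
    # Vote over execution results, not SQL strings, so queries with
    # different surface forms but identical answers share a bucket.
    winner = Counter(rows for _, rows in candidates).most_common(1)[0][0]
    return next(sql for sql, rows in candidates if rows == winner)
```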
          
        
      
    
        
          
            
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Zilong Wang, Hao Zhang, Chun-Liang Li, Jingbo Shang
ICLR (2024)
          
          
        
        
        
          
          
          
Table-based reasoning with large language models (LLMs) is a promising direction for tackling many table understanding tasks, such as table-based question answering and fact verification. Compared with generic reasoning, table-based reasoning requires extracting the underlying semantics from both free-form questions and semi-structured tabular data. Chain-of-Thought and similar approaches incorporate the reasoning chain in the form of textual context, but it remains an open question how to effectively leverage tabular data in the reasoning chain. We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts. Specifically, we guide LLMs, using in-context learning, to iteratively generate operations and update the table to represent a tabular reasoning chain. LLMs can therefore dynamically plan the next operation based on the results of previous ones. This continuous evolution of the table forms a chain that shows the reasoning process for a given tabular problem. The chain carries structured information about intermediate results, enabling more accurate and reliable predictions. Chain-of-Table achieves new state-of-the-art performance on the WikiTQ, FeTaQA, and TabFact benchmarks across multiple LLM choices.
              
  
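A minimal pandas sketch of the evolving-table loop; the operation set, prompts, and `llm` hook are assumptions for illustration, not the framework's actual operation pool.

```python
# Sketch: the LLM repeatedly picks an atomic table operation, the table
# is transformed, and the evolving table is fed back as context.
import pandas as pd

OPS = {
    "select_rows":    lambda df, arg: df.query(arg),
    "select_columns": lambda df, arg: df[arg.split(",")],
    "sort_by":        lambda df, arg: df.sort_values(arg),
}

def chain_of_table(df: pd.DataFrame, question: str, llm, max_steps=5):
    for _ in range(max_steps):
        prompt = (f"Table:\n{df.to_string()}\nQuestion: {question}\n"
                  f"Next operation from {list(OPS)} as 'op: arg', "
                  f"or 'answer' if the table suffices.")
        decision = llm(prompt).strip()
        if decision.lower().startswith("answer"):
            break
        op, _, arg = decision.partition(":")
        df = OPS[op.strip()](df, arg.strip())  # evolve the table
    return llm(f"Table:\n{df.to_string()}\nAnswer: {question}")
```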
          
        
      
    
        
          
            
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Abhishek Kumar, James Glass, Alexander Ratner, Ranjay Krishna
2024
          
          
        
        
        
          
          
          
Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input, a phenomenon known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon, and in doing so we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias in which tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, which allows the model to attend to contexts faithfully according to their relevance, even when they sit in the middle of the input. Third, we show that found-in-the-middle not only achieves better performance at locating relevant information within a long context, but also leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions for understanding LLM attention bias and its potential consequences.
              
  
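A toy numpy sketch of the calibration idea: estimate the position-driven component of attention with a content-free probe and subtract it, so the remaining mass tracks relevance; this estimator is a simplification of mine, not the paper's exact mechanism.

```python
# Sketch: remove the U-shaped positional component from document-level
# attention so that relevant middle documents rank higher.
import numpy as np

def calibrate_attention(attn: np.ndarray, bias_attn: np.ndarray) -> np.ndarray:
    """attn: attention each of N documents receives for the real query.
    bias_attn: attention the same positions receive for a content-free
    probe query, capturing the positional bias alone."""
    relevance = attn - bias_attn          # strip position-driven mass
    relevance -= relevance.min()          # shift to non-negative
    total = relevance.sum()
    if total > 0:
        return relevance / total
    return np.full_like(attn, 1.0 / len(attn))

# Example: the positional bias favors the first and last documents.
attn      = np.array([0.30, 0.25, 0.20, 0.25])
bias_attn = np.array([0.35, 0.15, 0.15, 0.35])
print(calibrate_attention(attn, bias_attn).round(2))
# [0.12 0.5  0.38 0.  ] -- the middle documents now rank highest
```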
          
        
      
    