Skip to main content

Explore our many areas of focus

Explore all research areas

Applied AI & sciences

Sustainability & crisis resilience

Foundational ML & algorithms

Algorithms & theory

Information retrieval

Machine intelligence

Machine perception

Natural language processing

People, systems & quantum AI

Human-computer interaction and visualization

Software engineering

Software systems

Learn More

Building a collaborative ecosystem

Access high-quality datasets to accelerate your research.

Tools & services

Explore our latest AI models and products.

Discover open-source code and collaborate with the community.

Shaping the future together

See all programs

Faculty programs

Participating in the academic research community through meaningful engagement with university faculty.

Student programs

Supporting the next generation of researchers through a wide range of programming.

Find your place in our global offices and research labs.

Translating discovery into real-world impact

Our researchers drive advancements in computer science through both fundamental and applied research.

Collaborative groups tackling the world's most challenging AI problems.

Research

Explore our many areas of focus

Explore all research areas

Applied AI & sciences

Sustainability & crisis resilience

Foundational ML & algorithms

Algorithms & theory

Information retrieval

Machine intelligence

Machine perception

Natural language processing

People, systems & quantum AI

Human-computer interaction and visualization

Software engineering

Software systems

Learn More

Resources

Building a collaborative ecosystem

Access high-quality datasets to accelerate your research.

Tools & services

Explore our latest AI models and products.

Discover open-source code and collaborate with the community.

Conferences & events

Careers

Shaping the future together

See all programs

Faculty programs

Participating in the academic research community through meaningful engagement with university faculty.

Student programs

Supporting the next generation of researchers through a wide range of programming.

Find your place in our global offices and research labs.

Blog

About

Translating discovery into real-world impact

Our researchers drive advancements in computer science through both fundamental and applied research.

Collaborative groups tackling the world's most challenging AI problems.

Google Research

Learn about all our AI

Google DeepMind

Explore the frontier of AI

Try our AI experiments

Conferences & events

Blog

Long T. Le

Home
People

Long T. Le

Long T. Le is a Staff Research Engineer in Google Cloud AI Research with the mission to bring advance AI to the world. He's currently focusing in new LLM solution like distillation, RAG, Agent. Before that, he worked on a new deep learning method for tabular data, covid-19 forecasting and recommendation AI. Before joining Google, he was a machine learning engineer in Capital One in NYC. At Capital One, he developed different models in loan optimization and first-party fraud detection. He earned his Ph.D. in computer science from Rutgers University. Before that, he earned a bachelor in computing from National University at Singapore.

Research Areas

Natural language processing

Authored Publications

results

Filter by:

Publications

Google 14
Other 0

Years

2026 3
2025 6
2024 3
2021 1
2020 1

Research Areas

Machine Intelligence 5
Natural Language Processing 3

Teams

Cloud AI 2

Sort By

Title
Title, descending
Year
Year, descending

chip template

Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents

Pengfei He

Ash Fox

Lesly Miculicich

Stefan Friedli

Daniel Fabian

Burak Gokturk

Jiliang Tang

Chen-Yu Lee

Tomas Pfister

Long Le

ICML 2026

Preview abstract Large language models (LLMs) have shown promise in assisting cybersecurity tasks, yet existing approaches struggle with automatic vulnerability discovery and exploitation due to limited interaction, weak execution grounding, and a lack of experience reuse. We propose Code-RedTeam, a security-aware multi-agent framework designed to mirror real-world red-teaming workflows by integrating security-domain knowledge, code-aware analysis, execution-grounded iterative reasoning, and long-term memory. Code-RedTeam decomposes vulnerability analysis into coordinated discovery and exploitation stages, enabling agents to plan, execute, validate, and refine actions based on real execution feedback while learning from prior trajectories. Extensive evaluations on challenging security benchmarks demonstrate that Code-RedTeam consistently outperforms strong baselines across diverse backbone models, achieving over 60% attack success rate in vulnerability exploitation and up to 10% absolute improvement in vulnerability detection. Ablation and iteration studies further confirm the critical role of execution feedback, structured interaction, and memory for building robust and generalizable cybersecurity agents. View details

VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

Lesly Miculicich

Mihir Parmar

Hamid Palangi

Dj Dvijotham

Mirko Montanari

Tomas Pfister

Long Le

ACL 2026

Preview abstract Artificial intelligence is rapidly evolving, marked by the emergence of Large Language Model (LLM) agents – systems capable of complex reasoning, planning, and interaction with digital and physical environments. These agents, powered by advancements in LLMs, demonstrate remarkable capabilities across diverse domains, including finance, healthcare, web navigation, software development, and daily task assistance. Unlike traditional AI systems, LLM agents can perceive their surroundings, formulate multi-step plans, utilize external tools and APIs, access memory or knowledge bases, and execute actions to achieve specified goals. This ability to act upon the world, however, introduces significant safety and security challenges. The safety paradigms developed for traditional LLMs, primarily focused on mitigating harmful textual outputs (e.g., toxicity, bias), are insufficient for safeguarding LLM agents. Agents interacting with dynamic environments and executing actions present a broader attack surface and new categories of risk. These include performing unsafe operations, violating privacy constraints through improper data handling or access control failures, deviating from user objectives (task misalignment), and susceptibility to novel manipulation techniques like indirect prompt injection and memory poisoning. Ensuring the trustworthy operation of these powerful agents is paramount, especially as they are integrated into high-stakes applications. To address this critical challenge, we introduce VeriGuard, a novel framework designed to enhance the safety and reliability of LLM agents by interactively verifying their policies and the actions. VeriGuard integrates a verification module that intercepts code-based actions proposed by the agent. In the first step, VeriGuard will generates and verifies the policies. The policies are rigorously checked against a set of predefined safety and security specifications Then each action will be verified to make sure it will align with the agent specification. This interactive verification loop ensures that the agent's behavior remains within safe operational bounds, effectively preventing the execution of harmful or unintended operations. By verifying each step, VeriGuard provides a robust safeguard, substantially improving the trustworthiness of LLM agents in complex, real-world environments. View details

Causal Armor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution

Minbeom Kim

Mihir Parmar

Phil Wallis

Lesly Miculicich

Kyomin Jung

Dj Dvijotham

Long Le

Tomas Pfister

2026

Preview abstract AI agents equipped with tool-calling capabilities are susceptible to \emph{Indirect Prompt Injection} (IPI) attacks. In this attack scenario, malicious commands hidden within \emph{untrusted} content trick the agent into performing unauthorized actions. Existing defenses can reduce attack success but often suffer from the \emph{over-defense dilemma}: they deploy expensive, \emph{always-on} sanitization that degrades utility and latency even in benign scenarios. We revisit IPI through an operational causal lens: a successful injection manifests as a \emph{grounding collapse} where the user request no longer provides decisive support for the agent's privileged action, while a particular untrusted segment provides disproportionate marginal support. Based on this signature, we propose \texttt{CausalArmor}, a selective defense framework that (i) computes lightweight, normalized leave-one-out attributions at privileged decision points, and (ii) triggers targeted sanitization only when an untrusted segment dominates the user intent. Additionally, CausalArmor employs \emph{retroactive Chain-of-Thought masking} to prevent the agent from acting on ``poisoned" reasoning traces. Experiments on AgentDojo and DoomArena demonstrate that CausalArmor matches the security of aggressive defenses with explainability while preserving utility and latency of AI agents. View details

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Zilong Wang

Zifeng Wang

Long Le

Steven Zheng

Swaroop Mishra

Vincent Perot

Yuwei Zhang

Anush Mattapalli

Ankur Taly

Jingbo Shang

Chen-Yu Lee

Tomas Pfister

ICLR 2025

Preview abstract Retrieval augmented generation (RAG) has attracted a lot of attention across both academia and industry due to its capability in inserting timely and accurate evidence to the generation by large language models. However, the introduction of retrieved evidence largely makes the input prompt longer, which would harm the understanding quality of large language models and make it slower in actual usage scenarios. To solve these issues, we propose SpeculativeRAG, which leverages a smaller LLM to conduct the retrieval augmented generation for a larger LLM. The smaller LLM can digest a few pieces of evidence and generate multiple pieces of drafts in parallel rapidly, and these drafts will be verified by a large LLM to guarantee the quality. We achieve a higher speed as well as a better quality in the RAG results. View details

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Wenda Xu

Rujun Han

Zifeng Wang

Long Le

Dhruv Madeka

Lei Li

William Wang

Rishabh Agarwal

Chen-Yu Lee

Tomas Pfister

2025

Preview abstract Recent advances in knowledge distillation (KD) have enabled smaller student models to approach the performance of larger teacher models. However, popular methods such as supervised KD and on-policy KD, are adversely impacted by the knowledge gaps between teacher-student in practical scenarios. Supervised KD suffers from a distribution mismatch between training with a static dataset and inference over final student-generated outputs. Conversely, on-policy KD, which uses student-generated samples for training, can suffer from low-quality training examples with which teacher models are not familiar, resulting in inaccurate teacher feedback. To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student’s inference-time distribution. In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution, transferring high-quality knowledge adaptively. We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following, and show that SKD consistently outperforms existing KD methods across different domains, data sizes, and model initialization strategies View details

Multi-turn Function-calling via Graph-based Execution and Translation

Kai-Wei Chang

Hamid Palangi

Tomas Pfister

Chen-Yu Lee

Ke Jiang

Yanfei Chen

I-Hung Hsu

Long Le

Zifeng Wang

Jindong Gu

Jun Yan

Fan Yin

2025

Preview abstract We propose a principled method to synthesize high-quality multi-turn function calling trajectories to align large language model (LLM)-based agents. We start with iteratively building function calling graph and defining node operations to increase its complexity. This enables us to construct reliable reference. Then, based on the synthesized function calling graph, we adopt back-and-forth translation to first construct multi-turn user queries and then, fill in the function arguments with information in the query. We sample positive trajectories that distill the function graph reference and negative trajectories that contrast with the positive trajectories in targeted loss patterns in multi-turn scenarios. Training with the positive trajectories with supervised fine-tuning and preference optimization against negative trajectories, we obtain 67.42 on BFCL and 71.7 on ToolQuery with an open-sourced model with 14B parameters, surpassing the performance of strong proprietary models like o1. View details

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Rujun Han

Tomas Pfister

Chen-Yu Lee

Lei Li

Wenda Xu

Long Le

Rishabh Agarwal

William Wang

Dhruv Madeka

Zifeng Wang

ICLR 2025

Preview abstract Recent knowledge distillation (KD) research made significant progress on improving smaller student models to match larger teachers' performances. Two noticeable methods, supervised KD and on-policy KD emerged as the state-of-the-art approaches. However, supervised KD for auto-regressive models suffers from distribution mismatch between training over fixed dataset and inference over student generated outputs. Conversely, on-policy KD, which uses student-generated samples for training, can suffer from low-quality training examples and the teacher's potential inaccuracies in assessing these samples. To address these limitations, we introduce Speculative Knowledge Distillation (SKD). Instead of solely training on teacher- or student-proposed samples, SKD leverages the student model to initially propose tokens following its own generation distribution. Subsequently, the teacher model is employed to replace tokens that are deemed out-of-distribution. Compared with supervised KD, the samples generated by SKD are more likely to align with the student's inference-time distribution, and 2) SKD can mitigate the generation of low-quality sequences by incorporating the teacher's feedback at each token. Furthermore, we demonstrate that SKD is a generic framework capable of implementing both supervised and on-policy knowledge distillation as specific instances. To validate SKD's effectiveness, we apply it to distill autoregressive large language models for various tasks, including translation, summarization, math, and instruction following. Our experiments consistently demonstrate SKD's superior performance compared to existing methods across different domains, tasks, data sizes, and model initialization strategies. View details

PlanGEN: A Framework Utilizing Inference-Time Algorithms with LLM Agents for Planning and Reasoning

Hamid Palangi

Hootan Nakhost

Mihir Parmar

Tomas Pfister

Chen-Yu Lee

Yanfei Chen

Hossein Mobahi

Long Le

Palash Goyal

Swaroop Mishra

Chitta Baral

Xin Liu

Zifeng Wang

Jindong Gu

2025

Preview abstract Scaling inference-time computation in Large Language Models (LLMs) dramatically improves their capabilities for solving complex problems. While test-time scaling has shown promise in many tasks such as code generation and mathematical reasoning, integration of inference-time algorithms into multi-agent frameworks for planning and reasoning remains under-explored. To this end, we explore popular inference-time algorithms—Best of N, Tree of Thought (ToT), and REward BAlanced SEarch (REBASE)—with proposed feedback-driven refinement. Our feedback-driven refinement employs specialized agents: a constraint agent to enforce task instance-specific constraints, and a verifier agent to evaluate plan quality. Furthermore, we hypothesize that test-time scaling can be proportional to instance-level complexity. Thus, we propose an additional selection agent to dynamically optimize algorithm choice. We evaluate our proposed approaches on four different benchmarks, i.e., NATURAL PLAN, GPQA, OlympiadBench, and DocFinQA. Experimental results show that our methods outperform strong baselines, achieving state-of-the-art results in NATURAL PLAN, OlympiadBench , and DocFinQA. Our key findings demonstrate that constraint-guided iterative refinement and algorithm selection improves both planning and downstream reasoning in LLMs View details

VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation

Mirko Montanari

Hamid Palangi

Lesly Miculicich

Mihir Parmar

Tomas Pfister

Long Le

Dj Dvijotham

2025

Preview abstract Artificial intelligence is rapidly evolving, marked by the emergence of Large Language Model (LLM) agents – systems capable of complex reasoning, planning, and interaction with digital and physical environments. These agents, powered by advancements in LLMs, demonstrate remarkable capabilities across diverse domains, including finance, healthcare, web navigation, software development, and daily task assistance. Unlike traditional AI systems, LLM agents can perceive their surroundings, formulate multi-step plans, utilize external tools and APIs, access memory or knowledge bases, and execute actions to achieve specified goals. This ability to act upon the world, however, introduces significant safety and security challenges. The safety paradigms developed for traditional LLMs, primarily focused on mitigating harmful textual outputs (e.g., toxicity, bias), are insufficient for safeguarding LLM agents. Agents interacting with dynamic environments and executing actions present a broader attack surface and new categories of risk. These include performing unsafe operations, violating privacy constraints through improper data handling or access control failures, deviating from user objectives (task misalignment), and susceptibility to novel manipulation techniques like indirect prompt injection and memory poisoning. Ensuring the trustworthy operation of these powerful agents is paramount, especially as they are integrated into high-stakes applications. To address this critical challenge, we introduce VeriGuard, a novel framework designed to enhance the safety and reliability of LLM agents by interactively verifying their policies and the actions. VeriGuard integrates a verification module that intercepts code-based actions proposed by the agent. In the first step, VeriGuard will generates and verifies the policies. The policies are rigorously checked against a set of predefined safety and security specifications Then each action will be verified to make sure it will align with the agent specification. This interactive verification loop ensures that the agent's behavior remains within safe operational bounds, effectively preventing the execution of harmful or unintended operations. By verifying each step, VeriGuard provides a robust safeguard, substantially improving the trustworthiness of LLM agents in complex, real-world environments. View details

CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

I-Hung Hsu

Zifeng Wang

Long Le

Lesly Miculicich

Nanyun Peng

Chen-Yu Lee

Tomas Pfister

2024

Preview abstract Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, by either feeding LMs with raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs. Larger LM responses that closely align with the smaller LMs' output, which relies exclusively on cited documents, are verified. Responses showing discrepancies are iteratively refined through a feedback loop. Experiments on three open-domain question-answering datasets demonstrate significant performance gains of 1.5% to 7% absolute average without any required model fine-tuning. View details

1
2

of 2

of 2 pages

Search on Google Scholar

Join us

We're always looking for more talented, passionate people.

See opportunities

Follow us

Explore our other initiatives

Google AI

Discover how Google AI is committed to enriching knowledge and solving complex challenges

Products
Build
Research
Responsibility
Societal Impact
About

Google Cloud

High-performance infrastructure for cloud computing, data analytics & machine learning

Overview
Solutions
Products
Pricing
Resources

Google DeepMind

Our mission is to build AI responsibly to benefit humanity

Models
Research
Science
About

Google Labs

Explore the future of AI responsibly with Google Labs

About
Experiments
Stay connected

Google Products

×