ReasoningBank: Enabling agents to learn from experience
April 21, 2026
Jun Yan and Chen-Yu Lee, Research Scientists, Google Cloud
ReasoningBank is a novel agent memory framework that uses successful and failed experiences to distill generalizable reasoning strategies, enabling an agent to continuously learn from experience after deployment.
Quick links
Agents are becoming increasingly crucial in tackling complex real-world tasks, ranging from general web navigation to assisting with extensive software engineering codebases. However, as these agents transition into persistent, long-running roles in the real world, they face a critical limitation: they struggle to analyze and learn from successful and failed experiences after deployment.
Agents approaching each new task without a memory mechanism will repeatedly make the same strategic errors and discard valuable insights. To address this, various forms of agent memory have been introduced to store information about past interactions for reuse. However, existing methods generally focus on saving exhaustive records of every action taken — such as the trajectory memory used in Synapse — or only documenting workflows summarized from successful attempts, as seen in Agent Workflow Memory). These approaches have two fundamental drawbacks: first, by recording detailed actions instead of tactical foresight, they fail to distill higher-level, transferable reasoning patterns; second, by over-emphasizing successful experiences, they miss out on a primary source of learning — their own failures.
To bridge this gap, in our ICLR paper, "ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory", we introduce a novel agent memory framework (github) that distills useful insights from both successful and failed experiences for test-time self-evolution. When evaluated on web browsing and software engineering benchmarks, ReasoningBank enhances both agent effectiveness (higher success rates) and efficiency (fewer task steps) compared to baseline approaches.
Memory content comparison: existing strategies and ReasoningBank.
Distilling insights with ReasoningBank
ReasoningBank distills global reasoning patterns into high-level, structured memories. Each structured memory item contains the following:
- Title: A concise identifier summarizing the core strategy.
- Description: A brief summary of the memory item.
- Content: The distilled reasoning steps, decision rationales, or operational insights extracted from past experiences.
The memory workflow operates in a continuous, closed loop of retrieval, extraction, and consolidation. Before taking action, the agent draws upon the ReasoningBank to gather relevant memories into its context. It then interacts with the environment and uses an LLM-as-a-judge to self-assess the resulting trajectory and extracts success insights or failure reflection. Notably, this self-judgement does not need to be perfectly accurate, as we find ReasoningBank to be quite robust against judgment noise. During extraction, the agent distills workflows and generalizable insights from the trajectory into new memories. For simplicity, we directly append these to the ReasoningBank, leaving more sophisticated consolidation strategies for future work.
Crucially, unlike existing workflow memory strategies that only focus on successful runs, ReasoningBank actively analyzes failed experiences to source counterfactual signals and pitfalls. By distilling these mistakes into preventative lessons, ReasoningBank builds powerful strategic guardrails. For example, instead of merely learning a procedural rule like "click the 'Load More' button”, the agent might learn from a past failure to "always verify the current page identifier first to avoid infinite scroll traps before attempting to load more results”.
Workflow of ReasoningBank integrated with an agent during test time.
Memory-aware test-time scaling (MaTTS)
Test-time scaling (TTS) — scaling compute at inference time — has shown immense effectiveness in reasoning domains like math and competitive programming. However, in agentic environments, existing TTS methods often discard the exploration trajectory and treat the final answer as the only useful outcome. This overlooked exploration is actually a rich data source that could accelerate an agent's ability to learn from experience over time.
We bridge this gap by explicitly linking memory with scaling through memory-aware test-time scaling (MaTTS). By using ReasoningBank as a powerful experience learner, MaTTS distills extensive exploration into high-quality memories via contrastive and refinement signals. We demonstrate the power of MaTTS functions through two distinct forms of scaling:
- Parallel scaling: The agent generates multiple distinct trajectories for the same query under the guidance of memory. Through self-contrast, ReasoningBank compares successful and spuriously reasoned trajectories to distill more robust strategies and synthesize higher-quality memories.
- Sequential scaling: The agent iteratively refines reasoning within a single trajectory to produce strong intermediate rationale. ReasoningBank captures these intermediate insights on the agent's trial-and-errors and progressive improvement as high-quality memory items.
MaTTS establishes a strong synergy: high-quality memory from ReasoningBank steers the scaled exploration towards more promising strategies, and in return, the scaled interactions generate significantly richer learning signals that feed back into an even smarter ReasoningBank to help the agent.
Comparison of memory-aware test-time scaling (MaTTS) with ReasoningBank.
Performance & emergent capabilities
We evaluated ReasoningBank across challenging benchmarks covering dynamic environments. Using the ReAct prompting strategy as the foundation for all agents, we compared ReasoningBank against three memory configurations: a memory-free baseline (Vanilla ReAct), Synapse (Trajectory Memory) and AWM (Workflow Memory). From our main evaluation results with Gemini-2.5-Flash on WebArena and SWE-Bench-Verified, we have the following key observations:
- Superior success rates: ReasoningBank without scaling outperformed memory-free agents by 8.3% on WebArena and 4.6% on SWE-Bench-Verified.
- Efficiency gains: Because the agent actively accesses past decision rationales, it executes commands with vastly reduced aimless exploration. On SWE-Bench-Verified, ReasoningBank saved almost 3 total execution steps per task over memory-free baselines.
- MaTTS synergy: When adding MaTTS (parallel scaling with a scaling factor k=5), success rates are further boosted. ReasoningBank w/ MaTTS improves over ReasoningBank by a 3% success rate increase and 0.4 fewer steps on WebArena.
Performance comparison (task success rates and average steps per task) of different agent memory strategies on WebArena and SWE-Bench-Verified.
Importantly, during evaluation, we observed the emergence of strategic maturity. In a web-browsing example, the agent's initial curated rules resembled simple procedural checklists (e.g., "Look for page links"). As the agent persisted through more problem sets, these memories were incorporated during execution. Building upon existing knowledge, the agent distilled new trajectories into more advanced memories. Over time, simple checklists evolved into memories with compositional, preventative logic structures (e.g., "Cross-reference tasks continuously with active page filters to ensure retrieved datasets aren't paginated prematurely"). See the paper for more details.
Conclusion
ReasoningBank provides a powerful framework for enabling LLMs to learn from experiences and evolve into continuous learners during test-time. We believe memory-driven experience scaling represents a crucial new frontier for agent scaling.
We are excited to share this with the broader research community.
Acknowledgements
This research was conducted by Siru Ouyang, Jun Yan, I-Hung Hsu, Yanfei Chen, Ke Jiang, Zifeng Wang, Rujun Han, Long T. Le, Samira Daruki, Xiangru Tang, Vishy Tirumalashetty, George Lee, Mahsan Rofouei, Hangfei Lin, Jiawei Han, Chen-Yu Lee, and Tomas Pfister.