Securing Multi-Agent Systems: An Empirical Analysis of Security Prompt Hardening and Residual Risks

Mithil Roshan; Shashank Nagesh

(2026)

Abstract

The rapid adoption of agentic systems powered by large language models (LLMs) introduces significant security challenges distinct from plain conversational models, particularly concerning prompt injection and tool misuse due to their dynamic personas and real-world tool interactions. This paper investigates the effectiveness of hardened security prompting in a task-oriented multi-agent framework, using a coding assistant as a representative case study. We compare a baseline "unhardened" agent against a "hardened" version equipped with explicit security guidelines applied across all sub-agents. Our evaluation across 150+ single-turn and 32 multi-turn attack scenarios demonstrates that prompt hardening dramatically improves resilience. With a simple, approximately 500-token security hardener, single-turn failure rates dropped from 19.48% to 2.60%, while multi-turn failure rates decreased from 75.00% to 46.88%. Furthermore, we show that successfully bypassing the hardened agent requires significantly more adversarial effort and a greater number of chat turns. However, the analysis also reveals a critical shift in vulnerability taxonomy: as direct attacks fail, adversaries exploit the agent's core functionality via "Functional Wrappers" (Intent Obfuscation), highlighting a residual risk that necessitates a shift in the defensive paradigm from static filters to dynamic runtime state and intent analysis.
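
To put the reported failure rates on a common footing, the short sketch below computes the absolute and relative reduction for each setting. It uses only the four percentages stated in the abstract; the function name `relative_reduction` and the dictionary layout are illustrative choices, not part of the paper's evaluation code.

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Return the drop in failure rate as a percentage of the baseline rate."""
    if before_pct <= 0:
        raise ValueError("baseline failure rate must be positive")
    return (before_pct - after_pct) / before_pct * 100.0


# Failure rates in percent, (unhardened, hardened), as reported in the abstract.
reported = {
    "single-turn": (19.48, 2.60),
    "multi-turn": (75.00, 46.88),
}

for setting, (baseline, hardened) in reported.items():
    absolute = baseline - hardened  # percentage points
    relative = relative_reduction(baseline, hardened)
    print(f"{setting}: {absolute:.2f} points absolute, {relative:.1f}% relative")
```

Under this reading, the single-turn failure rate falls by roughly 87% relative to the baseline, while the multi-turn rate falls by roughly 37%, which is consistent with the abstract's point that multi-turn attacks remain the larger residual risk.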
