Privacy-preserved LLM Cascade via CoT-enhanced Policy Learning

Xiaozhong Liu
Kai Zhang
Congchao Wang
Liqian Peng
2025

Abstract

Large Language Models (LLMs) have attracted increasing attention in on-device applications due to their exceptional performance on real-world tasks. However, on-device LLMs often perform suboptimally because of hardware limitations. Cascading a weaker local (on-device) LLM with a stronger server LLM offers a promising solution to this challenge. While existing research on LLM cascades primarily focuses on optimizing the performance-cost trade-off, privacy concerns remain largely unaddressed. In this work, we prioritize privacy preservation in LLM cascading while also improving cascade efficiency. To this end, we propose a novel CoT-enhanced policy learning strategy for deferral decision-making that accounts for both the performance-cost trade-off and privacy considerations. Extensive experiments on three benchmark datasets validate the effectiveness and superiority of our approach.
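To make the cascade setting concrete, the following is a minimal, hypothetical sketch of a privacy-aware deferral rule — not the paper's learned policy. The names (`LocalOutput`, `deferral_policy`), thresholds, and the use of scalar confidence/privacy scores are all illustrative assumptions; the paper instead learns this decision with a CoT-enhanced policy.

```python
from dataclasses import dataclass

@dataclass
class LocalOutput:
    """Hypothetical output of the on-device (weaker) LLM."""
    answer: str
    confidence: float    # self-assessed answer quality in [0, 1] (assumed)
    privacy_risk: float  # estimated query sensitivity in [0, 1] (assumed)

def deferral_policy(out: LocalOutput,
                    conf_threshold: float = 0.7,
                    privacy_threshold: float = 0.5) -> str:
    """Decide whether to keep the local answer or defer to the server LLM.

    Illustrative rule: sensitive queries never leave the device; among
    non-sensitive queries, only low-confidence ones are deferred (paying
    the server cost only when the local model is likely wrong).
    """
    if out.privacy_risk >= privacy_threshold:
        return "local"   # privacy-preserving: sensitive data stays on-device
    if out.confidence >= conf_threshold:
        return "local"   # local answer deemed good enough; save server cost
    return "server"      # defer hard, non-sensitive queries to the stronger LLM
```

A learned policy would replace the fixed thresholds with a decision model trained on the performance-cost-privacy trade-off, but the deferral interface is the same.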