Towards targeted audio-agnostic adversarial attacks to end-to-end ASR models

Liangliang Cao; Wei Han; Yu Zhang; Zhiyun Lu

Interspeech'2021

Abstract

Adversarial attacks are a popular topic in the computer vision and deep learning communities, but fewer studies examine how automatic speech recognition (ASR) models are affected by them. In this paper, we study targeted audio-agnostic adversarial attacks on various end-to-end ASR models trained on Librispeech. We find that universal perturbation vectors exist that can mislead an ASR model into outputting a given target transcript when applied to arbitrary utterances, even unseen ones. We propose a learning-based algorithm to generate such adversarial attacks and study its performance on LAS, RNN-T, and CTC models. We find that LAS is the most vulnerable of the three models to the attack. On RNN-T, it is challenging to attack long utterances when the perturbation is additive noise. To this end, we propose a new perturbation pattern, which prepends frames before the utterance. The prepending perturbation can successfully attack utterances of arbitrary length on RNN-T, and is shown to be generally more effective than the previously studied additive perturbation in generating audio-agnostic attacks. In our study, CTC is robust to audio-agnostic adversarial examples when the perturbation is of fixed length.
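
The two perturbation patterns named in the abstract differ only in how a single, utterance-independent vector is combined with the input. The sketch below illustrates that difference on raw waveform samples. It is an illustration under stated assumptions, not the paper's implementation: the function names, the amplitude bound `eps`, the choice to overlay the additive perturbation on the start of the utterance, and operating on samples rather than feature frames are all hypothetical choices made here.

```python
import numpy as np


def apply_additive(audio, delta, eps=0.05):
    """Overlay a fixed-length perturbation on the start of an utterance.

    Assumption: samples beyond len(delta) are left untouched, which is one
    way to see why long utterances are harder to control with additive noise.
    """
    out = np.asarray(audio, dtype=float).copy()
    bounded = np.clip(delta, -eps, eps)      # keep the perturbation small
    n = min(len(out), len(bounded))
    out[:n] += bounded[:n]
    return np.clip(out, -1.0, 1.0)           # stay in a valid waveform range


def apply_prepend(audio, delta):
    """Place the perturbation before the utterance; the utterance is unchanged."""
    joined = np.concatenate([np.asarray(delta, dtype=float),
                             np.asarray(audio, dtype=float)])
    return np.clip(joined, -1.0, 1.0)


# The same delta is reused for every utterance, which is what makes the
# attack audio-agnostic. Here it is a placeholder, not a learned vector.
delta = np.full(3, 0.1)
utterance = np.zeros(5)
additive_input = apply_additive(utterance, delta)
prepended_input = apply_prepend(utterance, delta)
```

In a learning-based attack of the kind the abstract describes, `delta` would be optimized over many training utterances so that the model emits the target transcript; that optimization loop depends on the specific ASR model and is omitted here.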
