ASTRA: Aligning Speech and Text Representations for ASR without Sampling

Andrew Rosenberg
Bhuvana Ramabhadran
Rohan Agrawal
2024

Abstract

This paper discusses a method for injecting text when training an ASR system without the need to upsample the text sequence to match the length of the speech sequence.