Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech Representation and Linguistic Features

Yuma Koizumi

Heiga Zen

Shigeki Karita

Yifan Ding

Kohei Yatabe

Nobuyuki Morioka

Yu Zhang

Wei Han

Ankur Bapna

Michiel Adriaan Unico Bacchiani

WASPAA 2023(2023) (to appear)

Download Google Scholar

Abstract

Speech restoration (SR) is a task of converting degraded speech signals into high-quality ones. In this study, we propose a robust SR model called Miipher, and apply Miipher to a new SR application: increasing the amount of high-quality training data for speech generation by converting speech samples collected from the web to studio-quality. To make our SR model robust against various degradation, we use (i) a speech representation extracted from w2v-BERT for the input feature, and (ii) linguistic features extracted from transcripts and PnG-BERT for conditioning features. Experiments show that the proposed model (i) is robust against various audio degradation, (ii) can restore samples in the LJspeech dataset and improves the quality of text-to-speech (TTS) outputs without changing the model and hyper-parameters, and (iii) enable us to train a high-quality TTS model from restored speech samples collected from the web.

Research Areas

Speech Processing

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech Representation and Linguistic Features

Abstract

Research Areas

Learn more about how we conduct our research

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech Representation and Linguistic Features

Abstract

Research Areas

Learn more about how we conduct our research

AI/ML Foundations  & Capabilities