- Yannis Agiomyrgiannakis
This paper presents a mathematical framework for voice conversion and adaptation in speech processing. Voice conversion is formulated as a search for the optimal correspondences between a set of source-speaker spectra and a set of target-speaker spectra under a transform that compensates for speaker differences. It is possible to simultaneously recover a bi-directional mapping between two sets of vectors: a parametric mapping (a transform) in one direction and a non-parametric mapping (correspondences) in the reverse direction. An algorithm referred to as Matching-Minimization (MM) is formally derived, with proven convergence and an optimal closed-form solution for each step. The algorithm is closely related to the asymmetric-1 variant of the well-known INCA algorithm, for which we also provide a proof within the same framework. The differences between MM and INCA are delineated both theoretically and experimentally. MM outperforms INCA in all scenarios. Like INCA, MM does not require parallel corpora. Unlike INCA, MM is suitable when only a small amount of adaptation data is available.
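
The alternation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the affine form of the transform, the squared Euclidean distance, the identity initialization, and the names `matching_minimization_sketch`, `X`, `Y`, `A`, `b` are all assumptions made for illustration. The sketch alternates a matching step (for each target vector, pick the nearest transformed source vector) with a minimization step (refit the transform by least squares on the matched pairs). Under these assumptions neither step can increase the total matched distance, which is the kind of property a convergence argument relies on.

```python
import numpy as np

def matching_minimization_sketch(X, Y, n_iter=20):
    """Illustrative alternation, assuming an affine transform F(x) = A x + b.

    X: (P, d) array of source vectors; Y: (Q, d) array of target vectors.
    Returns A, b, the correspondences used in the last fit, and the
    objective value recorded after each matching step.
    """
    d = X.shape[1]
    A = np.eye(d)      # assumed initialization: identity transform
    b = np.zeros(d)
    history = []
    idx = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        FX = X @ A.T + b                       # transformed source vectors
        # Matching step: nearest transformed source vector for each target.
        dists = ((Y[:, None, :] - FX[None, :, :]) ** 2).sum(axis=2)  # (Q, P)
        idx = dists.argmin(axis=1)
        history.append(dists[np.arange(len(Y)), idx].sum())
        # Minimization step: closed-form least-squares fit on matched pairs.
        Xm = np.hstack([X[idx], np.ones((len(Y), 1))])
        W, *_ = np.linalg.lstsq(Xm, Y, rcond=None)   # W has shape (d + 1, d)
        A, b = W[:d].T, W[d]
    return A, b, idx, history
```

Calling `matching_minimization_sketch(X, Y)` on two unaligned sets of spectral vectors returns the fitted transform together with `history`, which can be inspected to confirm that the objective does not increase from one iteration to the next. The dense `(Q, P)` distance matrix is chosen for clarity and would need replacing for large corpora.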