Xavi Gonzalvo
Authored Publications
AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles
Charles Weill
Vitaly Kuznetsov
Scott Yang
Scott Yak
Hanna Mazzawi
Eugen Hotaj
Ghassen Jerfel
Vladimir Macko
Ben Adlam
(2019)
Abstract
AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, (2) offer sensible default search spaces to perform well on novel datasets, (3) present a flexible API to utilize expert information when available, and (4) efficiently accelerate training with distributed CPU, GPU, and TPU hardware. The code is open-source and available at https://github.com/tensorflow/adanet.
AdaNet: Adaptive structural learning of artificial neural networks
Vitaly Kuznetsov
Scott Yang
Proceedings of the 34th International Conference on Machine Learning (ICML 2017). Sydney, Australia, August 2017. (2017)
Abstract
We present new algorithms for adaptively learning artificial neural networks. Our algorithms (ADANET) adaptively learn both the structure of the network and its weights. They are based on a solid theoretical analysis, including data-dependent generalization guarantees that we prove and discuss in detail. We report the results of large-scale experiments with one of our algorithms on several binary classification tasks extracted from the CIFAR-10 dataset and on the Criteo dataset. The results demonstrate that our algorithm can automatically learn network structures whose accuracy is very competitive with that of neural networks found by standard approaches.
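The adaptive structural learning described above can be sketched in miniature. This is a hypothetical simplification, not the authors' implementation: the candidate "subnetworks", their sizes, and the penalty value are all invented, and real AdaNet trains the candidates' weights rather than using fixed functions. What survives is the core idea — each iteration grows the ensemble by whichever candidate most reduces a complexity-penalized empirical loss.

```python
# Toy sketch of AdaNet-style adaptive structural learning (invented
# candidates and costs, not the paper's implementation): each iteration
# appends the candidate subnetwork that most reduces a complexity-penalized
# empirical loss. Real AdaNet also trains candidate weights; here the
# candidates are fixed toy functions.

def penalized_loss(ensemble, candidate, data, penalty):
    """Mean squared error of ensemble + candidate, plus a penalty
    proportional to the candidate's complexity ("size")."""
    preds = [sum(f(x) for f in ensemble) + candidate["fn"](x) for x, _ in data]
    mse = sum((p - y) ** 2 for p, (_, y) in zip(preds, data)) / len(data)
    return mse + penalty * candidate["size"]

def adanet_fit(data, candidates, iterations=3, penalty=0.01):
    """Greedily grow an additive ensemble from fixed candidate subnetworks."""
    ensemble, chosen = [], []
    for _ in range(iterations):
        best = min(candidates,
                   key=lambda c: penalized_loss(ensemble, c, data, penalty))
        ensemble.append(best["fn"])
        chosen.append(best["name"])
    return ensemble, chosen

# Fit y = 2x from two toy "subnetworks" of different complexity: the more
# complex candidate wins the first round despite its larger penalty.
data = [(0, 0), (1, 2), (2, 4)]
candidates = [
    {"name": "small", "fn": lambda x: 0.5 * x, "size": 1},
    {"name": "big", "fn": lambda x: 1.5 * x, "size": 2},
]
ensemble, chosen = adanet_fit(data, candidates)
# chosen == ["big", "small", "small"]
```

Note how the complexity penalty plays the role of the paper's data-dependent guarantees: a candidate is admitted only when its fit improvement outweighs its added capacity.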
Recent Advances in Google Real-time HMM-driven Unit Selection Synthesizer
Siamak Tazari
Hanna Silen
Interspeech 2016, International Speech Communication Association (ISCA), Sep 8–12, San Francisco, USA, pp. 2238–2242
Abstract
This paper presents advances in Google's hidden Markov model (HMM)-driven unit selection speech synthesis system. We describe several improvements to the run-time system, including minimal latency, high quality, and a fast refresh cycle for new voices. Traditionally, unit selection synthesizers are limited in the amount of data they can handle and in the real applications they are built for. This is even more critical for real-life, large-scale applications where high quality is expected and low latency is required given the available computational resources. In this paper we present an optimized engine to handle a large database at runtime and a composite unit search approach for combining diphones and phrase-based units. In addition, a new voice-building strategy that handles big databases while keeping building times low is presented.
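The unit search mentioned above is, at its core, a dynamic-programming problem. The sketch below is a hypothetical illustration — unit names and cost values are invented, and the paper's composite diphone/phrase search is far richer — but it shows the standard formulation: pick one candidate unit per target phone so that the sum of target costs and join costs between consecutive units is minimized.

```python
# Hypothetical sketch of the dynamic-programming (Viterbi-style) search at
# the heart of unit selection; unit names and costs are invented.

def select_units(candidates, target_cost, join_cost):
    """Pick one unit per phone slot, minimizing target + join costs."""
    # best[u] = (cheapest cost of any path ending in unit u, that path)
    best = {u: (target_cost(u), [u]) for u in candidates[0]}
    for slot in candidates[1:]:
        nxt = {}
        for u in slot:
            cost, path = min(
                (c + join_cost(p[-1], u), p) for c, p in best.values()
            )
            nxt[u] = (cost + target_cost(u), path + [u])
        best = nxt
    return min(best.values())

# Three target phones; "a2" matches the target better but joins badly to
# "b1", so the search prefers "a1" for overall smoothness.
candidates = [["a1", "a2"], ["b1"], ["c1", "c2"]]
tc = {"a1": 0.5, "a2": 0.1, "b1": 0.0, "c1": 0.2, "c2": 0.4}
cost, path = select_units(
    candidates,
    target_cost=lambda u: tc[u],
    join_cost=lambda a, b: 1.0 if (a, b) == ("a2", "b1") else 0.0,
)
# path == ["a1", "b1", "c1"]
```

The trade-off shown in the example — a locally better unit rejected because of a poor join — is exactly what makes runtime engineering hard at scale: the candidate lists drawn from a large database can be very long, so pruning them is where most of the latency work goes.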
Abstract
Modern Text-To-Speech (TTS) systems need to increasingly deal with multilingual input. Navigation, social and news are all domains with a large proportion of foreign words. However, when typical monolingual TTS voices are used, the synthesis quality on such input is markedly lower. This is because traditional TTS derives pronunciations from a lexicon or a Grapheme-To-Phoneme (G2P) model which was built using a pre-defined sound inventory and a phonotactic grammar for one language only. G2P models perform poorly on foreign words, while manual lexicon development is labour-intensive, expensive and requires extra storage. Furthermore, large phoneme inventories and phonotactic grammars contribute to data sparsity in unit selection systems. We present an automatic system for deriving pronunciations for foreign words that utilises the monolingual voice design and can rapidly scale to many languages. The proposed system, based on a neural network cross-lingual G2P model, does not increase the size of the voice database, does not require large data annotation efforts, is designed not to increase data sparsity in the voice, and can be sized to suit embedded applications.
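One way to picture the constraint described in this abstract — foreign pronunciations must be expressed in the monolingual voice's own phoneme inventory — is a nearest-neighbour projection. This sketch is a toy illustration only: the feature vectors, phoneme inventory, and distance measure are invented, and the paper uses a neural cross-lingual G2P model rather than a lookup like this.

```python
# Toy illustration (features and inventory invented, not the paper's
# neural model): project each foreign phoneme onto the closest phoneme
# in the monolingual voice's inventory, using a coarse articulatory
# feature space, so the voice database needs no new units.

# (voiced, place-of-articulation, manner) -- hypothetical coarse coding
FEATURES = {
    "b": (1, 0, 0), "p": (0, 0, 0),  # labial stops
    "d": (1, 2, 0), "t": (0, 2, 0),  # alveolar stops
}
NATIVE_INVENTORY = ["p", "t", "d"]  # the voice's own phonemes ("b" is foreign)

def nearest_native(phoneme):
    """Return the phoneme itself if native, else the closest native one."""
    if phoneme in NATIVE_INVENTORY:
        return phoneme
    f = FEATURES[phoneme]
    return min(
        NATIVE_INVENTORY,
        key=lambda n: sum((a - b) ** 2 for a, b in zip(f, FEATURES[n])),
    )

def project(pronunciation):
    """Map a foreign pronunciation into the native phoneme inventory."""
    return [nearest_native(p) for p in pronunciation]

# A foreign /b/ is rendered with the native voiceless labial /p/.
# project(["b", "d", "t"]) == ["p", "d", "t"]
```

Because every output symbol already exists in the voice, this kind of projection keeps the database size, phoneme inventory, and data sparsity unchanged — the properties the abstract highlights.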