Dan Gnanapragasam
Authored Publications
Kubric: A scalable dataset generator
Anissa Yuenming Mak
Austin Stone
Carl Doersch
Cengiz Oztireli
Charles Herrmann
Daniel Rebain
Derek Nowrouzezahrai
Dmitry Lagun
Fangcheng Zhong
Florian Golemo
Francois Belletti
Henning Meyer
Hsueh-Ti (Derek) Liu
Issam Laradji
Klaus Greff
Kwang Moo Yi
Lucas Beyer
Matan Sela
Noha Radwan
Thomas Kipf
Tianhao Wu
Vincent Sitzmann
Yilun Du
Yishu Miao
(2022)
Data is the driving force of machine learning. The amount and quality of training data are often more important for the performance of a system than the details of its architecture. Data is also an important tool for testing specific hypotheses and for empirically evaluating the behaviour of complex systems. Synthetic data generation is a powerful tool that can address the shortcomings of real-world data: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over the data, and 4) it can circumvent privacy and legal concerns. Unfortunately, the toolchain for generating data is less well developed than that for building models. We aim to improve this situation by introducing Kubric: a scalable open-source pipeline for generating realistic image and video data with rich ground-truth annotations.
We also publish a collection of generated datasets and baseline results on several vision tasks.
A more general method for pronunciation learning
Antoine Bruguier
Interspeech 2017 (2017)
Automatic speech recognition relies on pronunciation dictionaries for accurate results, and previous work has used pronunciation learning algorithms to build them. Efficient algorithms must balance the ability to learn varied pronunciations against being constrained enough to be robust. Our approach extends one such algorithm [Kou2015] by replacing a finite state transducer (FST) built from a limited-size candidate list with a general and flexible FST building mechanism. This architecture can accommodate a wide variety of pronunciation predictions and can also learn pronunciations without having the written form. It can also use an FST built from a recurrent neural network (RNN) and tune the importance given to the written form. The new approach reduces the number of incorrect pronunciations learned by up to 25% (relative) on a random sampling of Google voice traffic.