
Dan Gnanapragasam
Authored Publications
Kubric: A scalable dataset generator
Issam Laradji, Tianhao Wu, Florian Golemo, Vincent Sitzmann, Kwang Moo Yi, Derek Nowrouzezahrai, Fangcheng Zhong, Yilun Du, Hsueh-Ti (Derek) Liu, Austin Stone, Henning Meyer, Lucas Beyer, Francois Belletti, Noha Radwan, Daniel Rebain, Cengiz Oztireli, Klaus Greff, Matan Sela, Carl Doersch, Dmitry Lagun, Thomas Kipf, Yishu Miao, Anissa Yuenming Mak, Charles Herrmann
(2022)
Data is the driving force of machine learning. The amount and quality of training data is often more important to a system's performance than the details of its architecture. Data is also an important tool for testing specific hypotheses and for empirically evaluating the behaviour of complex systems. Synthetic data generation is a powerful tool on all these fronts: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over the data, and 4) it can circumvent privacy and legal concerns. Unfortunately, the toolchain for generating data is far less developed than that for building models. We aim to improve this situation by introducing Kubric: a scalable open-source pipeline for generating realistic image and video data with rich ground-truth annotations. We also publish a collection of generated datasets and baseline results on several vision tasks.
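To give a sense of what the pipeline looks like in use, here is a minimal sketch modeled on the hello-world example from Kubric's open-source repository; module paths such as kubric.renderer.blender and helpers such as write_scaled_png reflect that repository and may differ between versions:

```python
import logging
import kubric as kb
# The Blender-backed renderer shipped with the open-source Kubric repository.
from kubric.renderer.blender import Blender as KubricRenderer

logging.basicConfig(level="INFO")

# Create a scene and attach a renderer to it.
scene = kb.Scene(resolution=(256, 256))
renderer = KubricRenderer(scene)

# Populate the scene with objects, a light, and a camera.
scene += kb.Cube(name="floor", scale=(10, 10, 0.1), position=(0, 0, -0.1))
scene += kb.Sphere(name="ball", scale=1, position=(0, 0, 1.0))
scene += kb.DirectionalLight(name="sun", position=(-1, -0.5, 3),
                             look_at=(0, 0, 0), intensity=1.5)
scene += kb.PerspectiveCamera(name="camera", position=(3, -1, 4),
                              look_at=(0, 0, 1))

# Render a single frame; the result is a dict of ground-truth layers.
frame = renderer.render_still()

# Export the RGBA image along with segmentation and depth annotations.
kb.write_png(frame["rgba"], "output/helloworld.png")
kb.write_palette_png(frame["segmentation"], "output/helloworld_segmentation.png")
scale = kb.write_scaled_png(frame["depth"], "output/helloworld_depth.png")
logging.info("Depth scale: %s", scale)
```

The same scene description scales from one frame to a full dataset by scripting asset placement and rendering in a loop, which is what makes the pipeline suitable for large-scale generation.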
A more general method for pronunciation learning
Antoine Bruguier
Interspeech 2017
Automatic speech recognition relies on pronunciation dictionaries for accurate results, and previous work used pronunciation learning algorithms to build them. An effective algorithm must balance the flexibility to learn varied pronunciations against constraints strong enough to remain robust. Our approach extends one such algorithm (Kou et al., 2015) by replacing a finite state transducer (FST) built from a limited-size candidate list with a general and flexible FST-building mechanism. This architecture can accommodate a wide variety of pronunciation predictions, can learn pronunciations without access to the written form, and can use an FST built from a recurrent neural network (RNN) while tuning the importance given to the written form. The new approach reduces the number of incorrect pronunciations learned by up to 25% (relative) on a random sample of Google voice traffic.
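To make the FST mechanics concrete, here is a toy sketch (not the paper's implementation) using the pynini wrapper around OpenFst. The phoneme inventory, the weights, and the stand-in "acoustic evidence" acceptor are illustrative assumptions; the point is that an FST built from a fixed candidate list can never propose a pronunciation outside that list, whereas a weighted phoneme loop keeps every sequence reachable:

```python
import pynini

PHONEMES = ["a", "e", "i", "o", "u", "t", "m"]  # toy single-character inventory

# Baseline: an FST built from a limited-size candidate list.
# Weights are costs (negative log priors) in the tropical semiring.
candidates = pynini.union(
    pynini.accep("tomato", weight=0.0),
    pynini.accep("tomaeto", weight=1.0),
).optimize()

# More general mechanism: a weighted loop over the phoneme inventory.
# The per-phoneme cost acts as a knob for how much the model is constrained.
loop = pynini.closure(
    pynini.union(*[pynini.accep(p, weight=0.5) for p in PHONEMES])
)

# Toy stand-in for acoustic evidence: an acceptor over hypothesized
# phone strings, weighted by how well each matched the audio.
acoustic = pynini.union(
    pynini.accep("tomaito", weight=0.2),  # best match, not in the candidate list
    pynini.accep("tomato", weight=0.9),
).optimize()

def best_pron(pron_model: pynini.Fst) -> str:
    """Compose the pronunciation model with the evidence, take the best path."""
    return pynini.shortestpath(pron_model @ acoustic).string()

print(best_pron(candidates))  # forced onto a listed candidate: 'tomato'
print(best_pron(loop))        # free to recover the unlisted 'tomaito'
```

Lowering the per-phoneme loop cost trusts the evidence more and the prior model less, loosely analogous to the importance weighting of the written form described above.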