Dan Gnanapragasam

Authored Publications
Kubric: A scalable dataset generator
Anissa Yuenming Mak, Austin Stone, Carl Doersch, Cengiz Oztireli, Charles Herrmann, Daniel Rebain, Derek Nowrouzezahrai, Dmitry Lagun, Fangcheng Zhong, Florian Golemo, Francois Belletti, Henning Meyer, Hsueh-Ti (Derek) Liu, Issam Laradji, Klaus Greff, Kwang Moo Yi, Lucas Beyer, Matan Sela, Noha Radwan, Thomas Kipf, Tianhao Wu, Vincent Sitzmann, Yilun Du, Yishu Miao (2022)
Data is the driving force of machine learning. The amount and quality of training data is often more important for a system's performance than the details of its architecture. Data is also an important tool for testing specific hypotheses and for empirically evaluating the behaviour of complex systems. Synthetic data generation is a powerful tool that can address all of these needs: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over the data, and 4) it can circumvent privacy and legal concerns. Unfortunately, the toolchain for generating data is less well developed than that for building models. We aim to improve this situation by introducing Kubric: a scalable open-source pipeline for generating realistic image and video data with rich ground-truth annotations. We also publish a collection of generated datasets and baseline results on several vision tasks.
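To give a sense of what the pipeline looks like in practice, here is a minimal scene-construction sketch modeled on the hello-world example shipped with the open-source Kubric release (github.com/google-research/kubric). The specific class and helper names (kb.Scene, KubricRenderer, kb.write_png) are taken from that release and should be checked against the current repository:

    import kubric as kb
    from kubric.renderer.blender import Blender as KubricRenderer

    # Create a scene and attach the Blender-backed renderer to it.
    scene = kb.Scene(resolution=(256, 256))
    renderer = KubricRenderer(scene)

    # Populate the scene with objects, a light, and a camera.
    scene += kb.Cube(name="floor", scale=(10, 10, 0.1), position=(0, 0, -0.1))
    scene += kb.Sphere(name="ball", scale=1, position=(0, 0, 1.0))
    scene += kb.DirectionalLight(name="sun", position=(-1, -0.5, 3),
                                 look_at=(0, 0, 0), intensity=1.5)
    scene += kb.PerspectiveCamera(name="camera", position=(3, -1, 4),
                                  look_at=(0, 0, 1))

    # Render a single frame; the result bundles the RGBA image together
    # with rich ground-truth annotations such as segmentation maps.
    frame = renderer.render_still()
    kb.write_png(frame["rgba"], "output/helloworld.png")
    kb.write_palette_png(frame["segmentation"], "output/helloworld_seg.png")

The same scene description scales out to large datasets by scripting variation over objects, lighting, and camera placement.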
Automatic speech recognition relies on pronunciation dictionaries for accurate results, and previous work used pronunciation learning algorithms to build them. Efficient algorithms must balance the ability to learn varied pronunciations against being constrained enough to remain robust. Our approach extends one such algorithm [Kou2015] by replacing a finite state transducer (FST) built from a limited-size candidate list with a general and flexible FST building mechanism. This architecture can accommodate a wide variety of pronunciation predictions and can learn pronunciations without having the written form. It can also use an FST built from a recurrent neural network (RNN) and tune the importance given to the written form. The new approach reduces the number of incorrect pronunciations learned by up to 25% (relative) on a random sample of Google voice traffic.
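As a toy illustration of the baseline that this work generalizes, the sketch below builds the kind of weighted FST one gets from a limited-size candidate list, using the pynini OpenFst bindings. Neither pynini nor these phoneme strings and weights come from the paper; they are assumptions for illustration only. The paper's contribution replaces this fixed union with a general, flexible FST building mechanism (e.g. one derived from an RNN):

    import pynini

    # Hypothetical candidate pronunciations (phoneme strings) with
    # negative-log-probability costs; illustrative values only.
    candidates = {
        "k uw b r ih k": 0.7,
        "k y uw b r ih k": 1.2,
        "k uw b r iy k": 1.6,
    }

    # Baseline-style lattice: a weighted union over the fixed candidate
    # list. A more general FST building mechanism could instead admit
    # predictions outside this list, or score forms with no written side.
    lattice = pynini.union(
        *(pynini.accep(pron, weight=cost) for pron, cost in candidates.items())
    )

    # The learned pronunciation is the lowest-cost path through the lattice.
    print(pynini.shortestpath(lattice).string())  # -> "k uw b r ih k"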