Omry Tuval
Omry Tuval is currently at Google Research, working on machine learning from audio and human voice.
Prior to that, Omry was an engineering tech lead for Google's business intelligence and data analytics systems and tools.
In previous lives, Omry was a project manager of critical data-security projects and a software security researcher, and, for a short while, had a great time as a Sales Engineer for a startup in the water distribution sector.
Authored Publications
Towards Learning a Universal Non-Semantic Representation of Speech
Joel Shor
Ronnie Zvi Maor
Ira Shavitt
Proc. Interspeech 2020 (2020)
The ultimate goal of transfer learning is to enable learning with a small amount of data, by using a strong embedding. While significant progress has been made in the visual and language domains, the speech domain does not have such a universal method. This paper presents a new representation of speech signals based on an unsupervised triplet-loss objective, which outperforms both the existing state of the art and other representations on a number of transfer learning tasks in the non-semantic speech domain. The embedding is learned on a publicly available dataset, and it is tested on a variety of low-resource downstream tasks, including personalization tasks and the medical domain. The model will be publicly released.
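The abstract mentions an unsupervised triplet-loss objective without spelling it out; as a rough sketch (not necessarily the paper's exact formulation), a standard margin-based triplet loss over embedding vectors looks like the following, where the `margin` value and the 128-dimensional embeddings are illustrative assumptions:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the anchor toward the positive
    embedding and push it at least `margin` away from the negative."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared L2 distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0)

# Toy usage: embeddings of two audio segments from the same clip (anchor,
# positive) and one segment from a different clip (negative).
rng = np.random.default_rng(0)
anchor, positive = rng.normal(size=(2, 128))
negative = rng.normal(size=128)
print(triplet_loss(anchor, positive, negative))
```

In an unsupervised setting, the positive pair is typically formed from nearby segments of the same recording, so no labels are needed to train the embedding.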
Personalizing ASR for Dysarthric and Accented Speech with Limited Data
Joel Shor
Interspeech 2019 (2019)
Automatic speech recognition (ASR) systems have dramatically improved over the last few years. ASR systems are most often trained from 'typical' speech, which means that underrepresented groups don't experience the same level of improvement. In this paper, we present and evaluate finetuning techniques to improve ASR for users with non-standard speech. We focus on two types of non-standard speech: speech from people with amyotrophic lateral sclerosis (ALS) and accented speech. We train personalized models that achieve 62% and 35% relative WER improvement on these two groups, bringing the absolute WER for ALS speakers, on a test set of message bank phrases, to 10% for mild dysarthria and 20% for more serious dysarthria. We show that 76% of the improvement comes from only 5 min of training data. Finetuning a particular subset of layers (with many fewer parameters) often gives better results than finetuning the entire model. This is the first step towards building state-of-the-art ASR models for dysarthric speech.
Index Terms: speech recognition, personalization, accessibility
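The abstract's observation that finetuning a subset of layers can beat finetuning the whole model can be illustrated with a minimal PyTorch sketch; the feed-forward stand-in model, the choice of which layer to unfreeze, and the learning rate are all assumptions for illustration, not the paper's actual ASR setup:

```python
import torch
from torch import nn

# A small stand-in encoder; the paper's actual ASR architecture isn't given here.
model = nn.Sequential(
    nn.Linear(80, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 64),
)

# Freeze everything, then unfreeze only one layer as the subset to finetune;
# which subset works best is an empirical question per the abstract.
for param in model.parameters():
    param.requires_grad = False
for param in model[0].parameters():
    param.requires_grad = True

# The optimizer only updates the trainable subset.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
```

Restricting updates to a small subset of parameters also acts as a regularizer, which matters when, as in the paper, only a few minutes of a speaker's training data are available.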
Joint Cache Partition and Job Assignment on Multi-Core Processors
WADS'13: Proceedings of the 13th International Conference on Algorithms and Data Structures (2013)