Pedro Gonnet
I completed both my undergraduate studies in Computer Science, with a Biochemistry minor, and my PhD at ETH Zurich, in Switzerland. I spent two years as a postdoc in the Mathematical Institute at the University of Oxford, and two years as a lecturer at Durham University, in the UK.
I am currently a Senior Software Engineer at Google. I have worked on melody matching for YouTube's ContentID, and am currently working on handwriting recognition.
Authored Publications
IndyLSTMs: Independently Recurrent LSTMs
Thomas Deselaers
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE
We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs. These differ from regular LSTM cells in that the recurrent weights are not modeled as a full matrix, but as a diagonal matrix, i.e. the output and state of each LSTM cell depend on the inputs and its own output/state, as opposed to the inputs and the outputs/states of all the cells in the layer. The number of parameters per IndyLSTM layer, and thus the number of FLOPS per evaluation, is linear in the number of nodes in the layer, as opposed to quadratic for regular LSTM layers, resulting in potentially both smaller and faster models. We evaluate their performance experimentally by training several models on the popular IAM-OnDB and CASIA online handwriting datasets, as well as on several of our in-house datasets. We show that IndyLSTMs, despite their smaller size, consistently outperform regular LSTMs both in terms of accuracy per parameter, and in best accuracy overall. We attribute this improved performance to the IndyLSTMs being less prone to overfitting.
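To make the diagonal-recurrence idea concrete, here is a minimal pure-Python sketch. It assumes the common four-gate LSTM formulation without peepholes; the function names, the `params` layout, and the gate labels are illustrative choices, not taken from the paper or from any library. The first two functions compare parameter counts for a fixed input size, and `indylstm_step` shows that each unit only sees the inputs and its own previous output and state.

```python
import math


def lstm_param_count(num_inputs, num_units):
    """Regular LSTM layer: 4 gates, each with input weights,
    a full (num_units x num_units) recurrent matrix, and a bias."""
    return 4 * (num_units * num_inputs + num_units * num_units + num_units)


def indylstm_param_count(num_inputs, num_units):
    """Same layer with each recurrent matrix reduced to its diagonal."""
    return 4 * (num_units * num_inputs + num_units + num_units)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def indylstm_step(x, h_prev, c_prev, params):
    """One time step of a diagonal-recurrence LSTM layer.

    params[g] = (W, u, b) for each gate g in "i", "f", "o", "c" (assumed layout):
    W is a list of per-unit input weight rows, u holds the per-unit
    recurrent weights (the diagonal), and b holds the per-unit biases.
    """
    def pre_activation(gate, k):
        W, u, b = params[gate]
        # Unit k only uses its own previous output h_prev[k].
        return sum(w * xi for w, xi in zip(W[k], x)) + u[k] * h_prev[k] + b[k]

    h, c = [], []
    for k in range(len(h_prev)):
        i = sigmoid(pre_activation("i", k))    # input gate
        f = sigmoid(pre_activation("f", k))    # forget gate
        o = sigmoid(pre_activation("o", k))    # output gate
        g = math.tanh(pre_activation("c", k))  # candidate state
        c_k = f * c_prev[k] + i * g
        c.append(c_k)
        h.append(o * math.tanh(c_k))
    return h, c
```

With 10 inputs and 100 units, for example, `lstm_param_count` gives 44400 while `indylstm_param_count` gives 4800: the quadratic recurrent term is replaced by a linear one.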
Fast Multi-language LSTM-based Online Handwriting Recognition
Thomas Deselaers
Alexander Daryin
Marcos Calvo
Li-Lun Wang
Sandro Feuz
Philippe Gervais
International Journal on Document Analysis and Recognition (IJDAR) (2020)
Handwriting is a natural input method for many people and we continuously invest in improving the recognition quality. Here we describe and motivate the modelling and design choices that lead to a significant improvement across the 100 supported languages, based on recurrent neural networks and a variety of language models. This new architecture has completely replaced our previous segment-and-decode system [Google:HWRPAMI] and reduced the error rate by 30%-40% relative for most languages. Further, we report new state-of-the-art results on IAM-OnDB for both the open and closed dataset setting. By using Bézier curves for shortening the input length of our sequences we obtain up to 10x faster recognition times. Through a series of experiments we determine what layers are needed and how wide and deep they should be. We evaluate the setup on a number of additional public datasets.
SWIFT: Using task-based parallelism, fully asynchronous communication, and graph partition-based domain decomposition for strong scaling on more than 100000 cores.
PASC16, EPFL, Lausanne, Switzerland (2016)
We present a new open-source cosmological code, called SWIFT, designed to solve the equations of hydrodynamics using a particle-based approach (Smoothed Particle Hydrodynamics) on hybrid shared / distributed-memory architectures. SWIFT was designed from the bottom up to provide excellent strong scaling on both commodity clusters (Tier-2 systems) and Top100-supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches:
- Task-based parallelism for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores.
- Graph-based domain decomposition, which uses the task graph to decompose the simulation domain such that the work, rather than just the data as in most partitioning schemes, is equally distributed across all nodes.
- Fully dynamic and asynchronous communication, in which communication is modelled as just another task in the task-based scheme, sending data whenever it is ready and deferring tasks that rely on data from other nodes until it arrives.
In order to use these approaches, the code had to be re-written from scratch, and the algorithms therein adapted to the task-based paradigm. As a result, we can show upwards of 60% parallel efficiency for moderate-sized problems when increasing the number of cores 512-fold, on both x86-based and Power8-based architectures.
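As a rough guide to what the efficiency figure means, the sketch below uses the usual strong-scaling definition: efficiency is the measured speedup divided by the factor by which the core count grew. The abstract does not spell out its definition, so this is an assumption, and the function names are illustrative.

```python
def parallel_efficiency(time_base, time_scaled, core_ratio):
    """Strong-scaling efficiency for a fixed problem size.

    time_base:   run time on the baseline core count
    time_scaled: run time after multiplying the core count by core_ratio
    """
    speedup = time_base / time_scaled
    return speedup / core_ratio


def implied_speedup(efficiency, core_ratio):
    """Speedup implied by a given efficiency and core-count increase."""
    return efficiency * core_ratio
```

Under this definition, 60% efficiency over a 512-fold increase in cores corresponds to `implied_speedup(0.6, 512)`, i.e. a speedup of roughly 307 relative to the baseline run.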
Efficient and Scalable Algorithms for Smoothed Particle Hydrodynamics on Hybrid Shared/Distributed-Memory Architectures
SIAM Journal on Scientific Computing, 37(1) (2015)
This paper describes a new fast and implicitly parallel approach to neighbour-finding in multi-resolution Smoothed Particle Hydrodynamics (SPH) simulations. This new approach is based on hierarchical cell decompositions and sorted interactions, within a task-based formulation. It is shown to be faster than traditional tree-based codes, and to scale better than domain decomposition-based approaches on hybrid shared/distributed-memory parallel architectures, e.g. clusters of multi-cores, achieving a 40× speedup over the Gadget-2 simulation code.