Henry A. Rowley

Henry A. Rowley received BS degrees in Electrical Engineering and Computer Science from the University of Minnesota in 1992, a Master's in Computer Science from Carnegie Mellon University in 1994, and a PhD in Computer Science from Carnegie Mellon University in 1999 for his thesis work on neural network-based face detection. After graduating, he worked at Zaxel Systems, Inc. on lossless video compression and multi-view stereo reconstruction, and at Microsoft on Chinese, Japanese, and Korean handwriting recognition. He is currently a member of the Google Research group, where he has worked on computer vision, machine learning, and most recently handwriting recognition.
Authored Publications
    Fast Multi-language LSTM-based Online Handwriting Recognition
    Thomas Deselaers
    Alexander Daryin
    Marcos Calvo
    Li-Lun Wang
    Sandro Feuz
    Philippe Gervais
    International Journal on Document Analysis and Recognition (IJDAR)(2020)
    Handwriting is a natural input method for many people and we continuously invest in improving the recognition quality. Here we describe and motivate the modelling and design choices that lead to a significant improvement across the 100 supported languages, based on recurrent neural networks and a variety of language models. This new architecture has completely replaced our previous segment-and-decode system [Google:HWRPAMI] and reduced the error rate by 30%-40% relative for most languages. Further, we report new state-of-the-art results on IAM-OnDB for both the open and closed dataset setting. By using Bézier curves for shortening the input length of our sequences we obtain up to 10x faster recognition times. Through a series of experiments we determine what layers are needed and how wide and deep they should be. We evaluate the setup on a number of additional public datasets.
    Multi-Language Online Handwriting Recognition
    Thomas Deselaers
    Li-Lun Wang
    IEEE Transactions on Pattern Analysis and Machine Intelligence(2016)
    We describe Google's online handwriting recognition system that currently supports 22 scripts and 97 languages. The system's focus is on fast, high-accuracy text entry for mobile, touch-enabled devices. We use a combination of state-of-the-art components and combine them with novel additions in a flexible framework. This architecture allows us to easily transfer improvements between languages and scripts. This made it possible to build recognizers for languages that, to the best of our knowledge, are not handled by any other online handwriting recognition system. The approach also enabled us to use the same architecture both on very powerful machines for recognition in the cloud as well as on mobile devices with more limited computational power by changing some of the settings of the system. In this paper we give a general overview of the system architecture and the novel components, such as unified time- and position-based input interpretation, trainable segmentation, minimum-error rate training for feature combination, and a cascade of pruning strategies. We present experimental results for different setups. The system is currently publicly available in several Google products, for example in Google Translate and as an input method for Android devices.
    GyroPen: Gyroscopes for Pen-input with Mobile Phones
    Thomas Deselaers
    Jan Hosang
    IEEE Transactions on Human-Machine Systems, 45(2015), pp. 263-271
    We present GyroPen, a method for text entry into mobile devices using pen-like writing interaction reconstructed from standard built-in sensors. The key idea is to reconstruct a representation of the trajectory of the phone's corner that is touching a writing surface from the measurements obtained from the phone's gyroscopes and accelerometers. We propose to directly use the angular trajectory for this reconstruction, which removes the necessity for accurate absolute 3D position estimation, a task that can be difficult using low-cost accelerometers. Recognition is then performed using an off-the-shelf handwriting recognition system, allowing easy extension to new languages and scripts. In a small user study (n=10), the average novice participant was able to write the first word only 37 seconds after starting to use GyroPen for the first time. With some experience, users were able to write at a speed of 3-4 s per English word with a character error rate of 18%.
    Learning Binary Codes for High Dimensional Data Using Bilinear Projections
    Yunchao Gong
    Svetlana Lazebnik
    IEEE Computer Vision and Pattern Recognition(2013)
    Recent advances in visual recognition indicate that to achieve good retrieval and classification accuracy on large scale datasets like ImageNet, extremely high-dimensional visual descriptors, e.g., Fisher Vectors, are needed. We present a novel method for converting such descriptors to compact similarity-preserving binary codes that exploits their natural matrix structure to reduce their dimensionality using compact bilinear projections instead of a single large projection matrix. This method achieves comparable retrieval and classification accuracy to the original descriptors and to the state-of-the-art Product Quantization approach while having orders of magnitude faster code generation time and smaller memory footprint.
    This paper presents the algorithms which power Google Correlate, a tool which finds web search terms whose popularity over time best matches a user-provided time series. Correlate was developed to generalize the query-based modeling techniques pioneered by Google Flu Trends and make them available to end users. Correlate searches across millions of candidate query time series to find the best matches, returning results in less than 200 milliseconds. Its feature set and requirements present unique challenges for Approximate Nearest Neighbor (ANN) search techniques. In this paper, we present Asymmetric Hashing (AH), the technique used by Correlate, and show how it can be adapted to the specific needs of the product. We then develop experiments to test the throughput and recall of Asymmetric Hashing as compared to a brute-force search. For "full" search vectors, we achieve a 10x speedup over brute force search while maintaining 97% recall. For search vectors which contain holdout periods, we achieve a 4x speedup over brute force search, also with 97% recall.
    Large-scale SVD and manifold learning
    Ameet Talwalkar
    Journal of Machine Learning Research, 14(2013), pp. 3129-3152
    Google Image Swirl: A Large-Scale Content-Based Image Visualization System
    Yushi Jing
    Jingbin Wang
    David Tsai
    Chuck Rosenberg
    Michele Covell
    WWW(2012), pp. 539-540
    Large-Scale Image Annotation using Visual Synset
    David Tsai
    Yushi Jing
    Yi Liu
    Sergey Ioffe
    James Rehg
    Proc. International Conference on Computer Vision (ICCV)(2011)
    Image Saliency: From Local to Global Context
    Meng Wang
    Janusz Konrad
    Prakash Ishwar
    Yushi Jing
    Proc. Conference on Computer Vision and Pattern Recognition (CVPR)(2011)
    Comparison of Clustering Approaches for Summarizing Large Populations of Images
    Yushi Jing
    Michele Covell
    Proceedings ICME VCIDS, IEEE, Singapore(2010)
    This paper compares the efficacy and efficiency of different clustering approaches for selecting a set of exemplar images, to present in the context of a semantic concept. We evaluate these approaches using 900 diverse queries, each associated with 1000 web images, and comparing the exemplars chosen by clustering to the top 20 images for that search term. Our results suggest that Affinity Propagation is effective in selecting exemplars that match the top search images but at high computational cost. We improve on these early results using a simple distribution-based selection filter on incomplete clustering results. This improvement allows us to use more computationally efficient approaches to clustering, such as Hierarchical Agglomerative Clustering (HAC) and Partitioning Around Medoids (PAM), while still reaching the same (or better) quality of results as were given by Affinity Propagation in the original study. The computational savings is significant since these alternatives are 7-27 times faster than Affinity Propagation.
    Visualizing Web Images via Google Image Swirl
    Yushi Jing
    Chuck Rosenberg
    Jingbin Wang
    Michele Covell
    NIPS Workshop on Statistical Machine Learning for Visual Analytics(2009)
    Face Tracking and Recognition with Visual Constraints in Real-World Videos
    Minyoung Kim
    Vladimir Pavlovic
    IEEE Computer Vision and Pattern Recognition (CVPR)(2008)
    We address the problem of tracking and recognizing faces in real-world, noisy videos. We track faces using a tracker that adaptively builds a target model reflecting changes in appearance, typical of a video setting. However, adaptive appearance trackers often suffer from drift, a gradual adaptation of the tracker to non-targets. To alleviate this problem, our tracker introduces visual constraints using a combination of generative and discriminative models in a particle filtering framework. The generative term conforms the particles to the space of generic face poses while the discriminative one ensures rejection of poorly aligned targets. This leads to a tracker that significantly improves robustness against abrupt appearance changes and occlusions, critical for the subsequent recognition phase. Identity of the tracked subject is established by fusing pose-discriminant and person-discriminant features over the duration of a video sequence. This leads to a robust video-based face recognizer with state-of-the-art recognition performance. We test the quality of tracking and face recognition on real-world noisy videos from YouTube as well as the standard Honda/UCSD database. Our approach produces successful face tracking results on over 80% of all videos without video or person-specific parameter tuning. The good tracking performance induces similarly high recognition rates: 100% on Honda/UCSD and over 70% on the YouTube set containing 35 celebrities in 1500 sequences.
    Large-Scale Manifold Learning
    Ameet Talwalkar
    Computer Vision and Pattern Recognition (CVPR)(2008)
    This paper examines the problem of extracting low-dimensional manifold structure given millions of high-dimensional face images. Specifically, we address the computational challenges of nonlinear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges. Since most manifold learning techniques rely on spectral decomposition, we first analyze two approximate spectral decomposition techniques for large dense matrices (Nystrom and Column-sampling), providing the first direct theoretical and empirical comparison between these techniques. We next show extensive experiments on learning low-dimensional embeddings for two large face datasets: CMU-PIE (35 thousand faces) and a web dataset (18 million faces). Our comparisons show that the Nystrom approximation is superior to the Column-sampling method. Furthermore, approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMU-PIE dataset.
    Canonical Image Selection from the Web
    Yushi Jing
    Shumeet Baluja
    ACM International Conference on Image and Video Retrieval(2007)
    Boosting Sex Identification Performance
    Shumeet Baluja
    International Journal of Computer Vision, 71(2007), pp. 111-119
    Clustering Billions of Images with Large Scale Nearest Neighbor Search
    Ting Liu
    Chuck Rosenberg
    IEEE Workshop on Applications of Computer Vision, IEEE(2007)
    Large Scale Image-Based Adult-Content Filtering
    Yushi Jing
    Shumeet Baluja
    1st International Conference on Computer Vision Theory, Setúbal, Portugal(2006)
    Large Scale Performance Measurement of Content-Based Automated Image-Orientation Detection
    Shumeet Baluja
    International Conference on Image Processing, Genova, Italy(2005)
    Boosting Sex Identification Performance
    Shumeet Baluja
    Proceedings of the Seventeenth Innovative Applications of Artificial Intelligence Conference, AAAI(2005), pp. 1508-1513
    The Happy Searcher: Challenges in Web Information Retrieval
    Mehran Sahami
    Vibhu Mittal
    Shumeet Baluja
    The Eighth Pacific Rim International Conference on Artificial Intelligence (PRICAI-2004)
    Efficient Face Orientation Discrimination
    Shumeet Baluja
    Mehran Sahami
    International Conference on Image Processing (ICIP-2004)
    The Effect of Large Training Set Sizes on Online Japanese Kanji and English Cursive Recognizers
    Manish Goyal
    John Bennett
    IWFHR '02: Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition (IWFHR'02), IEEE Computer Society, Washington, DC, USA(2002), pp. 36-40
    Anomaly detection through registration
    Mei Chen
    Takeo Kanade
    Dean Pomerleau
    Pattern Recognition, 32(1999), pp. 113-128
    Neural Network-Based Face Detection
    Shumeet Baluja
    Takeo Kanade
    IEEE Trans. Pattern Anal. Mach. Intell., 20(1998), pp. 23-38
    Rotation Invariant Neural Network-Based Face Detection
    Shumeet Baluja
    Takeo Kanade
    CVPR(1998), pp. 38-44
    Anomaly Detection through Registration
    Mei Chen
    Takeo Kanade
    Dean Pomerleau
    CVPR(1998), pp. 304-310
    Integrating Text and Face Detection for Finding Informative Poster Frames
    Michael Smith
    Shumeet Baluja
    AAAI Spring Symposium(1997), pp. 95-101
    Analyzing Articulated Motion Using Expectation-Maximization
    James M. Rehg
    CVPR(1997), pp. 935-
    Neural Network-Based Face Detection
    Shumeet Baluja
    Takeo Kanade
    CVPR(1996), pp. 203-208
    Human Face Detection in Visual Scenes
    Shumeet Baluja
    Takeo Kanade
    NIPS(1995), pp. 875-881
    Reconstructing 3-D Blood Vessel Shapes from Multiple X-Ray Images
    Takeo Kanade
    AAAI Spring Symposium(1994)
    Case Study of a Population Bottleneck: Lions of the Ngorongoro Crater
    C. Packer
    A. E. Pusey
    D. A. Gilbert
    J. Martenson
    S. J. O'Brien
    Conservation Biology, 5(1991)