Jeremiah J. Harmsen

Jeremiah J. Harmsen

Jeremiah Harmsen joined Google in 2005 where he has founded efforts such as TensorFlow Hub, TensorFlow Serving and the Machine Learning Ninja Rotation. He focuses on creating the ideas, tools and people to help the world use machine learning. He currently leads the Applied Machine Intelligence group at Google AI Zurich. The team increases the impact of machine learning through consultancy, state-of-the-art infrastructure development, research and education. Jeremiah received a B.S. degree in electrical engineering and computer engineering (2001), a M.S. degree in electrical engineering (2003), a M.S. degree in mathematics (2005) and a Ph.D. in electrical engineering (2005) from Rensselaer Polytechnic Institute, Troy, NY. Jeremiah lives by the lake with his wife, son and daughter in Zurich, Switzerland.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Scaling Vision Transformers to 22 Billion Parameters
    Josip Djolonga
    Basil Mustafa
    Piotr Padlewski
    Justin Gilmer
    Mathilde Caron
    Rodolphe Jenatton
    Lucas Beyer
    Michael Tschannen
    Anurag Arnab
    Carlos Riquelme
    Matthias Minderer
    Gamaleldin Elsayed
    Fisher Yu
    Avital Oliver
    Fantine Huot
    Mark Collier
    Vighnesh Birodkar
    Yi Tay
    Alexander Kolesnikov
    Filip Pavetić
    Thomas Kipf
    Xiaohua Zhai
    Neil Houlsby
    Arxiv (2023)
    Preview abstract The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modeling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters. We present a recipe for highly efficient training of a 22B-parameter ViT and perform a wide variety of experiments on the resulting model. When evaluated on downstream tasks (often with a lightweight linear model on frozen features) ViT22B demonstrates increasing performance with scale. We further observe other interesting benefits of scale, including an improved tradeoff between bias and performance, an improved alignment to human visual perception in terms of shape/texture bias, and improved robustness. ViT22B demonstrates the potential for "LLM-like'' scaling in vision, and provides key steps towards getting there. View details
    TensorFlow-Serving: Flexible, High-Performance ML Serving
    Christopher Olston
    Fangwei Li
    Jordan Soyke
    Kiril Gorovoy
    Li Lao
    Sukriti Ramesh
    Vinu Rajashekhar
    Workshop on ML Systems at NIPS 2017
    Preview abstract We describe TensorFlow-Serving, a system to serve machine learning models inside Google which is also available in the cloud and via open-source. It is extremely flexible in terms of the types of ML platforms it supports, and ways to integrate with systems that convey new models and updated versions from training to serving. At the same time, the core code paths around model lookup and inference have been carefully optimized to avoid performance pitfalls observed in naive implementations. The paper covers the architecture of the extensible serving library, as well as the distributed system for multi-tenant model hosting. Along the way it points out which extensibility points and performance optimizations turned out to be especially important based on production experience. View details
    Wide & Deep Learning for Recommender Systems
    Levent Koc
    Tal Shaked
    Glen Anderson
    Wei Chai
    Mustafa Ispir
    Rohan Anil
    Zakaria Haque
    Lichan Hong
    Vihan Jain
    Xiaobing Liu
    Hemal Shah
    arXiv:1606.07792 (2016)
    Preview abstract Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations are effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. View details
    Up Next: Retrieval Methods for Large Scale Related Video Suggestion
    Lluis Garcia Pueyo
    Vanja Josifovski
    Dima Lepikhin
    Proceedings of KDD 2014, New York, NY, USA, pp. 1769-1778
    Preview abstract The explosive growth in sharing and consumption of the video content on the web creates a unique opportunity for scientific advances in video retrieval, recommendation and discovery. In this paper, we focus on the task of video suggestion, commonly found in many online applications. The current state-of-the-art video suggestion techniques are based on the collaborative filtering analysis, and suggest videos that are likely to be co-viewed with the watched video. In this paper, we propose augmenting the collaborative filtering analysis with the topical representation of the video content to suggest related videos. We propose two novel methods for topical video representation. The fi rst method uses information retrieval heuristics such as tf-idf, while the second method learns the optimal topical representations based on the implicit user feedback available in the online scenario. We conduct a large scale live experiment on YouTube traffic, and demonstrate that augmenting collaborative filtering with topical representations significantly improves the quality of the related video suggestions in a live setting, especially for categories with fresh and topically-rich video content such as news videos. In addition, we show that employing user feedback for learning the optimal topical video representations can increase the user engagement by more than 80% over the standard information retrieval representation, when compared to the collaborative filtering baseline. View details
    Capacity of Steganographic Channels
    William Pearlman
    IEEE Transactions on Information Theory, 55 (2009), pp. 1775-1792
    Preview abstract This work investigates a central problem in steganography, that is: How much data can safely be hidden without being detected? To answer this question, a formal definition of steganographic capacity is presented. Once this has been defined, a general formula for the capacity is developed. The formula is applicable to a very broad spectrum of channels due to the use of an information-spectrum approach. This approach allows for the analysis of arbitrary steganalyzers as well as nonstationary, nonergodic encoder and attack channels. After the general formula is presented, various simplifications are applied to gain insight into example hiding and detection methodologies. Finally, the context and applications of the work are summarized in a general discussion. View details