Kai Kohlhoff
Kai Kohlhoff studied computer science, computational biology and structural bioinformatics at the Karlsruhe Institute of Technology (KIT), Jacobs University Bremen, and the University of Cambridge. After finishing his PhD, he was a Simbios Distinguished Postdoctoral Fellow in Bioengineering at Stanford University. Kai joined Google as Visiting Faculty in 2011 and now works as a research scientist at Google AI.
Authored Publications
Rich Human Feedback for Text to Image Generation
Katherine Collins
Nicholas Carolan
Youwei Liang
Peizhao Li
Dj Dvijotham
Gang Li
Sarah Young
Jiao Sun
Arseniy Klimovskiy
Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen have made significant progress in generating high-resolution images based on text descriptions. However, many generated images still suffer from issues such as artifacts/implausibility, misalignment with text descriptions, and low aesthetic quality.
Inspired by the success of Reinforcement Learning with Human Feedback (RLHF) for large language models, prior work collected human-provided scores as feedback on generated images and trained a reward model to improve the T2I generation.
In this paper, we enrich the feedback signal by (i) marking image regions that are implausible or misaligned with the text, and (ii) annotating which keywords in the text prompt are not represented in the image.
We collect such rich human feedback on 18K generated images and train a multimodal transformer to predict this rich feedback automatically.
We show that the predicted rich human feedback can be leveraged to improve image generation, for example, by selecting high-quality training data to finetune and improve the generative models, or by creating masks with predicted heatmaps to inpaint the problematic regions.
Notably, the improvements generalize to models (Muse) beyond those used to generate the images on which human feedback data were collected (Stable Diffusion variants).
Accelerating eye movement research via accurate and affordable smartphone eye tracking
Na Dai
Ethan Steinberg
Kantwon Rogers
Venky Ramachandran
Mina Shojaeizadeh
Li Guo
Nature Communications, 11 (2020)
Eye tracking has been widely used for decades in vision research, language and usability. However, most prior research has focused on large desktop displays using specialized eye trackers that are expensive and cannot scale. Little is known about eye movement behavior on phones, despite their pervasiveness and the large amount of time spent on them. We leverage machine learning to demonstrate accurate smartphone-based eye tracking without any additional hardware. We show that the accuracy of our method is comparable to state-of-the-art mobile eye trackers that are 100x more expensive. Using data from over 100 opted-in users, we replicate key findings from previous eye movement research on oculomotor tasks and saliency analyses during natural image viewing. In addition, we demonstrate the utility of smartphone-based gaze for detecting reading comprehension difficulty. Our results show the potential for scaling eye movement research by orders of magnitude to thousands of participants (with explicit consent), enabling advances in vision research, accessibility and healthcare.
Google-Accelerated Biomolecular Simulations
Biomolecular Simulations, Springer, New York (2019)
Biomolecular simulations rely heavily on the availability of suitable compute infrastructure for data-driven tasks like modeling, sampling, and analysis. These resources are typically available on a per-lab and per-facility basis, or through dedicated national supercomputing centers. In recent years, cloud computing has emerged as an alternative by offering an abundance of on-demand, specialist-maintained resources that enable greater efficiency and faster turnaround through rapid scaling.
Scientific computations that take the shape of parallel workloads using large datasets are commonplace, making them ideal candidates for distributed computing in the cloud. Recent developments have greatly simplified the task of configuring the cloud for use and job submission. This chapter will show how to use Google’s Cloud Platform for biomolecular simulations, using the molecular dynamics package GROningen MAchine for Chemical Simulations (GROMACS) as an example. The instructions readily transfer to a large variety of other tasks, allowing the reader to use the cloud for their specific purposes.
Importantly, by using Docker containers (a popular light-weight virtualization solution) and cloud storage, key issues in scientific research are addressed: reproducibility of results, record keeping, and the possibility for other researchers to obtain copies and directly build upon previous work for further experimentation and hypothesis testing.
Preview abstract
Given an object and a hand, identifying a robust grasp out of an infinite set of grasp candidates is a challenging problem, and several grasp synthesis approaches have been proposed in the robotics community to find the promising ones. Most of the approaches assume both the object and the hand to be rigid and evaluate the robustness of the grasp based on the wrenches acting at contact points. Since rigid body mechanics is used in these works, the actual distribution of the contact tractions is not considered, and contacts are represented by their resultant wrenches. However, the tractions acting at the contact interfaces play a critical role in the robustness of the grasp, and not accounting for these in detail is a serious limitation of the current approaches. In this paper, we replace the conventional wrench-based rigid-body approaches with a deformable-body mechanics formulation as is conventional in solid mechanics. We briefly review the wrench-based grasp synthesis approaches in the literature and address the drawbacks present in them from a solid mechanics standpoint. In our formulation, we account for deformation in both the grasper and the object and evaluate the robustness of the grasp based on the distribution of normal and tangential tractions at the contact interface. We contrast how a given grasp situation is solved using conventional wrench space formulations and deformable solid mechanics and show how tractions on the contacting surfaces influence the grasp equilibrium. Recognizing that contact areas can be correlated to contact tractions, we propose a grasp performance index, π, based on the contact areas. We also devise a grasp analysis strategy to identify robust grasps under random perturbations and implement it using the Finite Element Method (FEM) to study a few grasps. One of the key aspects of our Finite Element (FE)-based approach is that it can be used to monitor the dynamic interaction between object and hand for judging grasp robustness.
We then compare our measure, π, with conventional grasp quality measures, ϵ and v, and show that it successfully accounts for the effect of the physical characteristics of the object and hand (such as the mass, Young’s modulus and coefficient of friction) and identifies robust grasps that are in line with human intuition and experience.
Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds
Nathaniel Cabot Thomas
Tess Smidt
Steven Kearnes
Li Li
Patrick Riley
(2018)
We introduce tensor field neural networks, which are locally equivariant to 3D rotations, translations, and permutations of points at every layer. 3D rotation equivariance removes the need for data augmentation to identify features in arbitrary orientations. Our network uses filters built from spherical harmonics; due to the mathematical consequences of this filter choice, each layer accepts as input (and guarantees as output) scalars, vectors, and higher-order tensors, in the geometric sense of these terms. We demonstrate the capabilities of tensor field networks with tasks in geometry, physics, and chemistry.
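The equivariance property described in this abstract can be checked numerically on a toy layer. The sketch below is not the paper's network: it is a hand-written vector-valued layer that, for each point, sums unit displacement vectors to the other points, weighted by a radial function (loosely analogous to a degree-1 filter acting on constant scalar inputs). The names `radial`, `vector_layer` and `rotate_z`, the Gaussian radial profile, and the restriction to rotations about the z axis are all assumptions made for illustration.

```python
import math


def radial(r):
    # Hypothetical radial profile; a trained network would learn this.
    return math.exp(-r * r)


def vector_layer(points):
    """For each point i, sum radial(|r_ij|) * r_ij / |r_ij| over j != i,
    where r_ij = x_j - x_i. Returns one 3D vector per point."""
    out = []
    for i, xi in enumerate(points):
        acc = [0.0, 0.0, 0.0]
        for j, xj in enumerate(points):
            if i == j:
                continue
            d = [b - a for a, b in zip(xi, xj)]
            r = math.sqrt(sum(c * c for c in d))
            w = radial(r) / r
            acc = [a + w * c for a, c in zip(acc, d)]
        out.append(acc)
    return out


def rotate_z(v, theta):
    # Rotate a 3D vector about the z axis by angle theta (radians).
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = v
    return [c * x - s * y, s * x + c * y, z]


def max_abs_diff(a, b):
    # Largest componentwise difference between two lists of 3D vectors.
    return max(abs(p - q) for u, v in zip(a, b) for p, q in zip(u, v))


points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.5, 0.5, 1.5]]
theta = 0.7
shift = [3.0, -1.0, 2.0]

# Rotate and translate the input cloud, then apply the layer ...
moved = [[c + t for c, t in zip(rotate_z(p, theta), shift)] for p in points]
layer_of_moved = vector_layer(moved)

# ... and compare with applying the layer first and rotating its output.
rotated_output = [rotate_z(v, theta) for v in vector_layer(points)]

equivariance_error = max_abs_diff(layer_of_moved, rotated_output)
print(equivariance_error)
```

Because the layer depends only on displacement vectors, translating the cloud leaves the output unchanged, and rotating the cloud rotates the output vectors in the same way, so the printed error should be at floating-point rounding level.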
A Data-Driven Large-Scale Optimization Approach for Task-Specific Physics Realism in Real-Time Robotics Simulation
Andreas Bihlmaier
2016 IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (2016)
Physics-based simulation of robots requires models of the simulated robots and their environment. For realistic simulation behavior, these models must be accurate. Their physical properties such as geometric and kinematic values, as well as dynamic parameters such as mass, inertia matrix and friction, must be modelled. Unfortunately, this problem is hard for at least two reasons. First, physics engines designed for simulation of rigid bodies in real-time cannot accurately describe many common real world phenomena, e.g. (drive) friction and grasping. Second, classical parameter identification algorithms are well-studied and efficient, but often necessitate significant manual engineering effort and may not be applicable due to application constraints. Thus, we present a data-driven general purpose tool, which allows model parameters to be optimized for (task-specific) realistic simulation behavior. Our approach directly uses the simulator and the model under optimization to improve model parameters. The optimization process is highly distributed and uses a hybrid optimization approach based on metaheuristics and the Ceres non-linear least squares solver. The user only has to provide a configuration file that specifies which model parameter to optimize together with realism criteri
Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a Multi-Armed Bandit model with correlated rewards
Jeffrey Mahler
Florian T. Pokorny
Brian Hou
Melrose Roderick
Michael Laskey
Mathieu Aubry
Torsten Kroeger
James Kuffner
Ken Goldberg
2016 IEEE International Conference on Robotics and Automation (ICRA) (2016)
This paper presents the Dexterity Network (Dex-Net) 1.0, a dataset of 3D object models and a sampling-based planning algorithm to explore how Cloud Robotics can be used for robust grasp planning. The algorithm uses a Multi-Armed Bandit model with correlated rewards to leverage prior grasps and 3D object models in a growing dataset that currently includes over 10,000 unique 3D object models and 2.5 million parallel-jaw grasps. Each grasp includes an estimate of the probability of force closure under uncertainty in object and gripper pose and friction. Dex-Net 1.0 uses Multi-View Convolutional Neural Networks (MV-CNNs), a new deep learning method for 3D object classification, to provide a similarity metric between objects, and the Google Cloud Platform to simultaneously run up to 1,500 virtual cores, reducing experiment runtime by up to three orders of magnitude. Experiments suggest that correlated bandit techniques can use a cloud-based network of object models to significantly reduce the number of samples required for robust grasp planning. We report on system sensitivity to variations in similarity metrics and in uncertainty in pose and friction. Code and updated information are available at http://berkeleyautomation.github.io/dex-net/.
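To make the bandit framing concrete, here is a minimal sketch of Thompson sampling over a handful of candidate grasps, each treated as an arm with an unknown success probability. It is a simplification and not the Dex-Net algorithm: the arms are modelled as independent Beta-Bernoulli arms, whereas the paper uses correlated rewards across grasps and objects. The function name `thompson_sampling` and the success probabilities are hypothetical values chosen for illustration.

```python
import random


def thompson_sampling(success_probs, rounds, seed=0):
    """Independent Beta-Bernoulli Thompson sampling.

    success_probs: true (hidden) success probability of each arm.
    Returns the number of pulls, successes and failures per arm.
    """
    rng = random.Random(seed)
    n = len(success_probs)
    wins = [0] * n
    losses = [0] * n
    pulls = [0] * n
    for _ in range(rounds):
        # Draw one sample from each arm's Beta posterior (uniform prior).
        samples = [rng.betavariate(wins[k] + 1, losses[k] + 1) for k in range(n)]
        # Pull the arm whose sampled success probability is highest.
        k = max(range(n), key=lambda a: samples[a])
        pulls[k] += 1
        # Simulate the outcome of evaluating that grasp once.
        if rng.random() < success_probs[k]:
            wins[k] += 1
        else:
            losses[k] += 1
    return pulls, wins, losses


# Hypothetical grasp candidates with made-up success probabilities.
true_probs = [0.1, 0.3, 0.9]
pulls, wins, losses = thompson_sampling(true_probs, rounds=2000)
best_arm = pulls.index(max(pulls))
print(pulls, best_arm)
```

The sampler concentrates its evaluations on the arm that looks most promising while still occasionally trying the others. Sharing information between similar grasps, as the correlated-rewards model in the abstract does, would further reduce the number of samples needed.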
Cloud-based simulations on Google Exacycle reveal ligand modulation of GPCR activation pathways
Diwakar Shukla
Morgan Lawrenz
Gregory Bowman
David Konerding
Dan Belov
Russ Altman
Vijay Pande
Nature Chemistry, 6 (2014), 15–21
Simulations can provide tremendous insight into the atomistic details of biological mechanisms, but micro- to millisecond timescales are historically only accessible on dedicated supercomputers. We demonstrate that cloud computing is a viable alternative that brings long-timescale processes within reach of a broader community. We used Google's Exacycle cloud-computing platform to simulate two milliseconds of dynamics of a major drug target, the G-protein-coupled receptor β2AR. Markov state models aggregate independent simulations into a single statistical model that is validated by previous computational and experimental results. Moreover, our models provide an atomistic description of the activation of a G-protein-coupled receptor and reveal multiple activation pathways. Agonists and inverse agonists interact differentially with these pathways, with profound implications for drug design.
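The idea of aggregating many independent simulations into a single Markov state model can be sketched in a few lines. The example below assumes the trajectories have already been discretized into integer state labels, counts transitions at a fixed lag, row-normalizes the counts into a transition matrix, and estimates the stationary distribution by power iteration. The function names, the three-state toy trajectories and the lag of one step are all assumptions for illustration. Production MSM pipelines typically add clustering, reversible estimators and validation steps that are omitted here.

```python
def estimate_transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition counts pooled over all trajectories.
    lag must be a positive integer."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, iterations=200):
    """Power iteration: repeatedly apply pi <- pi * P from a uniform start."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


# Two short, independent toy trajectories over states 0, 1 and 2.
trajectories = [
    [0, 0, 1, 1, 2, 2, 1, 0],
    [2, 2, 1, 1, 0, 0, 1, 2],
]

transition_matrix = estimate_transition_matrix(trajectories, n_states=3)
stationary = stationary_distribution(transition_matrix)
print(transition_matrix)
print(stationary)
```

For these toy trajectories the estimated stationary distribution works out to (2/7, 3/7, 2/7). The point of the construction is that neither short trajectory needs to sample the long-timescale behavior on its own, because the pooled transition counts determine the model.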
The wisdom of clouds
Chemistry World, 11 (2014), p. 38