Kai Kohlhoff

Kai Kohlhoff studied computer science, computational biology, and structural bioinformatics at the Karlsruhe Institute of Technology (KIT), Jacobs University Bremen, and the University of Cambridge. After finishing his PhD, he was a Simbios Distinguished Postdoctoral Fellow in Bioengineering at Stanford University. Kai joined Google as Visiting Faculty in 2011 and now works as a research scientist at Google AI.
Authored Publications
    Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen have made significant progress in generating high-resolution images based on text descriptions. However, many generated images still suffer from issues such as artifacts/implausibility, misalignment with text descriptions, and low aesthetic quality. Inspired by the success of Reinforcement Learning with Human Feedback (RLHF) for large language models, prior work collected human-provided scores as feedback on generated images and trained a reward model to improve the T2I generation. In this paper, we enrich the feedback signal by (i) marking image regions that are implausible or misaligned with the text, and (ii) annotating which keywords in the text prompt are not represented in the image. We collect such rich human feedback on 18K generated images and train a multimodal transformer to predict this rich feedback automatically. We show that the predicted rich human feedback can be leveraged to improve image generation, for example, by selecting high-quality training data to finetune and improve the generative models, or by creating masks with predicted heatmaps to inpaint the problematic regions. Notably, the improvements generalize to models (Muse) beyond those used to generate the images on which human feedback data were collected (Stable Diffusion variants).
    Eye tracking has been widely used for decades in vision research, language and usability. However, most prior research has focused on large desktop displays using specialized eye trackers that are expensive and cannot scale. Little is known about eye movement behavior on phones, despite their pervasiveness and the large amount of time spent on them. We leverage machine learning to demonstrate accurate smartphone-based eye tracking without any additional hardware. We show that the accuracy of our method is comparable to state-of-the-art mobile eye trackers that are 100x more expensive. Using data from over 100 opted-in users, we replicate key findings from previous eye movement research on oculomotor tasks and saliency analyses during natural image viewing. In addition, we demonstrate the utility of smartphone-based gaze for detecting reading comprehension difficulty. Our results show the potential for scaling eye movement research by orders of magnitude to thousands of participants (with explicit consent), enabling advances in vision research, accessibility and healthcare.
    Google-Accelerated Biomolecular Simulations
    Biomolecular Simulations, Springer, New York (2019)
    Biomolecular simulations rely heavily on the availability of suitable compute infrastructure for data-driven tasks like modeling, sampling, and analysis. These resources are typically available on a per-lab and per-facility basis, or through dedicated national supercomputing centers. In recent years, cloud computing has emerged as an alternative by offering an abundance of on-demand, specialist-maintained resources that enable efficiency and increased turnaround through rapid scaling. Scientific computations that take the shape of parallel workloads using large datasets are commonplace, making them ideal candidates for distributed computing in the cloud. Recent developments have greatly simplified the task for the experimenter to configure the cloud for use and job submission. This chapter will show how to use Google’s Cloud Platform for biomolecular simulations by example of the molecular dynamics package GROningen MAchine for Chemical Simulations (GROMACS). The instructions readily transfer to a large variety of other tasks, allowing the reader to use the cloud for their specific purposes. Importantly, by using Docker containers (a popular light-weight virtualization solution) and cloud storage, key issues in scientific research are addressed: reproducibility of results, record keeping, and the possibility for other researchers to obtain copies and directly build upon previous work for further experimentation and hypothesis testing.
    Robotic grasp analysis using deformable solid mechanics
    S.J. Dharbaneshwer
    Shankar Subramanian
    Meccanica, Springer (2019)
    Given an object and a hand, identifying a robust grasp out of an infinite set of grasp candidates is a challenging problem, and several grasp synthesis approaches have been proposed in the robotics community to find the promising ones. Most of the approaches assume both the object and the hand to be rigid and evaluate the robustness of the grasp based on the wrenches acting at contact points. Since rigid body mechanics is used in these works, the actual distribution of the contact tractions is not considered, and contacts are represented by their resultant wrenches. However, the tractions acting at the contact interfaces play a critical role in the robustness of the grasp, and not accounting for these in detail is a serious limitation of the current approaches. In this paper, we replace the conventional wrench-based rigid-body approaches with a deformable-body mechanics formulation as is conventional in solid mechanics. We briefly review the wrench-based grasp synthesis approaches in the literature and address the drawbacks present in them from a solid mechanics standpoint. In our formulation, we account for deformation in both the grasper and the object and evaluate the robustness of grasp based on the distribution of normal and tangential tractions at the contact interface. We contrast how a given grasp situation is solved using conventional wrench space formulations and deformable solid mechanics and show how tractions on the contacting surfaces influence the grasp equilibrium. Recognizing that contact areas can be correlated to contact tractions, we propose a grasp performance index, π, based on the contact areas. We also devise a grasp analysis strategy to identify robust grasps under random perturbations and implement it using Finite Element Method (FEM) to study a few grasps. One of the key aspects of our Finite Element (FE)-based approach is that it can be used to monitor the dynamic interaction between object and hand for judging grasp robustness. We then compare our measure, π, with conventional grasp quality measures, ϵ and v, and show that it successfully accounts for the effect of the physical characteristics of the object and hand (such as the mass, Young’s modulus and coefficient of friction) and identifies robust grasps that are in line with human intuition and experience.
    We introduce tensor field neural networks, which are locally equivariant to 3D rotations, translations, and permutations of points at every layer. 3D rotation equivariance removes the need for data augmentation to identify features in arbitrary orientations. Our network uses filters built from spherical harmonics; due to the mathematical consequences of this filter choice, each layer accepts as input (and guarantees as output) scalars, vectors, and higher-order tensors, in the geometric sense of these terms. We demonstrate the capabilities of tensor field networks with tasks in geometry, physics, and chemistry.
    Predicting Brain Age Using Structural Neuroimaging and Deep Learning
    Atilla Peter Kiraly
    Diego Ardila
    Shravya Ramesh Shetty
    Sujeeth Bharadwaj
    (2018)
    Early detection of aging-related diseases requires a model of the underlying biological aging process. In this paper, we develop a brain-age predictor by using structural magnetic resonance imaging (SMRI) and deep learning and evaluate the predicted brain age as a marker of brain aging. Our approach does not require any domain knowledge in that it uses a transfer-learning paradigm and has been validated on real SMRI data collected from elderly subjects. We developed two different predictive models by using convolutional neural network (CNN) based regression and bucket classification to predict brain ages from SMRI images. Our models achieved root mean squared errors (RMSE) of 5.54 and 6.44 (years) in predicting brain ages of cognitively normal subjects. Further analysis showed that there is a substantial difference between the predicted brain ages of cognitively impaired subjects and normal subjects within the same chronological age group.
    A Data-Driven Large-Scale Optimization Approach for Task-Specific Physics Realism in Real-Time Robotics Simulation
    Andreas Bihlmaier
    2016 IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (2016)
    Physics-based simulation of robots requires models of the simulated robots and their environment. For a realistic simulation behavior, these models must be accurate. Their physical properties such as geometric and kinematic values, as well as dynamic parameters such as mass, inertia matrix and friction, must be modelled. Unfortunately, this problem is hard for at least two reasons. First, physics engines designed for simulation of rigid bodies in real-time cannot accurately describe many common real world phenomena, e.g. (drive) friction and grasping. Second, classical parameter identification algorithms are well-studied and efficient, but often necessitate significant manual engineering effort and may not be applicable due to application constraints. Thus, we present a data-driven general purpose tool, which allows optimizing model parameters for (task-specific) realistic simulation behavior. Our approach directly uses the simulator and the model under optimization to improve model parameters. The optimization process is highly distributed and uses a hybrid optimization approach based on metaheuristics and the Ceres non-linear least squares solver. The user only has to provide a configuration file that specifies which model parameter to optimize together with realism criteri
    Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a Multi-Armed Bandit model with correlated rewards
    Jeffrey Mahler
    Florian T. Pokorny
    Brian Hou
    Melrose Roderick
    Michael Laskey
    Mathieu Aubry
    Torsten Kroeger
    James Kuffner
    Ken Goldberg
    2016 IEEE International Conference on Robotics and Automation (ICRA) (2016)
    This paper presents the Dexterity Network (Dex-Net) 1.0, a dataset of 3D object models and a sampling-based planning algorithm to explore how Cloud Robotics can be used for robust grasp planning. The algorithm uses a Multi-Armed Bandit model with correlated rewards to leverage prior grasps and 3D object models in a growing dataset that currently includes over 10,000 unique 3D object models and 2.5 million parallel-jaw grasps. Each grasp includes an estimate of the probability of force closure under uncertainty in object and gripper pose and friction. Dex-Net 1.0 uses Multi-View Convolutional Neural Networks (MV-CNNs), a new deep learning method for 3D object classification, to provide a similarity metric between objects, and the Google Cloud Platform to simultaneously run up to 1,500 virtual cores, reducing experiment runtime by up to three orders of magnitude. Experiments suggest that correlated bandit techniques can use a cloud-based network of object models to significantly reduce the number of samples required for robust grasp planning. We report on system sensitivity to variations in similarity metrics and in uncertainty in pose and friction. Code and updated information are available at http://berkeleyautomation.github.io/dex-net/.
    Cloud-based simulations on Google Exacycle reveal ligand modulation of GPCR activation pathways
    Diwakar Shukla
    Morgan Lawrenz
    Gregory Bowman
    David Konerding
    Dan Belov
    Russ Altman
    Vijay Pande
    Nature Chemistry, 6 (2014), 15–21
    Simulations can provide tremendous insight into the atomistic details of biological mechanisms, but micro- to millisecond timescales are historically only accessible on dedicated supercomputers. We demonstrate that cloud computing is a viable alternative that brings long-timescale processes within reach of a broader community. We used Google's Exacycle cloud-computing platform to simulate two milliseconds of dynamics of a major drug target, the G-protein-coupled receptor β2AR. Markov state models aggregate independent simulations into a single statistical model that is validated by previous computational and experimental results. Moreover, our models provide an atomistic description of the activation of a G-protein-coupled receptor and reveal multiple activation pathways. Agonists and inverse agonists interact differentially with these pathways, with profound implications for drug design.
    The wisdom of clouds
    Chemistry World, 11 (2014), p. 38