Millions of Core-Hours Awarded to Science

December 17, 2012

Posted by Andrea Held, Program Manager, University Relations

In 2011, Google University Relations launched a new academic research awards program, Google Exacycle for Visiting Faculty, offering up to one billion core-hours to qualifying proposals. We were looking for projects that would consume 100M+ core-hours each and be of critical benefit to society. Not surprisingly, there was no shortage of applications.

Since then, the following seven scientists have been working on-site at Google offices in Mountain View and Seattle. They are here to run large computing experiments on Google’s infrastructure to change the future. Their projects include exploring antibiotic drug resistance, protein folding and structural modelling, drug discovery, and last but not least, the dynamic universe.

Today, we would like to introduce the Exacycle award recipients and their work. Please stay tuned for updates next year.

Simulating a Dynamic Universe with the Large Synoptic Sky Survey
Jeff Gardner, University of Washington, Seattle, WA
Collaborators: Andrew Connolly, University of Washington, Seattle, WA, and John Peterson, Purdue University, West Lafayette, IN

Research subject: The Large Synoptic Survey Telescope (LSST) is one of the most ambitious astrophysical research programs ever undertaken. Starting in 2019, the LSST’s 3.2-gigapixel camera will repeatedly survey the southern sky, generating tens of petabytes of data every year. The images and catalogs from the LSST have the potential to transform both our understanding of the universe and the way that we engage in science in general.
Exacycle impact: In order to design the telescope to yield the best possible science, the LSST collaboration has undertaken a formidable computational campaign to simulate the telescope itself. This will optimize how the LSST surveys the sky and provide realistic datasets for the development of analysis pipelines that can operate on hundreds of petabytes. Using Exacycle, we are reducing the time required to simulate one night of LSST observing, roughly 5 million images, from 3 months down to a few days. This rapid turnaround will enable the LSST engineering teams to test new designs and new algorithms with unprecedented precision, which will ultimately lead to bigger and better science from the LSST.

Designing and Defeating Antibiotic Drug Resistance
Peter Kasson, Assistant Professor, Departments of Molecular Physiology and Biological Physics and of Biomedical Engineering, University of Virginia

Research subject: Antibiotics have made most bacterial infections routinely treatable. As antibiotic use has become common, bacterial resistance to these drugs has also increased. Recently, some bacteria have arisen that are resistant to almost all antibiotics. We are studying the basis for this resistance, in particular the enzyme that acts to break down many antibiotics. Identifying the critical changes required for pan-resistance will aid surveillance and prevention; it will also help elucidate targets for the development of new therapeutic agents.
Exacycle impact: Exacycle allows us to simulate the structure and dynamics of several thousand enzyme variants in great detail. The structural differences between enzymes from resistant and non-resistant bacteria are subtle, so we have developed methods to compare structural "fingerprints" of the enzymes and identify distinguishing characteristics. The complexity of this calculation and the large number of potential bacterial sequences mean that this is a computationally intensive task; the massive computing power offered by Exacycle, in combination with some novel sampling strategies, makes this calculation tractable.

Sampling the conformational space of G protein-coupled receptors
Kai Kohlhoff, Research Scientist at Google
Collaborators: Research labs of Vijay Pande and Russ Altman at Stanford University

Research subject: G protein-coupled receptors (GPCRs) are proteins that act as signal transducers in the cell membrane and influence the response of a cell to a variety of external stimuli. GPCRs play a role in many human diseases, such as asthma and hypertension, and are well established as a primary drug target.
Exacycle impact: Exacycle let us perform many tens of thousands of molecular simulations of membrane-bound GPCRs in parallel using the Gromacs software. With MapReduce, Dremel, and other technologies, we analyzed the hundreds of terabytes of generated data and built Markov State Models. The information contained in these models can help scientists design drugs that have higher potency and specificity than those presently available.
Results: Our models let us explore kinetically meaningful receptor states and transition rates, which improved our understanding of the structural changes that take place during activation of a signaling receptor. In addition, we used Exacycle to study the affinity of drug molecules when binding to different receptor states.
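
For readers unfamiliar with Markov State Models, the core idea can be sketched in a few lines. The example below is only an illustration under simplifying assumptions, not the pipeline used in this project: it assumes each simulation has already been reduced to a sequence of discrete state labels, and it estimates transition probabilities by counting jumps between states at a fixed lag and normalizing each row. The function name, the toy trajectories, and the three-state setup are all hypothetical.

```python
from collections import Counter


def estimate_transition_matrix(trajectories, n_states, lag=1):
    """Estimate a row-normalized transition matrix from discrete trajectories.

    Each trajectory is a list of integer state labels in [0, n_states).
    This is a plain counting estimator, used here only for illustration.
    """
    counts = Counter()
    for traj in trajectories:
        # Pair each frame with the frame `lag` steps later.
        for src, dst in zip(traj, traj[lag:]):
            counts[(src, dst)] += 1

    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # State never observed as a starting point: leave the row at zero.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


# Toy input: two short trajectories over three hypothetical states.
toy_trajectories = [
    [0, 0, 1, 1, 2, 2, 2, 1],
    [2, 2, 1, 0, 0, 1, 2, 2],
]
T = estimate_transition_matrix(toy_trajectories, n_states=3)
for row in T:
    print([round(p, 3) for p in row])
```

Entry T[i][j] approximates the probability of moving from state i to state j within one lag interval. In practice, defining the states (for example by clustering receptor conformations) and choosing the lag time are the hard parts, and both are outside the scope of this sketch.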

Modeling transport through the nuclear pore complex
Daniel Russel, postdoc in structural biology, University of California, San Francisco

Research subject: Our goal is to develop a predictive model of transport through the nuclear pore complex (NPC). Developing the model requires understanding how the behavior of the NPC varies as we change the parameters governing the components of the system. Such a model will allow us to understand how transportins, the unstructured domains, and the rest of the cellular milieu interact to determine the efficiency and specificity of macromolecular transport into and out of the nucleus.
Exacycle impact: Since data describing the microscopic behavior of most parts of the nuclear transport process is incomplete and contradictory, we have to explore a larger parameter space than would be feasible with traditional computational resources.
Status: We are currently modeling various experimental measurements of aspects of the nuclear transport process. These experiments range from simple ones containing only a few components of the transport process to measurements on the whole nuclear pore with transportins and cellular milieu.

Large scale screening for new drug leads that modulate the activity of disease-relevant proteins
James Swetnam, Scientific Software Engineer, NYU School of Medicine
Collaborators: Tim Cardozo, MD, PhD, NYU School of Medicine

Research subject: We are using a high-throughput, CPU-bound procedure known as virtual ligand screening to ‘dock’ a large sample of bioactive chemical space against the entirety of known protein structures, that is, to produce rough estimates of binding energy for each compound and structure. Our goal is the first computational picture of how bioactive chemistry with therapeutic potential can affect human and pathogen biology.
Exacycle impact: Typically, using our academic lab’s resources, we could screen a few tens of thousands of compounds against a single protein to try to find modulators of its function. To date, Exacycle has enabled us to screen 545,130 compounds against 8,535 protein structures that are involved in important and underserved diseases such as cancer, diabetes, malaria, and HIV to look for new leads towards future drugs.
Status: We are currently expanding our screens to an additional 206,190 models from ModBase. We aim to have a public dataset for the research community in the first half of 2013.
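
To give a sense of scale, the figures above can be turned into a rough count of docking calculations. The snippet below is a back-of-the-envelope sketch that assumes every compound was docked against every structure, which the post implies but does not state outright. The baseline of 30,000 compounds against one protein is an assumed stand-in for "a few tens of thousands", not a number from the project.

```python
# Figures quoted in the post.
compounds = 545_130
protein_structures = 8_535

# Assumption: a full cross product, one docking run per compound-structure pair.
pairs = compounds * protein_structures
print(f"{pairs:,} compound-structure pairs")  # 4,652,684,550

# Hypothetical baseline for a typical academic screen (assumed value).
baseline_compounds = 30_000
baseline_pairs = baseline_compounds * 1

scale_factor = pairs / baseline_pairs
print(f"about {scale_factor:,.0f} times the assumed baseline screen")
```

Under these assumptions the screen amounts to roughly 4.65 billion docking runs, which is why the procedure is described as CPU-bound and why it suits a resource measured in core-hours.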

Protein Structure Prediction and Design
Michael Tyka, Research Fellow, University of Washington, Seattle, WA

Research subject: The precise relationship between the primary sequence and the three-dimensional structure of proteins is one of the unsolved grand challenges of computational biochemistry. The Baker Lab has made significant progress in recent years by developing more powerful protein prediction and design algorithms using the Rosetta Protein Modelling suite.
Exacycle impact: Limitations in the accuracy of the physical model and lack of sufficient computational power have prevented solutions to broader classes of medically relevant problems. Exacycle allows us to improve model quality by conducting large parameter optimization sweeps with a very large dataset of experimental protein structural data. The improved energy functions will benefit the entire theoretical protein research community.

We are also using Exacycle to conduct simultaneous docking and one-sided protein design to develop novel protein binders for a number of medically relevant targets. For the first time, we are able to aggressively redesign backbone conformations at the binding site. This allows for a much greater flexibility in possible binding shapes but also hugely increases the space of possibilities that have to be sampled. Very promising designs have already been found using this method.