Peptide-Spectra Matching with Weak Supervision

Sam Schoenholz; Sean Hackett; Laura Deming; Eugene Melamud; Andrew Dai; Navdeep Jaitly; Fiona McAllister; Jonathon O'Brien; George Dahl; Bryson Bennett; Daphne Koller

arXiv (2018)

Abstract

As in many other scientific domains, we face a fundamental problem when using
machine learning to identify proteins from mass spectrometry data: large ground
truth datasets mapping inputs to correct outputs are extremely difficult to obtain.
Instead, we have access to imperfect hand-coded models crafted by domain experts.
In this paper, we apply deep neural networks to an important step of the protein
identification problem, the pairing of mass spectra with short sequences of amino
acids called peptides. We train our model to differentiate between top scoring
results from a state-of-the-art classical system and hard-negative second and third
place results. Our resulting model is much better at matching peptides to
spectra than the model used to generate its training data. In particular, we achieve
a 43% improvement over standard matching methods and a 10% improvement
over a combination of the matching method and an industry standard cross-spectra
reranking tool. Importantly, in a more difficult experimental regime that reflects
current challenges facing biologists, our advantage over the previous state-of-the-art
grows to 15% even after reranking. We believe this approach will generalize to
other challenging scientific problems.
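
The weak-supervision setup described in the abstract can be illustrated with a minimal sketch. It assumes that a classical search engine has already produced, for each spectrum, a list of candidate peptides ordered from best to worst score. The function name build_weak_labels, the num_negatives parameter, the spectrum identifiers, and the peptide strings are all hypothetical and chosen for illustration; this is not the authors' code, only one plausible way to turn ranked results into labeled training examples.

```python
from typing import Dict, List, Tuple

# (spectrum id, peptide sequence, label) where label 1 = weak positive, 0 = hard negative
Example = Tuple[str, str, int]


def build_weak_labels(
    ranked: Dict[str, List[str]], num_negatives: int = 2
) -> List[Example]:
    """Turn ranked search results into weakly labeled examples.

    Assumption: ranked[spectrum_id] lists candidate peptides from best
    to worst score, as produced by some classical matching system.
    """
    examples: List[Example] = []
    for spectrum_id, peptides in ranked.items():
        if not peptides:
            continue  # no candidates for this spectrum, nothing to label
        # The top-scoring candidate is treated as the (imperfect) positive.
        examples.append((spectrum_id, peptides[0], 1))
        # The next-best candidates (second and third place by default)
        # serve as hard negatives.
        for peptide in peptides[1 : 1 + num_negatives]:
            examples.append((spectrum_id, peptide, 0))
    return examples


# Hypothetical ranked results for two spectra.
ranked_hits = {
    "spectrum_001": ["PEPTIDEK", "PEPTLDEK", "PEPSIDEK", "AAAAAK"],
    "spectrum_002": ["LVNELTEFAK"],
}
examples = build_weak_labels(ranked_hits)
```

In this sketch a spectrum with only one candidate contributes a positive and no negatives. Whether such spectra should be kept, and how the labeled pairs are fed to a neural network, are design choices the abstract does not specify.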

Research Areas

Machine intelligence
