Prediction errors of molecular machine learning models lower than hybrid DFT error

Felix Faber; Luke Hutchinson; Huang Bing; Justin Gilmer; Sam Schoenholz; George Dahl; Oriol Vinyals; Steven Kearnes; Patrick Riley; Anatole von Lilienfeld

Journal of Chemical Theory and Computation (2017)

Abstract

We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of thirteen electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed with learning curves, which report approximation errors as a function of training set size. Molecular structures and properties at the hybrid density functional theory (DFT) level of theory, used for training and testing, come from the QM9 database [Ramakrishnan et al., Scientific Data 1, 140022 (2014)] and include dipole moment, polarizability, HOMO/LUMO energies and gap, electronic spatial extent, zero point vibrational energy, enthalpies and free energies of atomization, heat capacity, and the highest fundamental vibrational frequency. Various representations from the literature have been studied (Coulomb matrix, bag of bonds, BAML, ECFP4, and molecular graphs (MG)), as well as newly developed distribution-based variants including histograms of distances (HD), angles (HDA/MARAD), and dihedrals (HDAD). Regressors include linear models (Bayesian ridge regression (BR) and linear regression with elastic net regularization (EN)), random forest (RF), kernel ridge regression (KRR), and two types of neural networks: graph convolutions (GC) and gated graph networks (GG). We present numerical evidence that ML model predictions for all properties can reach an approximation error to DFT which is on par with chemical accuracy. These findings indicate that ML models could be more accurate than DFT if explicitly electron correlated quantum (or experimental) data were provided.
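To make the terms in the abstract concrete, the following is a minimal sketch of one representation (the Coulomb matrix), one regressor (KRR), and a learning curve. It is not the authors' code. It assumes the commonly used Coulomb matrix definition (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|) and a Gaussian kernel. The function names, the hyperparameters `sigma` and `lam`, the approximate water geometry, and the synthetic toy target are all illustrative assumptions, and no QM9 data is used.

```python
import numpy as np


def coulomb_matrix(charges, coords):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| elsewhere."""
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder so the division below is defined
    cm = np.outer(z, z) / dist
    np.fill_diagonal(cm, 0.5 * z ** 2.4)
    return cm


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def krr_fit_predict(x_train, y_train, x_test, sigma=0.5, lam=1e-6):
    """Kernel ridge regression; sigma and lam are illustrative, not tuned values."""
    k = gaussian_kernel(x_train, x_train, sigma)
    alpha = np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)
    return gaussian_kernel(x_test, x_train, sigma) @ alpha


def learning_curve(x, y, train_sizes, n_test=200, seed=0):
    """Mean absolute error on a fixed held-out set for each training set size."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    test, pool = idx[:n_test], idx[n_test:]
    maes = []
    for n in train_sizes:
        train = pool[:n]
        pred = krr_fit_predict(x[train], y[train], x[test])
        maes.append(float(np.mean(np.abs(pred - y[test]))))
    return maes


def toy_data(n=1000, seed=1):
    """Synthetic smooth target standing in for a molecular property (assumption)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2
    return x, y


if __name__ == "__main__":
    # Approximate water geometry; coordinate units are an assumption of this sketch.
    water = coulomb_matrix([8, 1, 1], [[0.0, 0.0, 0.0],
                                       [0.757, 0.586, 0.0],
                                       [-0.757, 0.586, 0.0]])
    print(water)
    x, y = toy_data()
    for n, mae in zip([25, 100, 400], learning_curve(x, y, [25, 100, 400])):
        print(n, mae)
```

On a log-log plot, the printed (training set size, error) pairs form the kind of learning curve the abstract refers to. In a real study the inputs would be molecular representations computed from QM9 structures rather than random points, and the hyperparameters would be chosen by cross-validation.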

Research Areas

Machine intelligence
