![Klaus-Robert Müller](https://storage.googleapis.com/gweb-research2023-media/pubtools/5965.png)
Klaus-Robert Müller
Klaus-Robert Müller has been a professor of computer science at Technische Universität Berlin since 2006; at the same time he directs or co-directs the Berlin Machine Learning Center, the Berlin Big Data Center, and most recently BIFOLD. He studied physics in Karlsruhe from 1984 to 1989 and obtained his Ph.D. degree in computer science at Technische Universität Karlsruhe in 1992. After completing a postdoctoral position at GMD FIRST in Berlin, he was a research fellow at the University of Tokyo from 1994 to 1995. In 1995, he founded the Intelligent Data Analysis group at GMD FIRST (later Fraunhofer FIRST) and directed it until 2008. From 1999 to 2006, he was a professor at the University of Potsdam. Since 2012 he has been Distinguished Professor at Korea University in Seoul. In 2020/2021 he spent his sabbatical at Google Brain as a Principal Scientist. Among others, he was awarded the Olympus Prize for Pattern Recognition (1999), the SEL Alcatel Communication Award (2006), the Science Prize of Berlin by the Governing Mayor of Berlin (2014), the Vodafone Innovations Award (2017), the Hector Science Award (2024), the Pattern Recognition Best Paper award (2020), and the Digital Signal Processing Best Paper award (2022). In 2012, he was elected member of the German National Academy of Sciences-Leopoldina, in 2017 of the Berlin Brandenburg Academy of Sciences, and in 2021 of the German National Academy of Science and Engineering; also in 2017 he became an external scientific member of the Max Planck Society. Since 2019 he has been an ISI Highly Cited Researcher in the cross-disciplinary area. His research interests are intelligent data analysis and machine learning in the sciences (neuroscience, specifically brain-computer interfaces; physics; chemistry) and in industry.
Authored Publications
Accurate global machine learning force fields for molecules with hundreds of atoms
Stefan Chmiela
Valentin Vassilev Galindo
Adil Kabylda
Huziel E. Sauceda
Alexandre Tkatchenko
Science Advances, 9(2)(2023), eadf0873
Global machine learning force fields, with the capacity to capture collective interactions in molecular systems, now scale up to a few dozen atoms due to considerable growth of model complexity with system size. For larger molecules, locality assumptions are introduced, with the consequence that nonlocal interactions are not described. Here, we develop an exact iterative approach to train global symmetric gradient domain machine learning (sGDML) force fields (FFs) for several hundred atoms, without resorting to any potentially uncontrolled approximations. All atomic degrees of freedom remain correlated in the global sGDML FF, allowing the accurate description of complex molecules and materials that present phenomena with far-reaching characteristic correlation lengths. We assess the accuracy and efficiency of sGDML on a newly developed MD22 benchmark dataset containing molecules from 42 to 370 atoms. The robustness of our approach is demonstrated in nanosecond path-integral molecular dynamics simulations for supramolecular complexes in the MD22 dataset.
Canonical Response Parameterization: Quantifying the structure of responses to single-pulse intracranial electrical brain stimulation
Kai J. Miller
Gabriela Ojeda Valencia
Harvey Huang
Nicholas M. Gregg
Gregory A. Worrell
Dora Hermes
PLOS Computational Biology, 19(5)(2023), e1011105
Single-pulse electrical stimulation in the nervous system, often called cortico-cortical evoked potential (CCEP) measurement, is an important technique to understand how brain regions interact with one another. Voltages are measured from implanted electrodes in one brain area while stimulating another with brief current impulses separated by several seconds. Historically, researchers have tried to understand the significance of evoked voltage polyphasic deflections by visual inspection, but no general-purpose tool has emerged to understand their shapes or describe them mathematically. We describe and illustrate a new technique to parameterize brain stimulation data, where voltage response traces are projected into one another using a semi-normalized dot product. The length of timepoints from stimulation included in the dot product is varied to obtain a temporal profile of structural significance, and the peak of the profile uniquely identifies the duration of the response. Using linear kernel PCA, a canonical response shape is obtained over this duration, and then single-trial traces are parameterized as a projection of this canonical shape with a residual term. Such parameterization allows for dissimilar trace shapes from different brain areas to be directly compared by quantifying cross-projection magnitudes, response duration, canonical shape projection amplitudes, signal-to-noise ratios, explained variance, and statistical significance. Artifactual trials are automatically identified by outliers in sub-distributions of cross-projection magnitude, and rejected. This technique, which we call “Canonical Response Parameterization” (CRP), dramatically simplifies the study of CCEP shapes, and may also be applied in a wide range of other settings involving event-triggered data.
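The steps named in this abstract (cross-projection with a semi-normalized dot product, a duration profile, a canonical shape from linear kernel PCA, and a per-trial projection plus residual) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the synthetic data, the use of the mean cross-projection as the profile statistic, and the uncentered Gram matrix are all assumptions made for illustration.

```python
import numpy as np

def cross_projections(trials):
    """Semi-normalized dot products between all pairs of distinct trials.

    trials has shape (n_trials, n_samples). Each trial is scaled to unit
    length and projected into every other (unscaled) trial.
    """
    norms = np.linalg.norm(trials, axis=1, keepdims=True)
    unit = trials / np.where(norms == 0, 1.0, norms)
    proj = unit @ trials.T                      # proj[i, j] = <unit_i, trial_j>
    off_diagonal = ~np.eye(len(trials), dtype=bool)
    return proj[off_diagonal]                   # drop self-projections

def response_duration(trials, candidate_lengths):
    """Window length at which the mean cross-projection peaks.

    Assumption: the profile statistic is the plain mean; the published
    method may use a different summary.
    """
    profile = np.array(
        [cross_projections(trials[:, :n]).mean() for n in candidate_lengths]
    )
    return int(candidate_lengths[int(np.argmax(profile))]), profile

def canonical_shape(trials):
    """Leading component from linear kernel PCA on the trial Gram matrix.

    Assumption: the Gram matrix is not mean-centered.
    """
    gram = trials @ trials.T
    _, eigvecs = np.linalg.eigh(gram)           # eigenvalues in ascending order
    shape = trials.T @ eigvecs[:, -1]           # map back to the time domain
    shape /= np.linalg.norm(shape)
    if shape @ trials.mean(axis=0) < 0:         # fix the arbitrary sign
        shape = -shape
    return shape

# Synthetic demo: a half-sine response lasting 100 samples plus unit noise.
rng = np.random.default_rng(0)
n_trials, n_samples, true_len = 20, 300, 100
true_shape = np.zeros(n_samples)
true_shape[:true_len] = np.sin(np.pi * np.arange(true_len) / true_len)
trials = 5.0 * true_shape + rng.normal(size=(n_trials, n_samples))

candidate_lengths = np.arange(10, n_samples + 1, 10)
duration, profile = response_duration(trials, candidate_lengths)

windowed = trials[:, :duration]
shape = canonical_shape(windowed)
alphas = windowed @ shape                       # per-trial projection weights
residuals = windowed - np.outer(alphas, shape)  # what the shape does not explain
```

In this toy setting the profile peaks near the true response length, and each trial decomposes into `alphas[k] * shape` plus a residual that is orthogonal to `shape`.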
BIGDML—Towards accurate quantum machine learning force fields for materials
Huziel Sauceda
Luis Gálvez-González
Stefan Chmiela
Lauro Oliver Paz Borbon
Alexandre Tkatchenko
Nature Communications, 13(2022), pp. 3733
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof. Currently, MLFFs often introduce tradeoffs that restrict their practical applicability to small subsets of chemical space or require exhaustive datasets for training. Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning (BIGDML) approach and demonstrate its ability to construct reliable force fields using a training set with just 10–200 geometries for materials including pristine and defect-containing 2D and 3D semiconductors and metals, as well as chemisorbed and physisorbed atomic and molecular adsorbates on surfaces. The BIGDML model employs the full relevant symmetry group for a given material, does not assume artificial atom types or localization of atomic interactions and exhibits high data efficiency and state-of-the-art energy accuracies (errors substantially below 1 meV per atom) for an extended set of materials. Extensive path-integral molecular dynamics carried out with BIGDML models demonstrate the counterintuitive localization of benzene–graphene dynamics induced by nuclear quantum effects and their strong contributions to the hydrogen diffusion coefficient in a Pd crystal for a wide range of temperatures.
Towards Robust Explanations for Deep Neural Networks
Ann-Kathrin Dombrowski
Christopher Johannes Anders
Pan Kessel
Pattern Recognition, 121(2022), pp. 108194
Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. But their usefulness can be compromised because they are susceptible to manipulations. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three different techniques to boost robustness against manipulation: training with weight decay, smoothing activation functions, and minimizing the Hessian of the network. Our experimental results confirm the effectiveness of these approaches.
So3krates - Self-attention for higher-order geometric interactions on arbitrary length-scales
Thorben Frank
Advances in Neural Information Processing Systems(2022) (to appear)
The application of machine learning (ML) methods in quantum chemistry has enabled the study of numerous chemical phenomena, which are computationally intractable with traditional ab initio methods. However, some quantum mechanical properties of molecules and materials depend on non-local electronic effects, which are often neglected due to the difficulty of modelling them efficiently. This work proposes a modified attention mechanism adapted to the underlying physics, which makes it possible to recover the relevant non-local effects. Namely, we introduce spherical harmonic coordinates (SPHCs) to reflect higher order geometric information for each atom in a molecule, enabling a non-local formulation of attention in the SPHC space. Our proposed model So3krates, a self-attention based message passing neural network (MPNN), uncouples geometric information from atomic features, making them independently amenable to attention mechanisms. We show that in contrast to other published methods, So3krates is able to describe quantum mechanical effects due to orbital overlap over arbitrary length scales. Further, So3krates is shown to match or exceed state-of-the-art performance on the popular MD-17 and QM-7X benchmarks, notably requiring a significantly lower number of parameters while at the same time giving a substantial speedup compared to other models.
Algorithmic Differentiation for Automatized Modelling of Machine Learned Force Fields
Niklas Schmitz
Stefan Chmiela
The Journal of Physical Chemistry Letters, 13(43)(2022), pp. 10183-10189
Reconstructing force fields (FFs) from atomistic simulation data is a challenge since accurate data can be highly expensive. Here, machine learning (ML) models can help to be data economic as they can be successfully constrained using the underlying symmetry and conservation laws of physics. However, so far, every descriptor newly proposed for an ML model has required a cumbersome and mathematically tedious remodeling. We therefore propose using modern techniques from algorithmic differentiation within the ML modeling process, effectively enabling the usage of novel descriptors or models fully automatically at an order of magnitude higher computational efficiency. This paradigmatic approach enables not only a versatile usage of novel representations and the efficient computation of larger systems, all of high value to the FF community, but also the simple inclusion of further physical knowledge, such as higher-order information (e.g., Hessians, more complex partial differential equations constraints etc.), even beyond the presented FF domain.
Toward Explainable Artificial Intelligence for Regression Models: A methodological perspective
Simon Letzgus
Jonas Lederer
Wojciech Samek
Gregoire Montavon
IEEE Signal Processing Magazine, 39 (4)(2022), 40–58
In addition to the impressive predictive power of machine learning (ML) models, more recently, explanation methods have emerged that enable an interpretation of complex nonlinear learning models, such as deep neural networks. Gaining a better understanding is especially important, e.g., for safety-critical ML applications or medical diagnostics and so on. Although such explainable artificial intelligence (XAI) techniques have reached significant popularity for classifiers, thus far, little attention has been devoted to XAI for regression models (XAIR). In this review, we clarify the fundamental conceptual differences of XAI for regression and classification tasks, establish novel theoretical insights and analysis for XAIR, provide demonstrations of XAIR on genuine practical regression problems, and finally, discuss challenges remaining for the field.
Harmoni: a Method for Eliminating Spurious Interactions due to the Harmonic Components in Neuronal Data
Mina Jamshidi Idaji
Juanli Zhang
Tilman Stephani
Guido Nolte
Arno Villringer
Vadim Nikulin
Neuroimage, 252(2022), pp. 119053
Cross-frequency synchronization (CFS) has been proposed as a mechanism for integrating spatially and spectrally distributed information in the brain. However, investigating CFS in Magneto- and Electroencephalography (MEG/EEG) is hampered by the presence of spurious neuronal interactions due to the non-sinusoidal waveshape of brain oscillations. Such waveshape gives rise to the presence of oscillatory harmonics mimicking genuine neuronal oscillations. Until recently, however, there has been no methodology for removing these harmonics from neuronal data. In order to address this long-standing challenge, we introduce a novel method (called HARMOnic miNImization - Harmoni) that removes the signal components which can be harmonics of a non-sinusoidal signal. Harmoni’s working principle is based on the presence of CFS between harmonic components and the fundamental component of a non-sinusoidal signal. We extensively tested Harmoni in realistic EEG simulations. The simulated couplings between the source signals represented genuine and spurious CFS and within-frequency phase synchronization. Using diverse evaluation criteria, including ROC analyses, we showed that the within- and cross-frequency spurious interactions are suppressed significantly, while the genuine activities are not affected. Additionally, we applied Harmoni to real resting-state EEG data revealing intricate remote connectivity patterns which are usually masked by the spurious connections. Given the ubiquity of non-sinusoidal neuronal oscillations in electrophysiological recordings, Harmoni is expected to facilitate novel insights into genuine neuronal interactions in various research fields, and can also serve as a steppingstone towards the development of further signal processing methods aiming at refining within- and cross-frequency synchronization in electrophysiological recordings.
Artificial Intelligence and Pathology: from Principles to Practice and Future Applications in Histomorphology and Molecular Profiling
Albrecht Stenzinger
Max Alber
Michael Allgäuer
Phillip Jurmeister
Michael Bockmayr
Jan Budczies
Jochen Lennerz
Johannes Eschrich
Daniel Kazdal
Peter Schirmacher
Alex H Wagner
Frank Tacke
David Capper
Frederick Klauschen
Seminars in Cancer Biology, 84(2022), pp. 129-143
The complexity of diagnostic (surgical) pathology has increased substantially over the last decades with respect to histomorphological and molecular profiling and has steadily expanded its role in tumor diagnostics and beyond from disease entity identification via prognosis estimation to precision therapy prediction. It is therefore not surprising that pathology is among the disciplines in medicine with high expectations in the application of artificial intelligence (AI) or machine learning approaches given its capabilities to analyse complex data in a quantitative and standardised manner to further enhance scope and precision of diagnostics. While an obvious application is the analysis of histological images, recent applications for the analysis of molecular profiling data from different sources and clinical data support the notion that AI will support both histopathology and molecular pathology in the future. At the same time, current literature should not be misunderstood in a way that pathologists will likely be replaced by AI applications in the foreseeable future. Although AI will likely transform pathology in the coming years, recent studies reporting AI algorithms to diagnose cancer or predict certain molecular properties deal with relatively simple diagnostic problems that fall short of the diagnostic complexity pathologists face in clinical routine. Here, we review the pertinent literature of AI methods and their applications to pathology, and put the current achievements and what can be expected in the future in the context of the requirements for research and routine diagnostics.
Efficient Computation of Higher-Order Subgraph Attribution via Message Passing
Ping Xiong
Thomas Schnake
Gregoire Montavon
Shin Nakajima
ICML(2022) (to appear)
Explaining graph neural networks (GNNs) has become more and more important recently. Higher-order interpretation schemes, such as GNN-LRP (layer-wise relevance propagation for GNN), emerged as powerful tools for unraveling how different features interact, thereby contributing to explaining GNNs. Methods such as GNN-LRP perform walks between nodes at each layer, and there are exponentially many such walks. In this work, we demonstrate that such exponential complexity can be avoided. In particular, we propose novel linear-time (w.r.t. depth) algorithms that make it possible to efficiently perform GNN-LRP for subgraphs. Our algorithms are derived via message passing techniques that make use of the distributive property, thereby directly computing quantities for higher-order explanations. We further adapt our efficient algorithms to compute a generalization of subgraph attributions that also takes into account the neighboring graph features. Experimental results show significant acceleration of the proposed algorithms and demonstrate a high usefulness and scalability of our novel generalized subgraph attribution.
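The role of the distributive property can be seen in a small, generic Python sketch. It does not reproduce GNN-LRP or the paper's algorithms. It only shows, under the assumption that a walk's score is the product of per-layer edge weights, how a sum over exponentially many walks collapses into one matrix-vector product per layer. The function names and the random weights are hypothetical.

```python
import itertools
import numpy as np

def walk_sum_bruteforce(layers):
    """Sum, over every walk, of the product of its edge weights.

    layers[l][dst, src] is the weight of the edge src -> dst at layer l.
    Enumerates n ** (depth + 1) walks, so the cost is exponential in depth.
    """
    n = layers[0].shape[0]
    total = 0.0
    for walk in itertools.product(range(n), repeat=len(layers) + 1):
        weight = 1.0
        for layer, src, dst in zip(layers, walk[:-1], walk[1:]):
            weight *= layer[dst, src]
        total += weight
    return total

def walk_sum_message_passing(layers):
    """Same quantity with one matrix-vector product per layer.

    By distributivity, the sum of products factorizes layer by layer:
    messages[dst] accumulates the total weight of all partial walks
    that end in node dst.
    """
    messages = np.ones(layers[0].shape[0])
    for layer in layers:
        messages = layer @ messages
    return float(messages.sum())

# Demo on 4 nodes and 5 layers (4 ** 6 = 4096 walks for the brute force).
rng = np.random.default_rng(0)
layers = [rng.uniform(size=(4, 4)) for _ in range(5)]
slow = walk_sum_bruteforce(layers)
fast = walk_sum_message_passing(layers)
```

Restricting the sum to walks that stay inside a node subset `S` would, in this toy setting, amount to applying both functions to `layer[np.ix_(S, S)]` for each layer.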
Higher-Order Explanations of Graph Neural Networks via Relevant Walks
Thomas Schnake
Oliver Eberle
Jonas Lederer
Shin Nakajima
Kristof T. Schütt
Gregoire Montavon
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11)(2022), pp. 7581 - 7596
Graph Neural Networks (GNNs) are a popular approach for predicting graph structured data. As GNNs tightly entangle the input graph into the neural network structure, common explainable AI approaches are not applicable. To a large extent, GNNs have remained black-boxes for the user so far. In this paper, we show that GNNs can in fact be naturally explained using higher-order expansions, i.e., by identifying groups of edges that jointly contribute to the prediction. Practically, we find that such explanations can be extracted using a nested attribution scheme, where existing techniques such as layer-wise relevance propagation (LRP) can be applied at each step. The output is a collection of walks into the input graph that are relevant for the prediction. Our novel explanation method, which we denote by GNN-LRP, is applicable to a broad range of graph neural networks and lets us extract practically relevant insights on sentiment analysis of text data, structure-property relationships in quantum chemistry, and image classification.
Super-resolution in Molecular Dynamics Trajectory Reconstruction with Bi-Directional Neural Networks
Paul Ludwig Winkler
Huziel Sauceda
Machine Learning: Science and Technology, 3(2022), pp. 025011
Molecular dynamics (MD) simulations are a cornerstone in science, enabling the investigation of a system’s thermodynamics all the way to analyzing intricate molecular interactions. In general, creating extended molecular trajectories can be a computationally expensive process, for example, when running ab-initio simulations. Hence, repeating such calculations to either obtain more accurate thermodynamics or to get a higher resolution in the dynamics generated by a fine-grained quantum interaction can be time- and computational resource-consuming. In this work, we explore different machine learning methodologies to increase the resolution of MD trajectories on-demand within a post-processing step. As a proof of concept, we analyse the performance of bi-directional neural networks (NNs) such as neural ODEs, Hamiltonian networks, recurrent NNs and long short-term memories, as well as the uni-directional variants as a reference, for MD simulations (here: the MD17 dataset). We have found that Bi-LSTMs are the best performing models; by utilizing the local time-symmetry of thermostated trajectories they can even learn long-range correlations and display high robustness to noisy dynamics across molecular complexity. Our models can reach accuracies of up to 10⁻⁴ Å in trajectory interpolation, which leads to the faithful reconstruction of several unseen high-frequency molecular vibration cycles. This renders the comparison between the learned and reference trajectories indistinguishable. The results reported in this work can serve (1) as a baseline for larger systems, as well as (2) for the construction of better MD integrators.
Basis profile curve identification to understand electrical stimulation effects in human brain networks
Kai Joshua Miller
Dora Hermes
PLOS Computational Biology, 17(9)(2021), e1008710, https://doi.org/10.1371/journal.pcbi.1008710
Brain networks can be explored by delivering brief pulses of electrical current in one area while measuring voltage responses in other areas. We propose a convergent paradigm to study brain dynamics, focusing on a single brain site to observe the average effect of stimulating each of many other brain sites. Viewed in this manner, visually-apparent motifs in the temporal response shape emerge from adjacent stimulation sites. This work constructs and illustrates a data-driven approach to determine characteristic spatiotemporal structure in these response shapes, summarized by a set of unique “basis profile curves” (BPCs). Each BPC may be mapped back to underlying anatomy in a natural way, quantifying projection strength from each stimulation site using simple metrics. Our technique is demonstrated for an array of implanted brain surface electrodes in a human patient. This framework enables straightforward interpretation of single-pulse brain stimulation data, and can be applied generically to explore the diverse milieu of interactions that comprise the connectome.
Explainable Deep One-Class Classification
Philipp Liznerski
Lukas Ruff
Robert Vandermeulen
Billy Joe Franks
Marius Kloft
ICLR 2021(2021) (to appear)
Deep one-class classification variants for anomaly detection learn a mapping that concentrates nominal samples in feature space causing anomalies to be mapped away. Because this transformation is highly non-linear, finding interpretations poses a significant challenge. In this paper we present an explainable deep one-class classification method, Fully Convolutional Data Description (FCDD), where the mapped samples are themselves also an explanation heatmap. FCDD yields competitive detection performance and provides reasonable explanations on common anomaly detection benchmarks with CIFAR-10 and ImageNet. On MVTec-AD, a recent manufacturing dataset offering ground-truth anomaly maps, FCDD sets a new state of the art in the unsupervised setting. Our method can incorporate ground-truth anomaly maps during training and using even a few of these (~5) improves performance significantly. Finally, using FCDD’s explanations we demonstrate the vulnerability of deep one-class classification models to spurious image features such as image watermarks.
Dynamical Strengthening of Covalent and Non-Covalent Molecular Interactions by Nuclear Quantum Effects at Finite Temperature
Huziel Sauceda
Stefan Chmiela
Valentin Vassilev Galindo
Alexandre Tkatchenko
Nature Communications, 12(2021), pp. 442
Nuclear quantum effects (NQE) tend to generate delocalized molecular dynamics due to the anharmonicity of interatomic interactions. Here, we present evidence that NQE often enhance electronic interactions and, in turn, can result in dynamical molecular stabilization at finite temperature. The underlying physical mechanism promoted by NQE depends on the particular interaction under consideration. First, the effective reduction of interatomic distances between functional groups within a molecule enhances the n → π* interaction by increasing the overlap between molecular orbitals or by strengthening electrostatic interactions between neighboring charge densities. Second, NQE can localize methyl rotors by temporarily changing molecular bond orders and leading to the emergence of localized transient rotor states. Third, for noncovalent interactions the strengthening comes from the increase of the polarizability given the expanded average interatomic distances induced by NQE. The implications of these boosted interactions include counterintuitive hydroxyl–hydroxyl bonding, hindered methyl rotor dynamics, and molecular stiffening which generates smoother free-energy surfaces. These results challenge the general assumption that NQE tend to mainly generate delocalized dynamics and reveal that NQE also play an active role in dynamical strengthening of molecular interactions. Our findings yield new insights into the versatile role of nuclear quantum fluctuations in molecules and materials.
Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems
John A Keith
Valentin Vassilev Galindo
Bingqing Cheng
Stefan Chmiela
Michael Gastegger
Alexandre Tkatchenko
Chemical Reviews, 121 (16)(2021), 9816-9872, https://pubs.acs.org/doi/pdf/10.1021/acs.chemrev.1c00107
Machine learning models are poised to make transformative impact in the chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry, machine learning methods, and how insights involving both can be achieved. We then follow with a critical review of noteworthy applications that demonstrate how computational quantum chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
Machine Learning Force Fields
Oliver Unke
Stefan Chmiela
Huziel Sauceda
Michael Gastegger
Igor Poltavsky
Kristof T. Schütt
Alexandre Tkatchenko
Chemical Reviews, 121 (16)(2021), 10142-10186, https://pubs.acs.org/doi/pdf/10.1021/acs.chemrev.0c01111
In recent years, the use of Machine Learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
A Unifying Review of Deep and Shallow Anomaly Detection
Lukas Ruff
Jacob Reinhard Kauffmann
Robert Vandermeulen
Gregoire Montavon
Wojciech Samek
Marius Kloft
Thomas G. Dietterich
Proc of the IEEE, 109(5)(2021), pp. 756-795 (to appear)
Deep learning approaches to anomaly detection have recently improved the state of the art in detection performance on complex datasets such as large collections of images or text. These results have sparked a renewed interest in the anomaly detection problem and led to the introduction of a great variety of new methods. With the emergence of numerous such methods that include approaches based on generative models, one-class classification, and reconstruction, there is a growing need to bring methods of this field into a systematic and unified perspective. In this review, we therefore aim to identify the common underlying principles as well as the assumptions that are often made implicitly by various methods. In particular, we draw connections between classic ‘shallow’ and novel deep approaches and show how they exactly relate and moreover how this relation might cross-fertilize or extend both directions. We further provide an empirical assessment of major existing methods that is enriched by the use of recent explainability techniques, and present specific worked-through examples together with practical advice. Finally, we outline critical open challenges and identify specific paths for future research in anomaly detection.
SE(3)-equivariant prediction of molecular wavefunctions and electronic densities
Mihail Bogojeski
Michael Gastegger
Mario Geiger
Tess Smidt
Advances in Neural Information Processing Systems(2021)
Machine learning has enabled the prediction of quantum chemical properties with high accuracy and efficiency, making it possible to bypass computationally costly ab initio calculations. Instead of training on a fixed set of properties, more recent approaches attempt to learn the electronic wavefunction (or density) as a central quantity of atomistic systems, from which all other observables can be derived. This is complicated by the fact that wavefunctions transform non-trivially under molecular rotations, which makes them a challenging prediction target. To solve this issue, we introduce general SE(3)-equivariant operations and building blocks for constructing deep learning architectures for geometric point cloud data and apply them to reconstruct wavefunctions of atomistic systems with unprecedented accuracy. Our model reduces prediction errors by up to two orders of magnitude compared to the previous state-of-the-art and makes it possible to derive properties such as energies and forces directly from the wavefunction in an end-to-end manner. We demonstrate the potential of our approach in a transfer learning application, where a model trained on low accuracy reference wavefunctions implicitly learns to correct for electronic many-body interactions from observables computed at a higher level of theory. Such machine-learned wavefunction surrogates pave the way towards novel semi-empirical methods, offering resolution at an electronic level while drastically decreasing computational cost. While we focus on physics applications in this contribution, the proposed equivariant framework for deep learning on point clouds is promising also beyond, say, in computer vision or graphics.
SpookyNet: Learning Force Fields with Electronic Degrees of Freedom and Nonlocal Effects
Stefan Chmiela
Michael Gastegger
Kristof T. Schütt
Huziel Sauceda
Nature Communications, 12(2021), pp. 7273
Machine-learned force fields combine the accuracy of ab initio methods with the efficiency of conventional force fields. However, current machine-learned force fields typically ignore electronic degrees of freedom, such as the total charge or spin state, and assume chemical locality, which is problematic when molecules have inconsistent electronic states, or when nonlocal effects play a significant role. This work introduces SpookyNet, a deep neural network for constructing machine-learned force fields with explicit treatment of electronic degrees of freedom and nonlocality, modeled via self-attention in a transformer architecture. Chemically meaningful inductive biases and analytical corrections built into the network architecture allow it to properly model physical limits. SpookyNet improves upon the current state-of-the-art (or achieves similar performance) on popular quantum chemistry data sets. Notably, it is able to generalize across chemical and conformational space and can leverage the learned chemical insights, e.g. by predicting unknown spin states, thus helping to close a further important remaining gap for today’s machine learning models in quantum chemistry.
Sensorimotor functional connectivity: a neurophysiological factor related to BCI performance
Carmen Vidaurre
Stefan Haufe
Tania Jorajuría Gómez
Vadim Nikulin
Frontiers in Neuroscience, 14(2020), pp. 575081
Brain-Computer Interfaces (BCIs) are systems that allow users to control devices using brain activity alone. However, the ability of participants to command BCIs varies from subject to subject. For BCIs based on the modulation of sensorimotor rhythms as measured by means of electroencephalography (EEG), about 20% of potential users do not obtain enough accuracy to gain reliable control of the system. This lack of efficiency of BCI systems to decode users' intentions necessitates identification of neurophysiological factors determining 'good' and 'poor' BCI performers. Given that the neuronal oscillations used in BCI demonstrate a rich repertoire of spatial interactions, we hypothesized that neuronal activity in sensorimotor areas would define some aspects of BCI performance. Analyses for this study were performed on a large dataset of 80 inexperienced participants. They took part in a calibration and an online feedback session on the same day. Undirected functional connectivity was computed over sensorimotor areas by means of the imaginary part of coherency. The results show that post- as well as pre-stimulus connectivity in the calibration recordings is significantly correlated to online feedback performance in μ and feedback frequency bands. Importantly, the significance of the correlation between connectivity and BCI feedback accuracy was not due to the signal-to-noise ratio of the oscillations in the corresponding post- and pre-stimulus intervals. Thus, this study shows that BCI performance is not only dependent on the amplitude of sensorimotor oscillations as shown previously, but that it also relates to sensorimotor connectivity measured during the preceding training session. The presence of such connectivity between motor and somatosensory systems is likely to facilitate motor imagery, which in turn is associated with the generation of a more pronounced modulation of sensorimotor oscillations (manifested in ERD/ERS) required for adequate BCI performance. We also discuss strategies for the up-regulation of such connectivity in order to enhance BCI performance.
Molecular force fields with gradient-domain machine learning (GDML): Comparison and synergies with classical force fields
Huziel Sauceda
Michael Gastegger
Stefan Chmiela
Alexandre Tkatchenko
Journal of Chemical Physics, 153(2020), pp. 124109
The goal of the present work is to perform a detailed investigation of the differences between both systems based on a set of small molecules exhibiting different quantum mechanical phenomena. Based on these results, different alternatives are explored for improving the data generation process and their applicability context for expediting the force-field learning procedure. Furthermore, improvement of the accuracy of MM-FFs is studied by reparameterising them based on more accurate reference data and testing their limits and functional form flexibility. For this task, we use the recently published sGDML framework [25, 26] as the ML-FF of choice, as it is able to efficiently reconstruct the potential energy surfaces (PES) of medium-sized molecules. The investigated systems are the molecules ethanol, the keto and enol forms of malondialdehyde (keto-MDA and enol-MDA, respectively) as well as salicylic and acetylsalicylic acid (Aspirin). In the context of these systems, we study the performance of MM-FFs and sGDML-derived FFs based on the overall reliability of the generated PESs, as well as effects arising from chemical phenomena such as hydrogen transfer and orbital interactions. Although we restrict ourselves to the sGDML approach, it can nevertheless be expected that the results found here are equally valid for ML-FFs in general.
Novel multivariate methods to track frequency shifts of neural oscillations in EEG/MEG recordings
Carmen Vidaurre
Kshipra Gurunandan
Mina Jamshidi Idaji
Guido Nolte
Marisol Gómez
Arno Villringer
Vadim Nikulin
Neuroimage, 276(2023), pp. 120178
Instantaneous and peak frequency changes in neural oscillations have been linked to many perceptual, motor, and cognitive processes. Yet, the majority of such studies have been performed in sensor space and only occasionally in source space. Furthermore, both terms have been used interchangeably in the literature, although they do not reflect the same aspect of neural oscillations. In this paper, we discuss the relation between instantaneous frequency, peak frequency, and local frequency, the latter also known as spectral centroid. Furthermore, we propose and validate three different methods to extract source signals from multichannel data whose (instantaneous, local, or peak) frequency estimate is maximally correlated to an experimental variable of interest. Results show that the local frequency might be a better estimate of frequency variability than instantaneous frequency under conditions with low signal-to-noise ratio. Additionally, the source separation methods based on local and peak frequency estimates, called LFD and PFD respectively, provide more stable estimates than the decomposition based on instantaneous frequency. In particular, LFD and PFD are able to recover the sources of interest in simulations performed with a realistic head model, providing higher correlations with an experimental variable than multiple linear regression. Finally, we also tested all decomposition methods on real EEG data from a steady-state visual evoked potential paradigm and show that the recovered sources are located in areas similar to those previously reported in other studies, thus providing further validation of the proposed methods.
Analysing Cerebrospinal Fluid with Explainable Deep Learning: from Diagnostics to Insights
Leonille Schweizer
Philipp Seegerer
Hee‐yeong Kim
René Saitenmacher
Amos Muench
Liane Barnick
Anja Osterloh
Carsten Dittmayer
Ruben Jödicke
Debora Pehl
Annekathrin Reinhardt
Klemens Ruprecht
Werner Stenzel
Annika K Wefers
Patrick N Harter
Ulrich Schüller
Frank L Heppner
Maximilian Alber
Frederick Klauschen
Neuropathology and Applied Neurobiology, 49(1)(2023), e12866
Aim
Analysis of cerebrospinal fluid (CSF) is essential for diagnostic workup of patients with neurological diseases and includes differential cell typing. The current gold standard is based on microscopic examination by specialised technicians and neuropathologists, which is time-consuming, labour-intensive and subjective.
Methods
We, therefore, developed an image analysis approach based on expert annotations of 123,181 digitised CSF objects from 78 patients corresponding to 15 clinically relevant categories and trained a multiclass convolutional neural network (CNN).
Results
The CNN classified the 15 categories with high accuracy (mean AUC 97.3%). By using explainable artificial intelligence (XAI), we demonstrate that the CNN identified meaningful cellular substructures in CSF cells recapitulating human pattern recognition. Based on the evaluation of 511 cells selected from 12 different CSF samples, we validated the CNN by comparing it with seven board-certified neuropathologists blinded for clinical information. Inter-rater agreement between the CNN and the ground truth was non-inferior (Krippendorff's alpha 0.79) compared with the agreement of seven human raters and the ground truth (mean Krippendorff's alpha 0.72, range 0.56–0.81). The CNN assigned the correct diagnostic label (inflammatory, haemorrhagic or neoplastic) in 10 out of 11 clinical samples, compared with 7–11 out of 11 by human raters.
Conclusions
Our approach provides the basis to overcome current limitations in automated cell classification for routine diagnostics and demonstrates how a visual explanation framework can connect machine decision-making with cell properties and thus provide a novel versatile and quantitative method for investigating CSF manifestations of various neurological diseases.
Single-cell gene regulatory network prediction by explainable AI
Philipp Keyl
Philip Bischoff
Gabriel Dernbach
Michael Bockmayr
Rebecca Fritz
David Horst
Nils Blüthgen
Grégoire Montavon
Frederick Klauschen
Nucleic Acids Research (2023), gkac1212
The molecular heterogeneity of cancer cells contributes to the often partial response to targeted therapies and relapse of disease due to the escape of resistant cell populations. While single-cell sequencing has started to improve our understanding of this heterogeneity, it offers a mostly descriptive view on cellular types and states. To obtain more functional insights, we propose scGeneRAI, an explainable deep learning approach that uses layer-wise relevance propagation (LRP) to infer gene regulatory networks from static single-cell RNA sequencing data for individual cells. We benchmark our method with synthetic data and apply it to single-cell RNA sequencing data of a cohort of human lung cancers. From the predicted single-cell networks our approach reveals characteristic network patterns for tumor cells and normal epithelial cells and identifies subnetworks that are observed only in (subgroups of) tumor cells of certain patients. While current state-of-the-art methods are limited by their ability to only predict average networks for cell populations, our approach facilitates the reconstruction of networks down to the level of single cells which can be utilized to characterize the heterogeneity of gene regulation within and across tumors.
Patient-level proteomic network prediction by explainable artificial intelligence
Philipp Keyl
Michael Bockmayr
Daniel Heim
Gabriel Dernbach
Grégoire Montavon
Frederick Klauschen
npj Precision Oncology, 6(2022), pp. 35
Understanding the pathological properties of dysregulated protein networks in individual patients’ tumors is the basis for precision therapy. Functional experiments are commonly used, but cover only parts of the oncogenic signaling networks, whereas methods that reconstruct networks from omics data usually only predict average network features across tumors. Here, we show that the explainable AI method layer-wise relevance propagation (LRP) can infer protein interaction networks for individual patients from proteomic profiling data. LRP reconstructs average and individual interaction networks with an AUC of 0.99 and 0.93, respectively, and outperforms state-of-the-art network prediction methods for individual tumors. Using data from The Cancer Proteome Atlas, we identify known and potentially novel oncogenic network features, among which some are cancer-type specific and show only minor variation among patients, while others are present across certain tumor types but differ among individual patients. Our approach may therefore support predictive diagnostics in precision oncology by inferring “patient-level” oncogenic mechanisms.
New definitions of human lymphoid and follicular cell entities in lymphatic tissue by machine learning
Patrick Wagner
Nils Strodthoff
Patrick Wurzel
Arturo Marban
Sonja Scharf
Hendrik Schäfer
Philipp Seegerer
Andreas Loth
Sylvia Hartmann
Frederick Klauschen
Wojciech Samek
Martin-Leo Hansmann
Scientific Reports, 12(2022), pp. 18991
Histological sections of the lymphatic system are usually the basis of static (2D) morphological investigations. Here, we performed a dynamic (4D) analysis of human reactive lymphoid tissue using confocal fluorescent laser microscopy in combination with machine learning. Based on tracks for T-cells (CD3), B-cells (CD20), follicular T-helper cells (PD1) and optical flow of follicular dendritic cells (CD35), we put forward the first quantitative analysis of movement-related and morphological parameters within human lymphoid tissue. We identified correlations between follicular dendritic cell movement and the behavior of lymphocytes in the microenvironment. In addition, we investigated the value of movement and/or morphological parameters for a precise definition of cell types (CD clusters). CD clusters could be determined based on movement and/or morphology. Differentiating between CD3- and CD20-positive cells is most challenging, and long-term movement characteristics are indispensable. We propose morphological and movement-related prototypes of cell entities by applying machine learning models. Finally, we define new subgroups within lymphocyte entities, beyond CD clusters, based on long-term movement characteristics. In conclusion, we showed that the combination of 4D imaging and machine learning is able to define characteristics of lymphocytes not visible in 2D histology.
2020 International brain–computer interface competition: A review
Ji-Hoon Jeong
Jeong-Hyun Cho
Young-Eun Lee
Seo-Hyun Lee
Gi-Hwan Shin
Young-Seok Kweon
José del R Millán
Seong-Whan Lee
Frontiers in Human Neuroscience, 16(2022), pp. 898300
The brain-computer interface (BCI) has been investigated as a form of communication tool between the brain and external devices. BCIs have been extended beyond communication and control over the years. The 2020 international BCI competition aimed to provide high-quality neuroscientific data for open access that could be used to evaluate the current degree of technical advances in BCI. Although there are a variety of remaining challenges for future BCI advances, we discuss some of the more recent application directions: (i) few-shot EEG learning, (ii) micro-sleep detection, (iii) imagined speech decoding, (iv) cross-session classification, and (v) EEG(+ear-EEG) detection in an ambulatory environment. Not only did scientists from the BCI field compete, but scholars with a broad variety of backgrounds and nationalities participated in the competition to address these challenges. Each dataset was prepared and separated into three parts that were released to the competitors in the form of training and validation sets followed by a test set. Remarkable BCI advances were identified through the 2020 competition and indicated some trends of interest to BCI researchers.
Inverse design of 3d molecular structures with conditional generative neural networks
Niklas W. A. Gebauer
Michael Gastegger
Stefaan S. P. Hessmann
Kristof T. Schütt
Nature Communications, 13(2022), pp. 973
The rational design of molecules with desired properties is a long-standing challenge in chemistry. Generative neural networks have emerged as a powerful approach to sample novel molecules from a learned distribution. Here, we propose a conditional generative neural network for 3d molecular structures with specified chemical and structural properties. This approach is agnostic to chemical bonding and enables targeted sampling of novel molecules from conditional distributions, even in domains where reference calculations are sparse. We demonstrate the utility of our method for inverse design by generating molecules with specified motifs or composition, discovering particularly stable molecules, and jointly targeting multiple electronic properties beyond the training regime.
Machine learning models predict the primary sites of head and neck squamous cell carcinoma metastases based on DNA methylation
Maximilian Leitheiser
David Capper
Philipp Seegerer
Annika Lehmann
Ulrich Schüller
Frederick Klauschen
Philipp Jurmeister
Michael Bockmayr
The Journal of Pathology, 254(4)(2022), pp. 378-387
In head and neck squamous cell cancers (HNSCs) that present as metastases with an unknown primary (HNSC-CUPs), the identification of a primary tumor improves therapy options and increases patient survival. However, the currently available diagnostic methods are laborious and do not offer a sufficient detection rate. Predictive machine learning models based on DNA methylation profiles have recently emerged as a promising technique for tumor classification. We applied this technique to HNSC to develop a tool that can improve the diagnostic work-up for HNSC-CUPs. On a reference cohort of 405 primary HNSC samples, we developed four classifiers based on different machine learning models [random forest (RF), neural network (NN), elastic net penalized logistic regression (LOGREG), and support vector machine (SVM)] that predict the primary site of HNSC tumors from their DNA methylation profile. The classifiers achieved high classification accuracies (RF = 83%, NN = 88%, LOGREG = SVM = 89%) on an independent cohort of 64 HNSC metastases. Further, the NN, LOGREG, and SVM models significantly outperformed p16 status as a marker for an origin in the oropharynx. In conclusion, the DNA methylation profiles of HNSC metastases are characteristic for their primary sites, and the classifiers developed in this study, which are made available to the scientific community, can provide valuable information to guide the diagnostic work-up of HNSC-CUP.
Building and Interpreting Deep Similarity Models
Oliver Eberle
Jochen Büttner
Florian Kräutli
Matteo Valleriani
Gregoire Montavon
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3)(2022), pp. 1149-1161
Many learning algorithms such as kernel machines, nearest neighbors, clustering, or anomaly detection, are based on distances or similarities. Before similarities are used for training an actual machine learning model, we would like to verify that they are bound to meaningful patterns in the data. In this paper, we propose to make similarities interpretable by augmenting them with an explanation. We develop BiLRP, a scalable and theoretically founded method to systematically decompose the output of an already trained deep similarity model on pairs of input features. Our method can be expressed as a composition of LRP explanations, which were shown in previous works to scale to highly nonlinear models. Through an extensive set of experiments, we demonstrate that BiLRP robustly explains complex similarity models, e.g. built on VGG-16 deep neural network features. Additionally, we apply our method to an open problem in digital humanities: detailed assessment of similarity between historical documents such as astronomical tables. Here again, BiLRP provides insight and brings verifiability into a highly engineered and problem-specific similarity model.
Finding and removing Clever Hans: Using explanation methods to debug and improve deep models
Christopher J. Anders
Leander Weber
David Neumann
Wojciech Samek
Sebastian Lapuschkin
Information Fusion, 77(2022), pp. 261-295
Contemporary learning models for computer vision are typically trained on very large (benchmark) datasets with millions of samples. These may, however, contain biases, artifacts, or errors that have gone unnoticed and are exploitable by the model. In the worst case, the trained model does not learn a valid and generalizable strategy to solve the problem it was trained for, and becomes a “Clever Hans” predictor that bases its decisions on spurious correlations in the training data, potentially yielding an unrepresentative or unfair, and possibly even hazardous predictor. In this paper, we contribute by providing a comprehensive analysis framework based on a scalable statistical analysis of attributions from explanation methods for large data corpora. Based on a recent technique — Spectral Relevance Analysis — we propose the following technical contributions and resulting findings: (a) a scalable quantification of artifactual and poisoned classes where the machine learning models under study exhibit Clever Hans behavior, (b) several approaches we collectively denote as Class Artifact Compensation, which are able to effectively and significantly reduce a model’s Clever Hans behavior, i.e., we are able to un-Hans models trained on (poisoned) datasets, such as the popular ImageNet data corpus. We demonstrate that Class Artifact Compensation, defined in a simple theoretical framework, may be implemented as part of a neural network’s training or fine-tuning process, or in a post-hoc manner by injecting additional layers, preventing any further propagation of undesired Clever Hans features, into the network architecture. Using our proposed methods, we provide qualitative and quantitative analyses of the biases and artifacts in, e.g., the ImageNet dataset, the Adience benchmark dataset of unfiltered faces, and the ISIC 2019 skin lesion analysis dataset. 
We demonstrate that these insights can give rise to improved, more representative, and fairer models operating on implicitly cleaned data corpora.
DNA methylation-based classification of sinonasal tumors
Philipp Jurmeister
Stefanie Glöß
Renée Roller
Maximilian Leitheiser
Simone Schmid
Liliana H Mochmann
Emma Payá Capilla
Rebecca Fritz
Carsten Dittmayer
Corinna Friedrich
Anne Thieme
Philipp Keyl
Armin Jarosch
Simon Schallenberg
Hendrik Bläker
Inga Hoffmann
Claudia Vollbrecht
Annika Lehmann
Michael Hummel
Daniel Heim
Mohamed Haji
Patrick Harter
Benjamin Englert
Stephan Frank
Jürgen Hench
Werner Paulus
Martin Hasselblatt
Wolfgang Hartmann
Hildegard Dohmen
Ursula Keber
Paul Jank
Carsten Denkert
Christine Stadelmann
Felix Bremmer
Annika Richter
Annika Wefers
Julika Ribbat-Idel
Sven Perner
Christian Idel
Lorenzo Chiariotti
Rosa Della Monica
Alfredo Marinelli
Ulrich Schüller
Michael Bockmayr
Jacklyn Liu
Valerie J Lund
Martin Forster
Matt Lechner
Sara L Lorenzo-Guerra
Mario Hermsen
Pascal D Johann
Abbas Agaimy
Philipp Seegerer
Arend Koch
Frank Heppner
Stefan M Pfister
David TW Jones
Martin Sill
Andreas von Deimling
Matija Snuderl
Erna Forgó
Brooke E. Howitt
Philipp Mertins
Frederick Klauschen
David Capper
Nature Communications, 13(2022), pp. 7148
The diagnosis of sinonasal tumors is challenging due to a heterogeneous spectrum of various differential diagnoses as well as poorly defined, disputed entities such as sinonasal undifferentiated carcinomas (SNUCs). In this study, we apply a machine learning algorithm based on DNA methylation patterns to classify sinonasal tumors with clinical-grade reliability. We further show that sinonasal tumors with SNUC morphology are not as undifferentiated as their current terminology suggests but rather reassigned to four distinct molecular classes defined by epigenetic, mutational and proteomic profiles. This includes two classes with neuroendocrine differentiation, characterized by IDH2 or SMARCA4/ARID1A mutations with an overall favorable clinical course, one class composed of highly aggressive SMARCB1-deficient carcinomas and another class with tumors that represent potentially previously misclassified adenoid cystic carcinomas. Our findings can aid in improving the diagnostic classification of sinonasal tumors and could help to change the current perception of SNUCs.
To pretrain or not? A systematic analysis of the benefits of pretraining in diabetic retinopathy
Vignesh Srinivasan
Nils Strodthoff
Jackie Ma
Alexander Binder
Wojciech Samek
PLOS ONE, 17(10)(2022), e0274291
There is an increasing number of medical use cases where classification algorithms based on deep neural networks reach performance levels that are competitive with human medical experts. To alleviate the challenges of small dataset sizes, these systems often rely on pretraining. In this work, we aim to assess the broader implications of these approaches in order to better understand what type of pretraining works reliably (with respect to performance, robustness, learned representation etc.) in practice and what type of pretraining dataset is best suited to achieve good performance in small target dataset size scenarios. Considering diabetic retinopathy grading as an exemplary use case, we compare the impact of different training procedures including recently established self-supervised pretraining methods based on contrastive learning. To this end, we investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions. Our results indicate that models initialized from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions. In particular, self-supervised models show further benefits over supervised models. Self-supervised models with initialization from ImageNet pretraining not only report higher performance, they also reduce overfitting to large lesions along with improvements in taking into account minute lesions indicative of the progression of the disease. Understanding the effects of pretraining in a broader sense that goes beyond simple performance comparisons is of crucial importance for the broader medical imaging community beyond the use case considered in this work.
Towards the interpretability of deep learning models for multi-modal neuroimaging: Finding structural changes of the ageing brain
Simon M Hofmann
Frauke Beyer
Sebastian Lapuschkin
Ole Goltermann
Markus Loeffler
Arno Villringer
Wojciech Samek
A Veronica Witte
Neuroimage, 261(2022), pp. 119504
Brain-age (BA) estimates based on deep learning are increasingly used as a neuroimaging biomarker for brain health; however, the underlying neural features have remained unclear. We combined ensembles of convolutional neural networks with Layer-wise Relevance Propagation (LRP) to detect which brain features contribute to BA. Trained on magnetic resonance imaging (MRI) data of a population-based study (n = 2637, 18–82 years), our models estimated age accurately based on single and multiple modalities, regionally restricted and whole-brain images (mean absolute errors 3.37–3.86 years). We find that BA estimates capture ageing at both small and large-scale changes, revealing gross enlargements of ventricles and subarachnoid spaces, as well as white matter lesions, and atrophies that appear throughout the brain. Divergence from expected ageing reflected cardiovascular risk factors, and accelerated ageing was more pronounced in the frontal lobe. Applying LRP, our study demonstrates how superior deep learning models detect brain-ageing in healthy and at-risk individuals throughout adulthood.
Forecasting industrial aging processes with machine learning methods
Mihail Bogojeski
Simeon Sauer
Franziska Horn
Computers & Chemical Engineering, 144(2021), pp. 107123
Accurately predicting industrial aging processes makes it possible to schedule maintenance events further in advance, ensuring a cost-efficient and reliable operation of the plant. So far, these degradation processes were usually described by mechanistic or simple empirical prediction models. In this paper, we evaluate a wider range of data-driven models, comparing some traditional stateless models (linear and kernel ridge regression, feed-forward neural networks) to more complex recurrent neural networks (echo state networks and LSTMs). We first examine how much historical data is needed to train each of the models on a synthetic dataset with known dynamics. Next, the models are tested on real-world data from a large scale chemical plant. Our results show that recurrent models produce near perfect predictions when trained on larger datasets, and maintain a good performance even when trained on smaller datasets with domain shifts, while the simpler models only performed comparably on the smaller datasets.
Unification of sparse Bayesian learning algorithms for electromagnetic brain imaging with the majorization minimization framework
Ali Hashemi
Chang Cai
Gitta Kutyniok
Srikantan S. Nagarajan
Stefan Haufe
Neuroimage (2021), https://doi.org/10.1016/j.neuroimage.2021.118309
Methods for electro- or magnetoencephalography (EEG/MEG) based brain source imaging (BSI) using sparse Bayesian learning (SBL) have been demonstrated to achieve excellent performance in situations with low numbers of distinct active sources, such as event-related designs. This paper extends the theory and practice of SBL in three important ways. First, we reformulate three existing SBL algorithms under the majorization-minimization (MM) framework. This unification perspective not only provides a useful theoretical framework for comparing different algorithms in terms of their convergence behavior, but also provides a principled recipe for constructing novel algorithms with specific properties by designing appropriate bounds of the Bayesian marginal likelihood function. Second, building on the MM principle, we propose a novel method called LowSNR-BSI that achieves favorable source reconstruction performance in low signal-to-noise-ratio (SNR) settings. Third, precise knowledge of the noise level is a crucial requirement for accurate source reconstruction. Here we present a novel principled technique to accurately learn the noise variance from the data either jointly within the source reconstruction procedure or using one of two proposed cross-validation strategies. Empirically, we show that the monotonic convergence behavior predicted from MM theory is confirmed in numerical experiments. Using simulations, we further demonstrate the advantage of LowSNR-BSI over conventional SBL in low-SNR regimes, and the advantage of learned noise levels over estimates derived from baseline data. To demonstrate the usefulness of our novel approach, we show neurophysiologically plausible source reconstructions on averaged auditory evoked potential data.
Pruning by explaining: A novel criterion for deep neural network pruning
Seul-Ki Yeom
Philipp Seegerer
Sebastian Lapuschkin
Alexander Binder
Simon Wiedemann
Wojciech Samek
Pattern Recognition, 115(2021), pp. 107899
The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the weights of various layers while at the same time aiming to not sacrifice performance. In this paper, we propose a novel criterion for CNN pruning inspired by neural network interpretability: The most relevant units, i.e. weights or filters, are automatically found using their relevance scores obtained from concepts of explainable AI (XAI). By exploring this idea, we connect the lines of interpretability and model compression research. We show that our proposed method can efficiently prune CNN models in transfer-learning setups in which networks pre-trained on large corpora are adapted to specialized tasks. The method is evaluated on a broad range of computer vision datasets. Notably, our novel criterion is not only competitive or better compared to state-of-the-art pruning criteria when successive retraining is performed, but clearly outperforms these previous criteria in the resource-constrained application scenario in which the data of the task to be transferred to is very scarce and one chooses to refrain from fine-tuning. Our method is able to compress the model iteratively while maintaining or even improving accuracy. At the same time, it has a computational cost in the order of gradient computation and is comparatively simple to apply without the need for tuning hyperparameters for pruning.
Morphological and molecular breast cancer profiling through explainable machine learning
Alexander Binder
Michael Bockmayr
Miriam Hägele
Stephan Wienert
Daniel Heim
Katharina Hellweg
Masaru Ishii
Albrecht Stenzinger
Andreas Hocke
Carsten Denkert
Frederick Klauschen
Nature Machine Intelligence, 3(2021), pp. 355–366
Recent advances in cancer research and diagnostics largely rely on new developments in microscopic or molecular profiling techniques, offering high levels of detail with respect to either spatial or molecular features, but usually not both. Here, we present an explainable machine-learning approach for the integrated profiling of morphological, molecular and clinical features from breast cancer histology. First, our approach allows for the robust detection of cancer cells and tumour-infiltrating lymphocytes in histological images, providing precise heatmap visualizations explaining the classifier decisions. Second, molecular features, including DNA methylation, gene expression, copy number variations, somatic mutations and proteins are predicted from histology. Molecular predictions reach balanced accuracies up to 78%, whereas accuracies of over 95% can be achieved for subgroups of patients. Finally, our explainable AI approach allows assessment of the link between morphological and molecular cancer properties. The resulting computational multiplex-histology analysis can help promote basic cancer research and precision medicine through an integrated diagnostic scoring of histological, clinical and molecular features.
Robustifying models against adversarial attacks by Langevin dynamics
Vignesh Srinivasan
Csaba Rohrer
Arturo Marban
Wojciech Samek
Shinichi Nakajima
Neural Networks, 137(2021), pp. 1-17
Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a number of defense methods were proposed, which, however, have been circumvented by newer and more sophisticated attacking strategies. In the midst of this ensuing arms race, robustness against adversarial attacks remains a challenging problem. This paper proposes a novel, simple yet effective defense strategy where off-manifold adversarial samples are driven towards high density regions of the data generating distribution of the (unknown) target class by the Metropolis-adjusted Langevin algorithm (MALA) with perceptual boundary taken into account. To achieve this task, we introduce a generative model of the conditional distribution of the inputs given labels that can be learned through a supervised Denoising Autoencoder (sDAE) in alignment with a discriminative classifier. Our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion: the projection is distributed broadly. This prevents white box attacks from accurately aligning the input to create an adversarial sample effectively. MALADE is applicable to any existing classifier, providing robust defense as well as off-manifold sample detection. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints
Felix Sattler
Wojciech Samek
IEEE Transactions on Neural Networks and Learning Systems, 32(8)(2021), pp. 3710-3722
Federated learning (FL) is currently the most widely adopted framework for collaborative training of (deep) machine learning models under privacy constraints. Despite its popularity, it has been observed that FL yields suboptimal results if the local clients’ data distributions diverge. To address this issue, we present clustered FL (CFL), a novel federated multitask learning (FMTL) framework, which exploits geometric properties of the FL loss surface to group the client population into clusters with jointly trainable data distributions. In contrast to existing FMTL approaches, CFL does not require any modifications to the FL communication protocol, is applicable to general nonconvex objectives (in particular, deep neural networks), does not require the number of clusters to be known a priori, and comes with strong mathematical guarantees on the clustering quality. CFL is flexible enough to handle client populations that vary over time and can be implemented in a privacy-preserving way. As clustering is only performed after FL has converged to a stationary point, CFL can be viewed as a postprocessing method that will always achieve performance greater than or equal to that of conventional FL by allowing clients to arrive at more specialized models. We verify our theoretical analysis in experiments with deep convolutional and recurrent neural networks on commonly used FL data sets.
Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications
Wojciech Samek
Gregoire Montavon
Sebastian Lapuschkin
Christopher J. Anders
Proc of the IEEE, 109(3)(2021), pp. 247-278
With the broader and highly successful usage of machine learning (ML) in industry and the sciences, there has been a growing demand for explainable artificial intelligence (XAI). Interpretability and explanation methods for gaining a better understanding of the problem-solving abilities and strategies of nonlinear ML, in particular, deep neural networks, are, therefore, receiving increased attention. In this work, we aim to: 1) provide a timely overview of this active emerging field, with a focus on “post hoc” explanations, and explain its theoretical foundations; 2) put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations; 3) outline best practice aspects, i.e., how to best include interpretation methods into the standard usage of ML; and 4) demonstrate successful usage of XAI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of ML.
Unification of sparse Bayesian learning algorithms for electromagnetic brain imaging with the majorization minimization framework
Ali Hashemi
Chang Cai
Gitta Kutyniok
Srikantan S Nagarajan
Stefan Haufe
Neuroimage, 239(2021), pp. 118309
Methods for electro- or magnetoencephalography (EEG/MEG) based brain source imaging (BSI) using sparse Bayesian learning (SBL) have been demonstrated to achieve excellent performance in situations with low numbers of distinct active sources, such as event-related designs. This paper extends the theory and practice of SBL in three important ways. First, we reformulate three existing SBL algorithms under the majorization-minimization (MM) framework. This unification perspective not only provides a useful theoretical framework for comparing different algorithms in terms of their convergence behavior, but also provides a principled recipe for constructing novel algorithms with specific properties by designing appropriate bounds of the Bayesian marginal likelihood function. Second, building on the MM principle, we propose a novel method called LowSNR-BSI that achieves favorable source reconstruction performance in low signal-to-noise-ratio (SNR) settings. Third, precise knowledge of the noise level is a crucial requirement for accurate source reconstruction. Here we present a novel principled technique to accurately learn the noise variance from the data either jointly within the source reconstruction procedure or using one of two proposed cross-validation strategies. Numerical experiments confirm the monotonic convergence behavior predicted by MM theory. Using simulations, we further demonstrate the advantage of LowSNR-BSI over conventional SBL in low-SNR regimes, and the advantage of learned noise levels over estimates derived from baseline data. To demonstrate the usefulness of our novel approach, we show neurophysiologically plausible source reconstructions on averaged auditory evoked potential data.
Immediate brain plasticity after one hour of brain–computer interface (BCI)
Till Nierhaus
Carmen Vidaurre
Claudia Sannelli
Arno Villringer
The Journal of Physiology, 599(9)(2021), pp. 2435-2451
A brain-computer interface (BCI) allows humans to control computational devices using only neural signals. However, it is still an open question whether performing BCI also impacts the brain itself, i.e. whether brain plasticity is induced. Here, we show rapid and spatially specific signs of brain plasticity measured with functional and structural MRI after only 1 h of purely mental BCI training in BCI-naive subjects. We employed two BCI approaches with neurofeedback based on (i) modulations of EEG rhythms by motor imagery (MI-BCI) or (ii) event-related potentials elicited by visually targeting flashing letters (ERP-BCI). Before and after the BCI session we performed structural and functional MRI. For both BCI approaches we found increased T1-weighted MR signal in the grey matter of the respective target brain regions, such as occipital/parietal areas after ERP-BCI and precuneus and sensorimotor regions.
Machine learning of solvent effects on molecular spectra and reactions
Michael Gastegger
Kristof T. Schütt
Chemical Science(2021), http://dx.doi.org/10.1039/D1SC02742E
Fast and accurate simulation of complex chemical systems in environments such as solutions is a long-standing challenge in theoretical chemistry. In recent years, machine learning has extended the boundaries of quantum chemistry by providing highly accurate and efficient surrogate models of electronic structure theory, which previously have been out of reach for conventional approaches. Those models have long been restricted to closed molecular systems without accounting for environmental influences, such as external electric and magnetic fields or solvent effects. Here, we introduce the deep neural network FieldSchNet for modeling the interaction of molecules with arbitrary external fields. FieldSchNet offers access to a wealth of molecular response properties, enabling it to simulate a wide range of molecular spectra, such as infrared, Raman and nuclear magnetic resonance. Beyond that, it is able to describe implicit and explicit molecular environments, operating as a polarizable continuum model for solvation or in a quantum mechanics/molecular mechanics setup. We employ FieldSchNet to study the influence of solvent effects on molecular spectra and a Claisen rearrangement reaction. Based on these results, we use FieldSchNet to design an external environment capable of lowering the activation barrier of the rearrangement reaction significantly, demonstrating promising avenues for inverse chemical design.
Leaf-inspired homeostatic cellulose biosensors
Ji-Yong Kim
Yong Ju Yun
Joshua Jeong
C.-Yoon Kim
Seong-Whan Lee
Science Advances, 7(16)(2021), eabe7432
An incompatibility between skin homeostasis and existing biosensor interfaces inhibits long-term electrophysiological signal measurement. Inspired by the leaf homeostasis system, we developed the first homeostatic cellulose biosensor with functions of protection, sensation, self-regulation, and biosafety. Moreover, we find that a mesoporous cellulose membrane transforms into homeostatic material with properties that include high ion conductivity, excellent flexibility and stability, appropriate adhesion force, and self-healing effects when swollen in a saline solution. The proposed biosensor is found to maintain a stable skin-sensor interface through homeostasis even when challenged by various stresses, such as a dynamic environment, severe detachment, dense hair, sweat, and long-term measurement. Last, we demonstrate the high usability of our homeostatic biosensor for continuous and stable measurement of electrophysiological signals and give a showcase application in the field of brain-computer interfacing where the biosensors and machine learning together help to control real-time applications beyond the laboratory at unprecedented versatility.
Autonomous robotic nanofabrication with reinforcement learning
Philipp Leinen
Malte Esders
Kristof T. Schütt
Christian Wagner
F. Stefan Tautz
Science Advances, 6(36)(2020), eabb6987
The ability to handle single molecules as effectively as macroscopic building blocks would enable the construction of complex supramolecular structures inaccessible to self-assembly. The fundamental challenges obstructing this goal are the uncontrolled variability and poor observability of atomic-scale conformations. Here, we present a strategy to work around both obstacles and demonstrate autonomous robotic nanofabrication by manipulating single molecules. Our approach uses reinforcement learning (RL), which finds solution strategies even in the face of large uncertainty and sparse feedback. We demonstrate the potential of our RL approach by removing molecules autonomously with a scanning probe microscope from a supramolecular structure. Our RL agent reaches an excellent performance, enabling us to automate a task that previously had to be performed by a human. We anticipate that our work opens the way toward autonomous agents for the robotic construction of functional supramolecular structures with speed, precision, and perseverance beyond our current capabilities.
Ensemble learning of coarse-grained molecular dynamics force fields with a kernel approach
Jiang Wang
Stefan Chmiela
Frank Noé
Cecilia Clementi
The Journal of Chemical Physics, 152(2020), pp. 194106
Gradient-domain machine learning (GDML) is an accurate and efficient approach to learn a molecular potential and associated force field based on the kernel ridge regression algorithm. Here, we demonstrate its application to learn an effective coarse-grained (CG) model from all-atom simulation data in a sample efficient manner. The CG force field is learned by following the thermodynamic consistency principle, here by minimizing the error between the predicted CG force and the all-atom mean force in the CG coordinates. Solving this problem by GDML directly is impossible because coarse-graining requires averaging over many training data points, resulting in impractical memory requirements for storing the kernel matrices. In this work, we propose a data-efficient and memory-saving alternative. Using ensemble learning and stratified sampling, we propose a 2-layer training scheme that enables GDML to learn an effective CG model. We illustrate our method on a simple biomolecular system, alanine dipeptide, by reconstructing the free energy landscape of a CG variant of this molecule. Our novel GDML training scheme yields a smaller free energy error than neural networks when the training set is small, and a comparably high accuracy when the training set is sufficiently large.
Fairwashing Explanations with Off-Manifold Detergent
Christopher J. Anders
Plamen Pasliev
Ann-Kathrin Dombrowski
Pan Kessel
International Conference on Machine Learning, PMLR(2020), pp. 314-323
Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can act as proof for a sensible, fair and trustworthy decision-making process of the algorithm and thereby increase its acceptance by the end-users. In this paper, we show both theoretically and experimentally that these hopes are presently unfounded. Specifically, we show that, for any classifier g, one can always construct another classifier g̃ which has the same behavior on the data (same train, validation, and test error) but has arbitrarily manipulated explanation maps. We derive this statement theoretically using differential geometry and demonstrate it experimentally for various explanation methods, architectures, and datasets. Motivated by our theoretical insights, we then propose a modification of existing explanation methods which makes them significantly more robust.
Enhanced Performance of a Brain Switch by Simultaneous Use of EEG and NIRS Data for Asynchronous Brain-Computer Interface
Chang-Hee Han
Han-Jeong Hwang
IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28(10)(2020), pp. 2102-2112
Previous studies have shown the superior performance of hybrid electroencephalography (EEG)/near-infrared spectroscopy (NIRS) brain-computer interfaces (BCIs). However, it has remained unclear whether a hybrid EEG/NIRS modality can provide better performance for a brain switch, which detects the onset of the intention to turn on a BCI. In this study, we developed such a hybrid EEG/NIRS brain switch and compared its performance with single-modality EEG- and NIRS-based brain switches in terms of true positive rate (TPR), false positive rate (FPR), onset detection time (ODT), and information transfer rate (ITR). In an offline analysis, the performance of the hybrid EEG/NIRS brain switch was significantly improved over that of EEG- and NIRS-based brain switches in general, and in particular a significantly lower FPR was observed for the hybrid EEG/NIRS brain switch. A pseudo-online analysis was additionally performed to confirm the feasibility of implementing an online BCI system with our hybrid EEG/NIRS brain switch. The overall trend of the pseudo-online analysis results generally coincided with that of the offline analysis results. Moreover, no significant difference in any performance measure was found between the offline and pseudo-online analysis schemes when the amount of training data was the same, with one exception for the ITRs of an EEG brain switch. These offline and pseudo-online results demonstrate that a hybrid EEG/NIRS brain switch can provide better onset detection performance than a single neuroimaging modality.
An adaptive deep reinforcement learning framework enables curling robots with human-like performance in real-world conditions
The game of curling can be considered a good test bed for studying the interaction between artificial intelligence systems and the real world. In curling, the environmental characteristics change at every moment, and every throw has an impact on the outcome of the match. Furthermore, there is no time for relearning during a curling match due to the timing rules of the game. Here, we report a curling robot that can achieve human-level performance in the game of curling using an adaptive deep reinforcement learning framework. Our proposed adaptation framework extends standard deep reinforcement learning using temporal features, which learn to compensate for the uncertainties and nonstationarities that are an unavoidable part of curling. Our curling robot, Curly, was able to win three of four official matches against expert human teams [top-ranked women’s curling teams and Korea national wheelchair curling team (reserve team)]. These results indicate that the gap between physics-based simulators and the real world can be narrowed.
Exploring chemical compound space with quantum-based machine learning
O. Anatole von Lilienfeld
Alexandre Tkatchenko
Nature Reviews Chemistry, 4(2020), 347–358
Rational design of compounds with specific properties requires understanding and fast evaluation of molecular properties throughout chemical compound space — the huge set of all potentially stable molecules. Recent advances in combining quantum-mechanical calculations with machine learning provide powerful tools for exploring wide swathes of chemical compound space. We present our perspective on this exciting and quickly developing field by discussing key advances in the development and applications of quantum-mechanics-based machine-learning methods to diverse compounds and properties, and outlining the challenges ahead. We argue that significant progress in the exploration and understanding of chemical compound space can be made through a systematic combination of rigorous physical theories, comprehensive synthetic data sets of microscopic and macroscopic properties, and modern machine-learning methods that account for physical and chemical knowledge.
Quantum chemical accuracy from density functional approximations via machine learning
Mihail Bogojeski
Leslie Vogt-Maranto
Mark E. Tuckerman
Kieron Burke
Nature Communications, 11(2020), pp. 5223
Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal·mol⁻¹ with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal·mol⁻¹) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C₆H₄(OH)₂) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.
Machine Learning Meets Quantum Physics
Kristof T. Schütt
Stefan Chmiela
O. Anatole von Lilienfeld
Alexandre Tkatchenko
Koji Tsuda
Springer(2020)
A book that connects the fields of machine learning and quantum chemistry.