Klaus-Robert Müller

Klaus-Robert Müller has been a professor of computer science at Technische Universität Berlin since 2006; at the same time he directs or co-directs the Berlin Machine Learning Center, the Berlin Big Data Center, and, most recently, BIFOLD. He studied physics in Karlsruhe from 1984 to 1989 and obtained his Ph.D. degree in computer science at Technische Universität Karlsruhe in 1992. After completing a postdoctoral position at GMD FIRST in Berlin, he was a research fellow at the University of Tokyo from 1994 to 1995. In 1995, he founded the Intelligent Data Analysis group at GMD FIRST (later Fraunhofer FIRST) and directed it until 2008. From 1999 to 2006, he was a professor at the University of Potsdam. Since 2012 he has been Distinguished Professor at Korea University in Seoul. In 2020/2021 he spent his sabbatical at Google Brain as a Principal Scientist. Among others, he was awarded the Olympus Prize for Pattern Recognition (1999), the SEL Alcatel Communication Award (2006), the Science Prize of Berlin by the Governing Mayor of Berlin (2014), the Vodafone Innovations Award (2017), the Pattern Recognition Best Paper award (2020), the Digital Signal Processing Best Paper award (2022), and the Hector Science Award (2024). He was elected a member of the German National Academy of Sciences Leopoldina in 2012, of the Berlin-Brandenburg Academy of Sciences in 2017, and of the German National Academy of Science and Engineering in 2021; in 2017 he also became an external scientific member of the Max Planck Society. Since 2019 he has been an ISI Highly Cited Researcher in the cross-disciplinary area. His research interests are intelligent data analysis and machine learning in the sciences (neuroscience, specifically brain-computer interfaces; physics; chemistry) and in industry.
Authored Publications
    Accurate global machine learning force fields for molecules with hundreds of atoms
    Stefan Chmiela
    Valentin Vassilev Galindo
    Adil Kabylda
    Huziel E. Sauceda
    Alexandre Tkatchenko
    Science Advances, 9(2)(2023), eadf0873
    Global machine learning force fields, with the capacity to capture collective interactions in molecular systems, now scale up to a few dozen atoms due to considerable growth of model complexity with system size. For larger molecules, locality assumptions are introduced, with the consequence that nonlocal interactions are not described. Here, we develop an exact iterative approach to train global symmetric gradient domain machine learning (sGDML) force fields (FFs) for several hundred atoms, without resorting to any potentially uncontrolled approximations. All atomic degrees of freedom remain correlated in the global sGDML FF, allowing the accurate description of complex molecules and materials that present phenomena with far-reaching characteristic correlation lengths. We assess the accuracy and efficiency of sGDML on a newly developed MD22 benchmark dataset containing molecules from 42 to 370 atoms. The robustness of our approach is demonstrated in nanosecond path-integral molecular dynamics simulations for supramolecular complexes in the MD22 dataset.
    Canonical Response Parameterization: Quantifying the structure of responses to single-pulse intracranial electrical brain stimulation
    Kai J. Miller
    Gabriela Ojeda Valencia
    Harvey Huang
    Nicholas M. Gregg
    Gregory A. Worrell
    Dora Hermes
    Plos Computational Biology, 19(5)(2023), e1011105
    Single-pulse electrical stimulation in the nervous system, often called cortico-cortical evoked potential (CCEP) measurement, is an important technique to understand how brain regions interact with one another. Voltages are measured from implanted electrodes in one brain area while stimulating another with brief current impulses separated by several seconds. Historically, researchers have tried to understand the significance of evoked voltage polyphasic deflections by visual inspection, but no general-purpose tool has emerged to understand their shapes or describe them mathematically. We describe and illustrate a new technique to parameterize brain stimulation data, where voltage response traces are projected into one another using a semi-normalized dot product. The length of timepoints from stimulation included in the dot product is varied to obtain a temporal profile of structural significance, and the peak of the profile uniquely identifies the duration of the response. Using linear kernel PCA, a canonical response shape is obtained over this duration, and then single-trial traces are parameterized as a projection of this canonical shape with a residual term. Such parameterization allows for dissimilar trace shapes from different brain areas to be directly compared by quantifying cross-projection magnitudes, response duration, canonical shape projection amplitudes, signal-to-noise ratios, explained variance, and statistical significance. Artifactual trials are automatically identified by outliers in sub-distributions of cross-projection magnitude, and rejected. This technique, which we call “Canonical Response Parameterization” (CRP), dramatically simplifies the study of CCEP shapes, and may also be applied in a wide range of other settings involving event-triggered data.
    BIGDML—Towards accurate quantum machine learning force fields for materials
    Huziel Sauceda
    Luis Gálvez-González
    Stefan Chmiela
    Lauro Oliver Paz Borbon
    Alexandre Tkatchenko
    Nature Communications, 13(2022), pp. 3733
    Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof. Currently, MLFFs often introduce tradeoffs that restrict their practical applicability to small subsets of chemical space or require exhaustive datasets for training. Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning (BIGDML) approach and demonstrate its ability to construct reliable force fields using a training set with just 10–200 geometries for materials including pristine and defect-containing 2D and 3D semiconductors and metals, as well as chemisorbed and physisorbed atomic and molecular adsorbates on surfaces. The BIGDML model employs the full relevant symmetry group for a given material, does not assume artificial atom types or localization of atomic interactions and exhibits high data efficiency and state-of-the-art energy accuracies (errors substantially below 1 meV per atom) for an extended set of materials. Extensive path-integral molecular dynamics carried out with BIGDML models demonstrate the counterintuitive localization of benzene–graphene dynamics induced by nuclear quantum effects and their strong contributions to the hydrogen diffusion coefficient in a Pd crystal for a wide range of temperatures.
    Towards Robust Explanations for Deep Neural Networks
    Ann-Kathrin Dombrowski
    Christopher Johannes Anders
    Pan Kessel
    Pattern Recognition, 121(2022), pp. 108194
    Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. But their usefulness can be compromised because they are susceptible to manipulations. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three different techniques to boost robustness against manipulation: training with weight decay, smoothing activation functions, and minimizing the Hessian of the network. Our experimental results confirm the effectiveness of these approaches.
    The application of machine learning (ML) methods in quantum chemistry has enabled the study of numerous chemical phenomena, which are computationally intractable with traditional ab initio methods. However, some quantum mechanical properties of molecules and materials depend on non-local electronic effects, which are often neglected due to the difficulty of modelling them efficiently. This work proposes a modified attention mechanism adapted to the underlying physics, which allows the relevant non-local effects to be recovered. Namely, we introduce spherical harmonic coordinates (SPHCs) to reflect higher order geometric information for each atom in a molecule, enabling a non-local formulation of attention in the SPHC space. Our proposed model So3krates, a self-attention based message passing neural network (MPNN), uncouples geometric information from atomic features, making them independently amenable to attention mechanisms. We show that in contrast to other published methods, So3krates is able to describe quantum mechanical effects due to orbital overlap over arbitrary length scales. Further, So3krates is shown to match or exceed state-of-the-art performance on the popular MD-17 and QM-7X benchmarks, notably requiring a significantly lower number of parameters while at the same time giving a substantial speedup compared to other models.
    Algorithmic Differentiation for Automatized Modelling of Machine Learned Force Fields
    Niklas Schmitz
    Stefan Chmiela
    The Journal of Physical Chemistry Letters, 13(43)(2022), pp. 10183-10189
    Reconstructing force fields (FFs) from atomistic simulation data is a challenge since accurate data can be highly expensive. Here, machine learning (ML) models can help to be data economic as they can be successfully constrained using the underlying symmetry and conservation laws of physics. However, so far, every descriptor newly proposed for an ML model has required a cumbersome and mathematically tedious remodeling. We therefore propose using modern techniques from algorithmic differentiation within the ML modeling process, effectively enabling the usage of novel descriptors or models fully automatically at an order of magnitude higher computational efficiency. This paradigmatic approach enables not only a versatile usage of novel representations and the efficient computation of larger systems (all of high value to the FF community) but also the simple inclusion of further physical knowledge, such as higher-order information (e.g., Hessians, more complex partial differential equations constraints etc.), even beyond the presented FF domain.
    Toward Explainable Artificial Intelligence for Regression Models: A methodological perspective
    Simon Letzgus
    Jonas Lederer
    Wojciech Samek
    Gregoire Montavon
    IEEE Signal Processing Magazine, 39 (4)(2022), 40–58
    In addition to the impressive predictive power of machine learning (ML) models, more recently, explanation methods have emerged that enable an interpretation of complex nonlinear learning models, such as deep neural networks. Gaining a better understanding is especially important, e.g., for safety-critical ML applications or medical diagnostics and so on. Although such explainable artificial intelligence (XAI) techniques have reached significant popularity for classifiers, thus far, little attention has been devoted to XAI for regression models (XAIR). In this review, we clarify the fundamental conceptual differences of XAI for regression and classification tasks, establish novel theoretical insights and analysis for XAIR, provide demonstrations of XAIR on genuine practical regression problems, and finally, discuss challenges remaining for the field.
    Harmoni: a Method for Eliminating Spurious Interactions due to the Harmonic Components in Neuronal Data
    Mina Jamshidi Idaji
    Juanli Zhang
    Tilman Stephani
    Guido Nolte
    Arno Villringer
    Vadim Nikulin
    Neuroimage, 252(2022), pp. 119053
    Cross-frequency synchronization (CFS) has been proposed as a mechanism for integrating spatially and spectrally distributed information in the brain. However, investigating CFS in Magneto- and Electroencephalography (MEG/EEG) is hampered by the presence of spurious neuronal interactions due to the non-sinusoidal waveshape of brain oscillations. Such waveshape gives rise to the presence of oscillatory harmonics mimicking genuine neuronal oscillations. Until recently, however, there has been no methodology for removing these harmonics from neuronal data. In order to address this long-standing challenge, we introduce a novel method (called HARMOnic miNImization - Harmoni) that removes the signal components which can be harmonics of a non-sinusoidal signal. Harmoni’s working principle is based on the presence of CFS between harmonic components and the fundamental component of a non-sinusoidal signal. We extensively tested Harmoni in realistic EEG simulations. The simulated couplings between the source signals represented genuine and spurious CFS and within-frequency phase synchronization. Using diverse evaluation criteria, including ROC analyses, we showed that the within- and cross-frequency spurious interactions are suppressed significantly, while the genuine activities are not affected. Additionally, we applied Harmoni to real resting-state EEG data revealing intricate remote connectivity patterns which are usually masked by the spurious connections. Given the ubiquity of non-sinusoidal neuronal oscillations in electrophysiological recordings, Harmoni is expected to facilitate novel insights into genuine neuronal interactions in various research fields, and can also serve as a steppingstone towards the development of further signal processing methods aiming at refining within- and cross-frequency synchronization in electrophysiological recordings.
    Artificial Intelligence and Pathology: from Principles to Practice and Future Applications in Histomorphology and Molecular Profiling
    Albrecht Stenzinger
    Max Alber
    Michael Allgäuer
    Phillip Jurmeister
    Michael Bockmayr
    Jan Budczies
    Jochen Lennerz
    Johannes Eschrich
    Daniel Kazdal
    Peter Schirmacher
    Alex H Wagner
    Frank Tacke
    David Capper
    Frederick Klauschen
    Seminars in Cancer Biology, 84(2022), pp. 129-143
    The complexity of diagnostic (surgical) pathology has increased substantially over the last decades with respect to histomorphological and molecular profiling and has steadily expanded its role in tumor diagnostics and beyond from disease entity identification via prognosis estimation to precision therapy prediction. It is therefore not surprising that pathology is among the disciplines in medicine with high expectations in the application of artificial intelligence (AI) or machine learning approaches given its capabilities to analyse complex data in a quantitative and standardised manner to further enhance scope and precision of diagnostics. While an obvious application is the analysis of histological images, recent applications for the analysis of molecular profiling data from different sources and clinical data support the notion that AI will support both histopathology and molecular pathology in the future. At the same time, current literature should not be misunderstood in a way that pathologists will likely be replaced by AI applications in the foreseeable future. Although AI will likely transform pathology in the coming years, recent studies reporting AI algorithms to diagnose cancer or predict certain molecular properties deal with relatively simple diagnostic problems that fall short of the diagnostic complexity pathologists face in clinical routine. Here, we review the pertinent literature of AI methods and their applications to pathology, and put the current achievements and what can be expected in the future in the context of the requirements for research and routine diagnostics.
    Efficient Computation of Higher-Order Subgraph Attribution via Message Passing
    Ping Xiong
    Thomas Schnake
    Gregoire Montavon
    Shin Nakajima
    ICML(2022) (to appear)
    Explaining graph neural networks (GNNs) has become more and more important recently. Higher-order interpretation schemes, such as GNN-LRP (layer-wise relevance propagation for GNNs), emerged as powerful tools for unraveling how different features interact, thereby contributing to explaining GNNs. Methods such as GNN-LRP perform walks between nodes at each layer, and there are exponentially many such walks. In this work, we demonstrate that such exponential complexity can be avoided. In particular, we propose novel linear-time (w.r.t. depth) algorithms that make it possible to efficiently perform GNN-LRP for subgraphs. Our algorithms are derived via message passing techniques that make use of the distributive property, thereby directly computing quantities for higher-order explanations. We further adapt our efficient algorithms to compute a generalization of subgraph attributions that also takes into account the neighboring graph features. Experimental results show significant acceleration of the proposed algorithms and demonstrate a high usefulness and scalability of our novel generalized subgraph attribution.
    Higher-Order Explanations of Graph Neural Networks via Relevant Walks
    Thomas Schnake
    Oliver Eberle
    Jonas Lederer
    Shin Nakajima
    Kristof T. Schütt
    Gregoire Montavon
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11)(2022), pp. 7581 - 7596
    Graph Neural Networks (GNNs) are a popular approach for predicting graph structured data. As GNNs tightly entangle the input graph into the neural network structure, common explainable AI approaches are not applicable. To a large extent, GNNs have remained black-boxes for the user so far. In this paper, we show that GNNs can in fact be naturally explained using higher-order expansions, i.e., by identifying groups of edges that jointly contribute to the prediction. Practically, we find that such explanations can be extracted using a nested attribution scheme, where existing techniques such as layer-wise relevance propagation (LRP) can be applied at each step. The output is a collection of walks into the input graph that are relevant for the prediction. Our novel explanation method, which we denote by GNN-LRP, is applicable to a broad range of graph neural networks and lets us extract practically relevant insights on sentiment analysis of text data, structure-property relationships in quantum chemistry, and image classification.
    Super-resolution in Molecular Dynamics Trajectory Reconstruction with Bi-Directional Neural Networks
    Paul Ludwig Winkler
    Huziel Sauceda
    Machine Learning: Science and Technology, 3(2022), pp. 025011
    Molecular dynamics (MD) simulations are a cornerstone in science, enabling the investigation of a system’s thermodynamics all the way to analyzing intricate molecular interactions. In general, creating extended molecular trajectories can be a computationally expensive process, for example, when running ab-initio simulations. Hence, repeating such calculations to either obtain more accurate thermodynamics or to get a higher resolution in the dynamics generated by a fine-grained quantum interaction can be time- and computational resource-consuming. In this work, we explore different machine learning methodologies to increase the resolution of MD trajectories on-demand within a post-processing step. As a proof of concept, we analyse the performance of bi-directional neural networks (NNs) such as neural ODEs, Hamiltonian networks, recurrent NNs and long short-term memories, as well as the uni-directional variants as a reference, for MD simulations (here: the MD17 dataset). We have found that Bi-LSTMs are the best performing models; by utilizing the local time-symmetry of thermostated trajectories they can even learn long-range correlations and display high robustness to noisy dynamics across molecular complexity. Our models can reach accuracies of up to 10⁻⁴ Å in trajectory interpolation, which leads to the faithful reconstruction of several unseen high-frequency molecular vibration cycles. This renders the comparison between the learned and reference trajectories indistinguishable. The results reported in this work can serve (1) as a baseline for larger systems, as well as (2) for the construction of better MD integrators.
    Basis profile curve identification to understand electrical stimulation effects in human brain networks
    Kai Joshua Miller
    Dora Hermes
    Plos Computational Biology, 17(9)(2021), e1008710, https://doi.org/10.1371/journal.pcbi.1008710
    Brain networks can be explored by delivering brief pulses of electrical current in one area while measuring voltage responses in other areas. We propose a convergent paradigm to study brain dynamics, focusing on a single brain site to observe the average effect of stimulating each of many other brain sites. Viewed in this manner, visually-apparent motifs in the temporal response shape emerge from adjacent stimulation sites. This work constructs and illustrates a data-driven approach to determine characteristic spatiotemporal structure in these response shapes, summarized by a set of unique “basis profile curves” (BPCs). Each BPC may be mapped back to underlying anatomy in a natural way, quantifying projection strength from each stimulation site using simple metrics. Our technique is demonstrated for an array of implanted brain surface electrodes in a human patient. This framework enables straightforward interpretation of single-pulse brain stimulation data, and can be applied generically to explore the diverse milieu of interactions that comprise the connectome.
    Explainable Deep One-Class Classification
    Philipp Liznerski
    Lukas Ruff
    Robert Vandermeulen
    Billy Joe Franks
    Marius Kloft
    ICLR 2021(2021) (to appear)
    Deep one-class classification variants for anomaly detection learn a mapping that concentrates nominal samples in feature space causing anomalies to be mapped away. Because this transformation is highly non-linear, finding interpretations poses a significant challenge. In this paper we present an explainable deep one-class classification method, Fully Convolutional Data Description (FCDD), where the mapped samples are themselves also an explanation heatmap. FCDD yields competitive detection performance and provides reasonable explanations on common anomaly detection benchmarks with CIFAR-10 and ImageNet. On MVTec-AD, a recent manufacturing dataset offering ground-truth anomaly maps, FCDD sets a new state of the art in the unsupervised setting. Our method can incorporate ground-truth anomaly maps during training, and using even a few of these (~5) improves performance significantly. Finally, using FCDD’s explanations, we demonstrate the vulnerability of deep one-class classification models to spurious image features such as image watermarks.
    Dynamical Strengthening of Covalent and Non-Covalent Molecular Interactions by Nuclear Quantum Effects at Finite Temperature
    Huziel Sauceda
    Stefan Chmiela
    Valentin Vassilev Galindo
    Alexandre Tkatchenko
    Nature Communications, 12(2021), pp. 442
    Nuclear quantum effects (NQE) tend to generate delocalized molecular dynamics due to the anharmonicity of interatomic interactions. Here, we present evidence that NQE often enhance electronic interactions and, in turn, can result in dynamical molecular stabilization at finite temperature. The underlying physical mechanism promoted by NQE depends on the particular interaction under consideration. First, the effective reduction of interatomic distances between functional groups within a molecule enhances the n → π* interaction by increasing the overlap between molecular orbitals or by strengthening electrostatic interactions between neighboring charge densities. Second, NQE can localize methyl rotors by temporarily changing molecular bond orders and leading to the emergence of localized transient rotor states. Third, for noncovalent interactions the strengthening comes from the increase of the polarizability given the expanded average interatomic distances induced by NQE. The implications of these boosted interactions include counterintuitive hydroxyl–hydroxyl bonding, hindered methyl rotor dynamics, and molecular stiffening which generates smoother free-energy surfaces. These results challenge the general assumption that NQE tend to mainly generate delocalized dynamics and reveal that NQE also play an active role in dynamical strengthening of molecular interactions. Our findings yield new insights into the versatile role of nuclear quantum fluctuations in molecules and materials.
    Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems
    John A Keith
    Valentin Vassilev Galindo
    Bingqing Cheng
    Stefan Chmiela
    Michael Gastegger
    Alexandre Tkatchenko
    Chemical Reviews, 121 (16)(2021), 9816-9872, https://pubs.acs.org/doi/pdf/10.1021/acs.chemrev.1c00107
    Machine learning models are poised to make transformative impact in the chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry, machine learning methods, and how insights involving both can be achieved. We then follow with a critical review of noteworthy applications that demonstrate how computational quantum chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.
    Machine Learning Force Fields
    Oliver Unke
    Stefan Chmiela
    Huziel Sauceda
    Michael Gastegger
    Igor Poltavsky
    Kristof T. Schütt
    Alexandre Tkatchenko
    Chemical Reviews, 121 (16)(2021), 10142-10186, https://pubs.acs.org/doi/pdf/10.1021/acs.chemrev.0c01111
    In recent years, the use of Machine Learning (ML) in computational chemistry has enabled numerous advances previously out of reach due to the computational complexity of traditional electronic-structure methods. One of the most promising applications is the construction of ML-based force fields (FFs), with the aim to narrow the gap between the accuracy of ab initio methods and the efficiency of classical FFs. The key idea is to learn the statistical relation between chemical structure and potential energy without relying on a preconceived notion of fixed chemical bonds or knowledge about the relevant interactions. Such universal ML approximations are in principle only limited by the quality and quantity of the reference data used to train them. This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them. The core concepts underlying ML-FFs are described in detail and a step-by-step guide for constructing and testing them from scratch is given. The text concludes with a discussion of the challenges that remain to be overcome by the next generation of ML-FFs.
    A Unifying Review of Deep and Shallow Anomaly Detection
    Lukas Ruff
    Jacob Reinhard Kauffmann
    Robert Vandermeulen
    Gregoire Montavon
    Wojciech Samek
    Marius Kloft
    Thomas G. Dietterich
    Proceedings of the IEEE, 109(5)(2021), pp. 756-795 (to appear)
    Deep learning approaches to anomaly detection have recently improved the state of the art in detection performance on complex datasets such as large collections of images or text. These results have sparked a renewed interest in the anomaly detection problem and led to the introduction of a great variety of new methods. With the emergence of numerous such methods that include approaches based on generative models, one-class classification, and reconstruction, there is a growing need to bring methods of this field into a systematic and unified perspective. In this review, we therefore aim to identify the common underlying principles as well as the assumptions that are often made implicitly by various methods. In particular, we draw connections between classic ‘shallow’ and novel deep approaches and show how they exactly relate and moreover how this relation might cross-fertilize or extend both directions. We further provide an empirical assessment of major existing methods that is enriched by the use of recent explainability techniques, and present specific worked-through examples together with practical advice. Finally, we outline critical open challenges and identify specific paths for future research in anomaly detection.
    SE(3)-equivariant prediction of molecular wavefunctions and electronic densities
    Mihail Bogojeski
    Michael Gastegger
    Mario Geiger
    Tess Smidt
    Advances in Neural Information Processing Systems(2021)
    Machine learning has enabled the prediction of quantum chemical properties with high accuracy and efficiency, making it possible to bypass computationally costly ab initio calculations. Instead of training on a fixed set of properties, more recent approaches attempt to learn the electronic wavefunction (or density) as a central quantity of atomistic systems, from which all other observables can be derived. This is complicated by the fact that wavefunctions transform non-trivially under molecular rotations, which makes them a challenging prediction target. To solve this issue, we introduce general SE(3)-equivariant operations and building blocks for constructing deep learning architectures for geometric point cloud data and apply them to reconstruct wavefunctions of atomistic systems with unprecedented accuracy. Our model reduces prediction errors by up to two orders of magnitude compared to the previous state-of-the-art and makes it possible to derive properties such as energies and forces directly from the wavefunction in an end-to-end manner. We demonstrate the potential of our approach in a transfer learning application, where a model trained on low accuracy reference wavefunctions implicitly learns to correct for electronic many-body interactions from observables computed at a higher level of theory. Such machine-learned wavefunction surrogates pave the way towards novel semi-empirical methods, offering resolution at an electronic level while drastically decreasing computational cost. While we focus on physics applications in this contribution, the proposed equivariant framework for deep learning on point clouds is also promising in other areas, such as computer vision or graphics.
    SpookyNet: Learning Force Fields with Electronic Degrees of Freedom and Nonlocal Effects
    Stefan Chmiela
    Michael Gastegger
    Kristof T. Schütt
    Huziel Sauceda
    Nature Communications, 12(2021), pp. 7273
    Machine-learned force fields combine the accuracy of ab initio methods with the efficiency of conventional force fields. However, current machine-learned force fields typically ignore electronic degrees of freedom, such as the total charge or spin state, and assume chemical locality, which is problematic when molecules have inconsistent electronic states, or when nonlocal effects play a significant role. This work introduces SpookyNet, a deep neural network for constructing machine-learned force fields with explicit treatment of electronic degrees of freedom and nonlocality, modeled via self-attention in a transformer architecture. Chemically meaningful inductive biases and analytical corrections built into the network architecture allow it to properly model physical limits. SpookyNet improves upon the current state-of-the-art (or achieves similar performance) on popular quantum chemistry data sets. Notably, it is able to generalize across chemical and conformational space and can leverage the learned chemical insights, e.g. by predicting unknown spin states, thus helping to close a further important remaining gap for today’s machine learning models in quantum chemistry.
    Sensorimotor functional connectivity: a neurophysiological factor related to BCI performance
    Carmen Vidaurre
    Stefan Haufe
    Tania Jorajuría Gómez
    Vadim Nikulin
    Frontiers in Neuroscience, 14(2020), pp. 575081
    Preview abstract Brain-Computer Interfaces (BCIs) are systems that allow users to control devices using brain activity alone. However, the ability of participants to command BCIs varies from subject to subject. For BCIs based on the modulation of sensorimotor rhythms as measured by means of electroencephalography (EEG), about 20% of potential users do not obtain enough accuracy to gain reliable control of the system. This lack of efficiency of BCI systems to decode users' intentions necessitates identification of neurophysiological factors determining 'good' and 'poor' BCI performers. Given that the neuronal oscillations used in BCI demonstrate a rich repertoire of spatial interactions, we hypothesized that neuronal activity in sensorimotor areas would define some aspects of BCI performance. Analyses for this study were performed on a large dataset of 80 inexperienced participants. They took part in a calibration and an online feedback session on the same day. Undirected functional connectivity was computed over sensorimotor areas by means of the imaginary part of coherency. The results show that post- as well as pre-stimulus connectivity in the calibration recordings is significantly correlated to online feedback performance in the μ and feedback frequency bands. Importantly, the significance of the correlation between connectivity and BCI feedback accuracy was not due to the signal-to-noise ratio of the oscillations in the corresponding post- and pre-stimulus intervals. Thus, this study shows that BCI performance is not only dependent on the amplitude of sensorimotor oscillations as shown previously, but that it also relates to sensorimotor connectivity measured during the preceding training session. The presence of such connectivity between motor and somatosensory systems is likely to facilitate motor imagery, which in turn is associated with the generation of a more pronounced modulation of sensorimotor oscillations (manifested in ERD/ERS) required for adequate BCI performance. We also discuss strategies for the up-regulation of such connectivity in order to enhance BCI performance. View details
    Molecular force fields with gradient-domain machine learning (GDML): Comparison and synergies with classical force fields
    Huziel Sauceda
    Michael Gastegger
    Stefan Chmiela
    Alexandre Tkatchenko
    Journal of Chemical Physics, 153(2020), pp. 124109
    Preview abstract The goal of the present work is to perform a detailed investigation of the differences between machine-learned force fields (ML-FFs) and conventional molecular mechanics force fields (MM-FFs), based on a set of small molecules exhibiting different quantum mechanical phenomena. Based on these results, different alternatives are explored for improving the data generation process and their applicability context for expediting the force-field learning procedure. Furthermore, improvement of the accuracy of MM-FFs is studied by reparameterising them based on more accurate reference data and by testing their limits and the flexibility of their functional form. For this task, we use the recently published sGDML framework[25, 26] as ML-FF of choice, as it is able to efficiently reconstruct the potential energy surfaces (PES) of medium-sized molecules. The investigated systems are the molecules ethanol, the keto and enol forms of malondialdehyde (keto-MDA and enol-MDA, respectively) as well as salicylic and acetylsalicylic acid (Aspirin). In the context of these systems, we study the performance of MM-FFs and sGDML-derived FFs based on the overall reliability of the generated PESs, as well as effects arising from chemical phenomena such as hydrogen transfer and orbital interactions. Although we restrict ourselves to the sGDML approach, it can nevertheless be expected that the results found here are equally valid for ML-FFs in general. View details
    Novel multivariate methods to track frequency shifts of neural oscillations in EEG/MEG recordings
    Carmen Vidaurre
    Kshipra Gurunandan
    Mina Jamshidi Idaji
    Guido Nolte
    Marisol Gómez
    Arno Villringer
    Vadim Nikulin
    Neuroimage, 276(2023), pp. 120178
    Preview abstract Instantaneous and peak frequency changes in neural oscillations have been linked to many perceptual, motor, and cognitive processes. Yet, the majority of such studies have been performed in sensor space and only occasionally in source space. Furthermore, both terms have been used interchangeably in the literature, although they do not reflect the same aspect of neural oscillations. In this paper, we discuss the relation between instantaneous frequency, peak frequency, and local frequency, the latter also known as spectral centroid. Furthermore, we propose and validate three different methods to extract source signals from multichannel data whose (instantaneous, local, or peak) frequency estimate is maximally correlated to an experimental variable of interest. Results show that the local frequency might be a better estimate of frequency variability than instantaneous frequency under conditions with low signal-to-noise ratio. Additionally, the source separation methods based on local and peak frequency estimates, called LFD and PFD respectively, provide more stable estimates than the decomposition based on instantaneous frequency. In particular, LFD and PFD are able to recover the sources of interest in simulations performed with a realistic head model, providing higher correlations with an experimental variable than multiple linear regression. Finally, we also tested all decomposition methods on real EEG data from a steady-state visual evoked potential paradigm and show that the recovered sources are located in areas similar to those previously reported in other studies, thus providing further validation of the proposed methods. View details
    Analysing Cerebrospinal Fluid with Explainable Deep Learning: from Diagnostics to Insights
    Leonille Schweizer
    Philipp Seegerer
    Hee‐yeong Kim
    René Saitenmacher
    Amos Muench
    Liane Barnick
    Anja Osterloh
    Carsten Dittmayer
    Ruben Jödicke
    Debora Pehl
    Annekathrin Reinhardt
    Klemens Ruprecht
    Werner Stenzel
    Annika K Wefers
    Patrick N Harter
    Ulrich Schüller
    Frank L Heppner
    Maximilian Alber
    Frederick Klauschen
    Neuropathology and Applied Neurobiology, 49(1)(2023), e12866
    Preview abstract Aim: Analysis of cerebrospinal fluid (CSF) is essential for diagnostic workup of patients with neurological diseases and includes differential cell typing. The current gold standard is based on microscopic examination by specialised technicians and neuropathologists, which is time-consuming, labour-intensive and subjective. Methods: We, therefore, developed an image analysis approach based on expert annotations of 123,181 digitised CSF objects from 78 patients corresponding to 15 clinically relevant categories and trained a multiclass convolutional neural network (CNN). Results: The CNN classified the 15 categories with high accuracy (mean AUC 97.3%). By using explainable artificial intelligence (XAI), we demonstrate that the CNN identified meaningful cellular substructures in CSF cells recapitulating human pattern recognition. Based on the evaluation of 511 cells selected from 12 different CSF samples, we validated the CNN by comparing it with seven board-certified neuropathologists blinded for clinical information. Inter-rater agreement between the CNN and the ground truth was non-inferior (Krippendorff's alpha 0.79) compared with the agreement of seven human raters and the ground truth (mean Krippendorff's alpha 0.72, range 0.56–0.81). The CNN assigned the correct diagnostic label (inflammatory, haemorrhagic or neoplastic) in 10 out of 11 clinical samples, compared with 7–11 out of 11 by human raters. Conclusions: Our approach provides the basis to overcome current limitations in automated cell classification for routine diagnostics and demonstrates how a visual explanation framework can connect machine decision-making with cell properties and thus provide a novel versatile and quantitative method for investigating CSF manifestations of various neurological diseases. View details
    Single-cell gene regulatory network prediction by explainable AI
    Philipp Keyl
    Philip Bischoff
    Gabriel Dernbach
    Michael Bockmayr
    Rebecca Fritz
    David Horst
    Nils Blüthgen
    Grégoire Montavon
    Frederick Klauschen
    Nucleic Acids Research(2023), gkac1212
    Preview abstract The molecular heterogeneity of cancer cells contributes to the often partial response to targeted therapies and relapse of disease due to the escape of resistant cell populations. While single-cell sequencing has started to improve our understanding of this heterogeneity, it offers a mostly descriptive view on cellular types and states. To obtain more functional insights, we propose scGeneRAI, an explainable deep learning approach that uses layer-wise relevance propagation (LRP) to infer gene regulatory networks from static single-cell RNA sequencing data for individual cells. We benchmark our method with synthetic data and apply it to single-cell RNA sequencing data of a cohort of human lung cancers. From the predicted single-cell networks our approach reveals characteristic network patterns for tumor cells and normal epithelial cells and identifies subnetworks that are observed only in (subgroups of) tumor cells of certain patients. While current state-of-the-art methods are limited by their ability to only predict average networks for cell populations, our approach facilitates the reconstruction of networks down to the level of single cells which can be utilized to characterize the heterogeneity of gene regulation within and across tumors. View details
    Patient-level proteomic network prediction by explainable artificial intelligence
    Philipp Keyl
    Michael Bockmayr
    Daniel Heim
    Gabriel Dernbach
    Grégoire Montavon
    Frederick Klauschen
    npj Precision Oncology, 6(2022), pp. 35
    Preview abstract Understanding the pathological properties of dysregulated protein networks in individual patients’ tumors is the basis for precision therapy. Functional experiments are commonly used, but cover only parts of the oncogenic signaling networks, whereas methods that reconstruct networks from omics data usually only predict average network features across tumors. Here, we show that the explainable AI method layer-wise relevance propagation (LRP) can infer protein interaction networks for individual patients from proteomic profiling data. LRP reconstructs average and individual interaction networks with an AUC of 0.99 and 0.93, respectively, and outperforms state-of-the-art network prediction methods for individual tumors. Using data from The Cancer Proteome Atlas, we identify known and potentially novel oncogenic network features, among which some are cancer-type specific and show only minor variation among patients, while others are present across certain tumor types but differ among individual patients. Our approach may therefore support predictive diagnostics in precision oncology by inferring “patient-level” oncogenic mechanisms. View details
    New definitions of human lymphoid and follicular cell entities in lymphatic tissue by machine learning
    Patrick Wagner
    Nils Strodthoff
    Patrick Wurzel
    Arturo Marban
    Sonja Scharf
    Hendrik Schäfer
    Philipp Seegerer
    Andreas Loth
    Sylvia Hartmann
    Frederick Klauschen
    Wojciech Samek
    Martin-Leo Hansmann
    Scientific Reports, 12(2022), pp. 18991
    Preview abstract Histological sections of the lymphatic system are usually the basis of static (2D) morphological investigations. Here, we performed a dynamic (4D) analysis of human reactive lymphoid tissue using confocal fluorescent laser microscopy in combination with machine learning. Based on tracks for T-cells (CD3), B-cells (CD20), follicular T-helper cells (PD1) and optical flow of follicular dendritic cells (CD35), we put forward the first quantitative analysis of movement-related and morphological parameters within human lymphoid tissue. We identified correlations between follicular dendritic cell movement and the behavior of lymphocytes in the microenvironment. In addition, we investigated the value of movement and/or morphological parameters for a precise definition of cell types (CD clusters). CD clusters could be determined based on movement and/or morphology. Differentiating between CD3- and CD20-positive cells is most challenging, and long-term movement characteristics are indispensable. We propose morphological and movement-related prototypes of cell entities by applying machine learning models. Finally, we define new subgroups within lymphocyte entities, beyond CD clusters, based on long-term movement characteristics. In conclusion, we showed that the combination of 4D imaging and machine learning is able to define characteristics of lymphocytes not visible in 2D histology. View details
    2020 International brain–computer interface competition: A review
    Ji-Hoon Jeong
    Jeong-Hyun Cho
    Young-Eun Lee
    Seo-Hyun Lee
    Gi-Hwan Shin
    Young-Seok Kweon
    José del R Millán
    Seong-Whan Lee
    Frontiers in Human Neuroscience, 16(2022), pp. 898300
    Preview abstract The brain-computer interface (BCI) has been investigated as a form of communication tool between the brain and external devices. BCIs have been extended beyond communication and control over the years. The 2020 international BCI competition aimed to provide high-quality neuroscientific data for open access that could be used to evaluate the current degree of technical advances in BCI. Although there are a variety of remaining challenges for future BCI advances, we discuss some of the more recent application directions: (i) few-shot EEG learning, (ii) micro-sleep detection, (iii) imagined speech decoding, (iv) cross-session classification, and (v) EEG(+ear-EEG) detection in an ambulatory environment. Not only did scientists from the BCI field compete, but scholars with a broad variety of backgrounds and nationalities participated in the competition to address these challenges. Each dataset was prepared and separated into three sets that were released to the competitors in the form of training and validation sets followed by a test set. Remarkable BCI advances were identified through the 2020 competition and indicated some trends of interest to BCI researchers. View details
    Inverse design of 3d molecular structures with conditional generative neural networks
    Niklas W. A. Gebauer
    Michael Gastegger
    Stefaan S. P. Hessmann
    Kristof T. Schütt
    Nature Communications, 13(2022), pp. 973
    Preview abstract The rational design of molecules with desired properties is a long-standing challenge in chemistry. Generative neural networks have emerged as a powerful approach to sample novel molecules from a learned distribution. Here, we propose a conditional generative neural network for 3d molecular structures with specified chemical and structural properties. This approach is agnostic to chemical bonding and enables targeted sampling of novel molecules from conditional distributions, even in domains where reference calculations are sparse. We demonstrate the utility of our method for inverse design by generating molecules with specified motifs or composition, discovering particularly stable molecules, and jointly targeting multiple electronic properties beyond the training regime. View details
    Machine learning models predict the primary sites of head and neck squamous cell carcinoma metastases based on DNA methylation
    Maximilian Leitheiser
    David Capper
    Philipp Seegerer
    Annika Lehmann
    Ulrich Schüller
    Frederick Klauschen
    Philipp Jurmeister
    Michael Bockmayr
    The Journal of Pathology, 254(4)(2022), pp. 378-387
    Preview abstract In head and neck squamous cell cancers (HNSCs) that present as metastases with an unknown primary (HNSC-CUPs), the identification of a primary tumor improves therapy options and increases patient survival. However, the currently available diagnostic methods are laborious and do not offer a sufficient detection rate. Predictive machine learning models based on DNA methylation profiles have recently emerged as a promising technique for tumor classification. We applied this technique to HNSC to develop a tool that can improve the diagnostic work-up for HNSC-CUPs. On a reference cohort of 405 primary HNSC samples, we developed four classifiers based on different machine learning models [random forest (RF), neural network (NN), elastic net penalized logistic regression (LOGREG), and support vector machine (SVM)] that predict the primary site of HNSC tumors from their DNA methylation profile. The classifiers achieved high classification accuracies (RF = 83%, NN = 88%, LOGREG = SVM = 89%) on an independent cohort of 64 HNSC metastases. Further, the NN, LOGREG, and SVM models significantly outperformed p16 status as a marker for an origin in the oropharynx. In conclusion, the DNA methylation profiles of HNSC metastases are characteristic for their primary sites, and the classifiers developed in this study, which are made available to the scientific community, can provide valuable information to guide the diagnostic work-up of HNSC-CUP. View details
    Building and Interpreting Deep Similarity Models
    Oliver Eberle
    Jochen Büttner
    Florian Kräutli
    Matteo Valleriani
    Grégoire Montavon
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3)(2022), pp. 1149-1161
    Preview abstract Many learning algorithms such as kernel machines, nearest neighbors, clustering, or anomaly detection, are based on distances or similarities. Before similarities are used for training an actual machine learning model, we would like to verify that they are bound to meaningful patterns in the data. In this paper, we propose to make similarities interpretable by augmenting them with an explanation. We develop BiLRP, a scalable and theoretically founded method to systematically decompose the output of an already trained deep similarity model on pairs of input features. Our method can be expressed as a composition of LRP explanations, which were shown in previous works to scale to highly nonlinear models. Through an extensive set of experiments, we demonstrate that BiLRP robustly explains complex similarity models, e.g. built on VGG-16 deep neural network features. Additionally, we apply our method to an open problem in digital humanities: detailed assessment of similarity between historical documents such as astronomical tables. Here again, BiLRP provides insight and brings verifiability into a highly engineered and problem-specific similarity model. View details
    Finding and removing Clever Hans: Using explanation methods to debug and improve deep models
    Christopher J. Anders
    Leander Weber
    David Neumann
    Wojciech Samek
    Sebastian Lapuschkin
    Information Fusion, 77(2022), pp. 261-295
    Preview abstract Contemporary learning models for computer vision are typically trained on very large (benchmark) datasets with millions of samples. These may, however, contain biases, artifacts, or errors that have gone unnoticed and are exploitable by the model. In the worst case, the trained model does not learn a valid and generalizable strategy to solve the problem it was trained for, and becomes a “Clever Hans” predictor that bases its decisions on spurious correlations in the training data, potentially yielding an unrepresentative or unfair, and possibly even hazardous predictor. In this paper, we contribute by providing a comprehensive analysis framework based on a scalable statistical analysis of attributions from explanation methods for large data corpora. Based on a recent technique, Spectral Relevance Analysis, we propose the following technical contributions and resulting findings: (a) a scalable quantification of artifactual and poisoned classes where the machine learning models under study exhibit Clever Hans behavior, (b) several approaches we collectively denote as Class Artifact Compensation, which are able to effectively and significantly reduce a model’s Clever Hans behavior, i.e., we are able to un-Hans models trained on (poisoned) datasets, such as the popular ImageNet data corpus. We demonstrate that Class Artifact Compensation, defined in a simple theoretical framework, may be implemented as part of a neural network’s training or fine-tuning process, or in a post-hoc manner by injecting additional layers, preventing any further propagation of undesired Clever Hans features, into the network architecture. Using our proposed methods, we provide qualitative and quantitative analyses of the biases and artifacts in, e.g., the ImageNet dataset, the Adience benchmark dataset of unfiltered faces, and the ISIC 2019 skin lesion analysis dataset. We demonstrate that these insights can give rise to improved, more representative, and fairer models operating on implicitly cleaned data corpora. View details
    DNA methylation-based classification of sinonasal tumors
    Philipp Jurmeister
    Stefanie Glöß
    Renée Roller
    Maximilian Leitheiser
    Simone Schmid
    Liliana H Mochmann
    Emma Payá Capilla
    Rebecca Fritz
    Carsten Dittmayer
    Corinna Friedrich
    Anne Thieme
    Philipp Keyl
    Armin Jarosch
    Simon Schallenberg
    Hendrik Bläker
    Inga Hoffmann
    Claudia Vollbrecht
    Annika Lehmann
    Michael Hummel
    Daniel Heim
    Mohamed Haji
    Patrick Harter
    Benjamin Englert
    Stephan Frank
    Jürgen Hench
    Werner Paulus
    Martin Hasselblatt
    Wolfgang Hartmann
    Hildegard Dohmen
    Ursula Keber
    Paul Jank
    Carsten Denkert
    Christine Stadelmann
    Felix Bremmer
    Annika Richter
    Annika Wefers
    Julika Ribbat-Idel
    Sven Perner
    Christian Idel
    Lorenzo Chiariotti
    Rosa Della Monica
    Alfredo Marinelli
    Ulrich Schüller
    Michael Bockmayr
    Jacklyn Liu
    Valerie J Lund
    Martin Forster
    Matt Lechner
    Sara L Lorenzo-Guerra
    Mario Hermsen
    Pascal D Johann
    Abbas Agaimy
    Philipp Seegerer
    Arend Koch
    Frank Heppner
    Stefan M Pfister
    David TW Jones
    Martin Sill
    Andreas von Deimling
    Matija Snuderl
    Erna Forgó
    Brooke E. Howitt
    Philipp Mertins
    Frederick Klauschen
    David Capper
    Nature Communications, 13(2022), pp. 7148
    Preview abstract The diagnosis of sinonasal tumors is challenging due to a heterogeneous spectrum of various differential diagnoses as well as poorly defined, disputed entities such as sinonasal undifferentiated carcinomas (SNUCs). In this study, we apply a machine learning algorithm based on DNA methylation patterns to classify sinonasal tumors with clinical-grade reliability. We further show that sinonasal tumors with SNUC morphology are not as undifferentiated as their current terminology suggests but rather reassigned to four distinct molecular classes defined by epigenetic, mutational and proteomic profiles. This includes two classes with neuroendocrine differentiation, characterized by IDH2 or SMARCA4/ARID1A mutations with an overall favorable clinical course, one class composed of highly aggressive SMARCB1-deficient carcinomas and another class with tumors that represent potentially previously misclassified adenoid cystic carcinomas. Our findings can aid in improving the diagnostic classification of sinonasal tumors and could help to change the current perception of SNUCs. View details
    To pretrain or not? A systematic analysis of the benefits of pretraining in diabetic retinopathy
    Vignesh Srinivasan
    Nils Strodthoff
    Jackie Ma
    Alexander Binder
    Wojciech Samek
    PLOS ONE, 17(10)(2022), e0274291
    Preview abstract There is an increasing number of medical use cases where classification algorithms based on deep neural networks reach performance levels that are competitive with human medical experts. To alleviate the challenges of small dataset sizes, these systems often rely on pretraining. In this work, we aim to assess the broader implications of these approaches in order to better understand what type of pretraining works reliably (with respect to performance, robustness, learned representation etc.) in practice and what type of pretraining dataset is best suited to achieve good performance in small target dataset size scenarios. Considering diabetic retinopathy grading as an exemplary use case, we compare the impact of different training procedures including recently established self-supervised pretraining methods based on contrastive learning. To this end, we investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions. Our results indicate that models initialized from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions. In particular, self-supervised models show further benefits over supervised models. Self-supervised models with initialization from ImageNet pretraining not only report higher performance, they also reduce overfitting to large lesions along with improvements in taking into account minute lesions indicative of the progression of the disease. Understanding the effects of pretraining in a broader sense that goes beyond simple performance comparisons is of crucial importance for the broader medical imaging community beyond the use case considered in this work. View details
    Towards the interpretability of deep learning models for multi-modal neuroimaging: Finding structural changes of the ageing brain
    Simon M Hofmann
    Frauke Beyer
    Sebastian Lapuschkin
    Ole Goltermann
    Markus Loeffler
    Arno Villringer
    Wojciech Samek
    A Veronica Witte
    Neuroimage, 261(2022), pp. 119504
    Preview abstract Brain-age (BA) estimates based on deep learning are increasingly used as neuroimaging biomarker for brain health; however, the underlying neural features have remained unclear. We combined ensembles of convolutional neural networks with Layer-wise Relevance Propagation (LRP) to detect which brain features contribute to BA. Trained on magnetic resonance imaging (MRI) data of a population-based study (n = 2637, 18–82 years), our models estimated age accurately based on single and multiple modalities, regionally restricted and whole-brain images (mean absolute errors 3.37–3.86 years). We find that BA estimates capture ageing at both small and large-scale changes, revealing gross enlargements of ventricles and subarachnoid spaces, as well as white matter lesions, and atrophies that appear throughout the brain. Divergence from expected ageing reflected cardiovascular risk factors and accelerated ageing was more pronounced in the frontal lobe. Applying LRP, our study demonstrates how superior deep learning models detect brain-ageing in healthy and at-risk individuals throughout adulthood. View details
    Forecasting industrial aging processes with machine learning methods
    Mihail Bogojeski
    Simeon Sauer
    Franziska Horn
    Computers & Chemical Engineering, 144(2021), pp. 107123
    Preview abstract Accurately predicting industrial aging processes makes it possible to schedule maintenance events further in advance, ensuring a cost-efficient and reliable operation of the plant. So far, these degradation processes were usually described by mechanistic or simple empirical prediction models. In this paper, we evaluate a wider range of data-driven models, comparing some traditional stateless models (linear and kernel ridge regression, feed-forward neural networks) to more complex recurrent neural networks (echo state networks and LSTMs). We first examine how much historical data is needed to train each of the models on a synthetic dataset with known dynamics. Next, the models are tested on real-world data from a large scale chemical plant. Our results show that recurrent models produce near perfect predictions when trained on larger datasets, and maintain a good performance even when trained on smaller datasets with domain shifts, while the simpler models only performed comparably on the smaller datasets. View details
    Unification of sparse Bayesian learning algorithms for electromagnetic brain imaging with the majorization minimization framework
    Ali Hashemi
    Chang Cai
    Gitta Kutyniok
    Srikantan S. Nagarajan
    Stefan Haufe
    Neuroimage(2021), https://doi.org/10.1016/j.neuroimage.2021.118309
    Preview abstract Methods for electro- or magnetoencephalography (EEG/MEG) based brain source imaging (BSI) using sparse Bayesian learning (SBL) have been demonstrated to achieve excellent performance in situations with low numbers of distinct active sources, such as event-related designs. This paper extends the theory and practice of SBL in three important ways. First, we reformulate three existing SBL algorithms under the majorization-minimization (MM) framework. This unification perspective not only provides a useful theoretical framework for comparing different algorithms in terms of their convergence behavior, but also provides a principled recipe for constructing novel algorithms with specific properties by designing appropriate bounds of the Bayesian marginal likelihood function. Second, building on the MM principle, we propose a novel method called LowSNR-BSI that achieves favorable source reconstruction performance in low signal-to-noise-ratio (SNR) settings. Third, precise knowledge of the noise level is a crucial requirement for accurate source reconstruction. Here we present a novel principled technique to accurately learn the noise variance from the data either jointly within the source reconstruction procedure or using one of two proposed cross-validation strategies. Empirically, we could show that the monotonous convergence behavior predicted from MM theory is confirmed in numerical experiments. Using simulations, we further demonstrate the advantage of LowSNR-BSI over conventional SBL in low-SNR regimes, and the advantage of learned noise levels over estimates derived from baseline data. To demonstrate the usefulness of our novel approach, we show neurophysiologically plausible source reconstructions on averaged auditory evoked potential data. View details
    Pruning by explaining: A novel criterion for deep neural network pruning
    Seul-Ki Yeom
    Philipp Seegerer
    Sebastian Lapuschkin
    Alexander Binder
    Simon Wiedemann
    Wojciech Samek
    Pattern Recognition, 115(2021), pp. 107899
    Preview abstract The success of convolutional neural networks (CNNs) in various applications is accompanied by a significant increase in computation and parameter storage costs. Recent efforts to reduce these overheads involve pruning and compressing the weights of various layers while at the same time aiming to not sacrifice performance. In this paper, we propose a novel criterion for CNN pruning inspired by neural network interpretability: The most relevant units, i.e. weights or filters, are automatically found using their relevance scores obtained from concepts of explainable AI (XAI). By exploring this idea, we connect the lines of interpretability and model compression research. We show that our proposed method can efficiently prune CNN models in transfer-learning setups in which networks pre-trained on large corpora are adapted to specialized tasks. The method is evaluated on a broad range of computer vision datasets. Notably, our novel criterion is not only competitive or better compared to state-of-the-art pruning criteria when successive retraining is performed, but clearly outperforms these previous criteria in the resource-constrained application scenario in which the data of the task to be transferred to is very scarce and one chooses to refrain from fine-tuning. Our method is able to compress the model iteratively while maintaining or even improving accuracy. At the same time, it has a computational cost in the order of gradient computation and is comparatively simple to apply without the need for tuning hyperparameters for pruning. View details
    Morphological and molecular breast cancer profiling through explainable machine learning
    Alexander Binder
    Michael Bockmayr
    Miriam Hägele
    Stephan Wienert
    Daniel Heim
    Katharina Hellweg
    Masaru Ishii
    Albrecht Stenzinger
    Andreas Hocke
    Carsten Denkert
    Frederick Klauschen
    Nature Machine Intelligence, 3(2021), 355–366
    Preview abstract Recent advances in cancer research and diagnostics largely rely on new developments in microscopic or molecular profiling techniques, offering high levels of detail with respect to either spatial or molecular features, but usually not both. Here, we present an explainable machine-learning approach for the integrated profiling of morphological, molecular and clinical features from breast cancer histology. First, our approach allows for the robust detection of cancer cells and tumour-infiltrating lymphocytes in histological images, providing precise heatmap visualizations explaining the classifier decisions. Second, molecular features, including DNA methylation, gene expression, copy number variations, somatic mutations and proteins are predicted from histology. Molecular predictions reach balanced accuracies up to 78%, whereas accuracies of over 95% can be achieved for subgroups of patients. Finally, our explainable AI approach allows assessment of the link between morphological and molecular cancer properties. The resulting computational multiplex-histology analysis can help promote basic cancer research and precision medicine through an integrated diagnostic scoring of histological, clinical and molecular features. View details
    Robustifying models against adversarial attacks by Langevin dynamics
    Vignesh Srinivasan
    Csaba Rohrer
    Arturo Marban
    Wojciech Samek
    Shinichi Nakajima
    Neural Networks, 137(2021), pp. 1-17
    Preview abstract Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, a number of defense methods were proposed, which, however, have been circumvented by newer and more sophisticated attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains a challenging task. This paper proposes a novel, simple yet effective defense strategy where off-manifold adversarial samples are driven towards high density regions of the data generating distribution of the (unknown) target class by the Metropolis-adjusted Langevin algorithm (MALA) with perceptual boundary taken into account. To achieve this task, we introduce a generative model of the conditional distribution of the inputs given labels that can be learned through a supervised Denoising Autoencoder (sDAE) in alignment with a discriminative classifier. Our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion: its projection is distributed broadly. This prevents white box attacks from accurately aligning the input to create an adversarial sample effectively. MALADE is applicable to any existing classifier, providing robust defense as well as off-manifold sample detection. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies. View details
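For readers unfamiliar with the sampler named in this abstract, the following is a minimal sketch of the generic Metropolis-adjusted Langevin algorithm on a toy one-dimensional Gaussian. It is not the MALADE defense itself: the target density, the step size, and the names `mala_sample`, `log_p` and `grad_log_p` are illustrative assumptions, and the paper's learned sDAE model is replaced here by an analytically known gradient.

```python
import numpy as np

def mala_sample(log_density, grad_log_density, x0, step=0.5, n_steps=5000, seed=0):
    """Sample from exp(log_density) with the Metropolis-adjusted Langevin algorithm."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    n_accepted = 0

    def log_q(x_to, x_from):
        # Log density (up to a constant) of the Langevin proposal x_from -> x_to.
        mean = x_from + 0.5 * step**2 * grad_log_density(x_from)
        return -np.sum((x_to - mean) ** 2) / (2.0 * step**2)

    for i in range(n_steps):
        # Langevin proposal: gradient drift towards high density plus Gaussian noise.
        noise = rng.standard_normal(x.shape)
        proposal = x + 0.5 * step**2 * grad_log_density(x) + step * noise
        # Metropolis-Hastings correction for the discretization error.
        log_alpha = (log_density(proposal) + log_q(x, proposal)
                     - log_density(x) - log_q(proposal, x))
        if np.log(rng.uniform()) < log_alpha:
            x = proposal
            n_accepted += 1
        samples[i] = x
    return samples, n_accepted / n_steps

# Toy target (an assumption for illustration): a standard normal density.
def log_p(x):
    return -0.5 * np.sum(x**2)

def grad_log_p(x):
    return -x

# Start far from the mode to show the drift towards the high-density region.
samples, accept_rate = mala_sample(log_p, grad_log_p, x0=np.array([3.0]))
```

The drift term pulls each proposal towards regions of higher density, which is the property the abstract relies on when it describes driving off-manifold samples back towards the data distribution.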
    Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints
    Felix Sattler
    Wojciech Samek
    IEEE Transactions on Neural Networks and Learning Systems, 32(8)(2021), pp. 3710-3722
    Preview abstract Federated learning (FL) is currently the most widely adopted framework for collaborative training of (deep) machine learning models under privacy constraints. Despite its popularity, it has been observed that FL yields suboptimal results if the local clients’ data distributions diverge. To address this issue, we present clustered FL (CFL), a novel federated multitask learning (FMTL) framework, which exploits geometric properties of the FL loss surface to group the client population into clusters with jointly trainable data distributions. In contrast to existing FMTL approaches, CFL does not require any modifications to the FL communication protocol, is applicable to general nonconvex objectives (in particular, deep neural networks), does not require the number of clusters to be known a priori, and comes with strong mathematical guarantees on the clustering quality. CFL is flexible enough to handle client populations that vary over time and can be implemented in a privacy-preserving way. As clustering is only performed after FL has converged to a stationary point, CFL can be viewed as a postprocessing method that will always achieve performance greater than or equal to that of conventional FL by allowing clients to arrive at more specialized models. We verify our theoretical analysis in experiments with deep convolutional and recurrent neural networks on commonly used FL data sets. View details
    Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications
    Wojciech Samek
    Gregoire Montavon
    Sebastian Lapuschkin
    Christopher J. Anders
    Proceedings of the IEEE, 109(3)(2021), pp. 247-278
    Preview abstract With the broader and highly successful usage of machine learning (ML) in industry and the sciences, there has been a growing demand for explainable artificial intelligence (XAI). Interpretability and explanation methods for gaining a better understanding of the problem-solving abilities and strategies of nonlinear ML, in particular, deep neural networks, are, therefore, receiving increased attention. In this work, we aim to: 1) provide a timely overview of this active emerging field, with a focus on “post hoc” explanations, and explain its theoretical foundations; 2) put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations; 3) outline best practice aspects, i.e., how to best include interpretation methods into the standard usage of ML; and 4) demonstrate successful usage of XAI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of ML. View details
    Unification of sparse Bayesian learning algorithms for electromagnetic brain imaging with the majorization minimization framework
    Ali Hashemi
    Chang Cai
    Gitta Kutyniok
    Srikantan S Nagarajan
    Stefan Haufe
    NeuroImage, 239(2021), pp. 118309
    Preview abstract Methods for electro- or magnetoencephalography (EEG/MEG) based brain source imaging (BSI) using sparse Bayesian learning (SBL) have been demonstrated to achieve excellent performance in situations with low numbers of distinct active sources, such as event-related designs. This paper extends the theory and practice of SBL in three important ways. First, we reformulate three existing SBL algorithms under the majorization-minimization (MM) framework. This unification perspective not only provides a useful theoretical framework for comparing different algorithms in terms of their convergence behavior, but also provides a principled recipe for constructing novel algorithms with specific properties by designing appropriate bounds of the Bayesian marginal likelihood function. Second, building on the MM principle, we propose a novel method called LowSNR-BSI that achieves favorable source reconstruction performance in low signal-to-noise-ratio (SNR) settings. Third, precise knowledge of the noise level is a crucial requirement for accurate source reconstruction. Here we present a novel principled technique to accurately learn the noise variance from the data either jointly within the source reconstruction procedure or using one of two proposed cross-validation strategies. Empirically, we show that the monotonic convergence behavior predicted by MM theory is confirmed in numerical experiments. Using simulations, we further demonstrate the advantage of LowSNR-BSI over conventional SBL in low-SNR regimes, and the advantage of learned noise levels over estimates derived from baseline data. To demonstrate the usefulness of our novel approach, we show neurophysiologically plausible source reconstructions on averaged auditory evoked potential data. View details
    Immediate brain plasticity after one hour of brain–computer interface (BCI)
    Till Nierhaus
    Carmen Vidaurre
    Claudia Sannelli
    Arno Villringer
    The Journal of Physiology, 599(9)(2021), pp. 2435-2451
    Preview abstract A brain-computer interface (BCI) allows humans to control computational devices using only neural signals. However, it is still an open question whether performing BCI also impacts the brain itself, i.e. whether brain plasticity is induced. Here, we show rapid and spatially specific signs of brain plasticity measured with functional and structural MRI after only 1 h of purely mental BCI training in BCI-naive subjects. We employed two BCI approaches with neurofeedback based on (i) modulations of EEG rhythms by motor imagery (MI-BCI) or (ii) event-related potentials elicited by visually targeting flashing letters (ERP-BCI). Before and after the BCI session we performed structural and functional MRI. For both BCI approaches we found increased T1-weighted MR signal in the grey matter of the respective target brain regions, such as occipital/parietal areas after ERP-BCI and precuneus and sensorimotor regions after MI-BCI. View details
    Machine learning of solvent effects on molecular spectra and reactions
    Michael Gastegger
    Kristof T. Schütt
    Chemical Science(2021), http://dx.doi.org/10.1039/D1SC02742E
    Preview abstract Fast and accurate simulation of complex chemical systems in environments such as solutions is a long-standing challenge in theoretical chemistry. In recent years, machine learning has extended the boundaries of quantum chemistry by providing highly accurate and efficient surrogate models of electronic structure theory, which previously have been out of reach for conventional approaches. Those models have long been restricted to closed molecular systems without accounting for environmental influences, such as external electric and magnetic fields or solvent effects. Here, we introduce the deep neural network FieldSchNet for modeling the interaction of molecules with arbitrary external fields. FieldSchNet offers access to a wealth of molecular response properties, enabling it to simulate a wide range of molecular spectra, such as infrared, Raman and nuclear magnetic resonance. Beyond that, it is able to describe implicit and explicit molecular environments, operating as a polarizable continuum model for solvation or in a quantum mechanics/molecular mechanics setup. We employ FieldSchNet to study the influence of solvent effects on molecular spectra and a Claisen rearrangement reaction. Based on these results, we use FieldSchNet to design an external environment capable of lowering the activation barrier of the rearrangement reaction significantly, demonstrating promising avenues for inverse chemical design. View details
    Leaf-inspired homeostatic cellulose biosensors
    Ji-Yong Kim
    Yong Ju Yun
    Joshua Jeong
    C.-Yoon Kim
    Seong-Whan Lee
    Science Advances, 7(16)(2021), eabe7432
    Preview abstract An incompatibility between skin homeostasis and existing biosensor interfaces inhibits long-term electrophysiological signal measurement. Inspired by the leaf homeostasis system, we developed the first homeostatic cellulose biosensor with functions of protection, sensation, self-regulation, and biosafety. Moreover, we find that a mesoporous cellulose membrane transforms into homeostatic material with properties that include high ion conductivity, excellent flexibility and stability, appropriate adhesion force, and self-healing effects when swollen in a saline solution. The proposed biosensor is found to maintain a stable skin-sensor interface through homeostasis even when challenged by various stresses, such as a dynamic environment, severe detachment, dense hair, sweat, and long-term measurement. Last, we demonstrate the high usability of our homeostatic biosensor for continuous and stable measurement of electrophysiological signals and give a showcase application in the field of brain-computer interfacing where the biosensors and machine learning together help to control real-time applications beyond the laboratory at unprecedented versatility. View details
    Autonomous robotic nanofabrication with reinforcement learning
    Philipp Leinen
    Malte Esders
    Kristof T. Schütt
    Christian Wagner
    F. Stefan Tautz
    Science Advances, 6(36)(2020), eabb6987
    Preview abstract The ability to handle single molecules as effectively as macroscopic building blocks would enable the construction of complex supramolecular structures inaccessible to self-assembly. The fundamental challenges obstructing this goal are the uncontrolled variability and poor observability of atomic-scale conformations. Here, we present a strategy to work around both obstacles and demonstrate autonomous robotic nanofabrication by manipulating single molecules. Our approach uses reinforcement learning (RL), which finds solution strategies even in the face of large uncertainty and sparse feedback. We demonstrate the potential of our RL approach by removing molecules autonomously with a scanning probe microscope from a supramolecular structure. Our RL agent reaches an excellent performance, enabling us to automate a task that previously had to be performed by a human. We anticipate that our work opens the way toward autonomous agents for the robotic construction of functional supramolecular structures with speed, precision, and perseverance beyond our current capabilities. View details
    Ensemble learning of coarse-grained molecular dynamics force fields with a kernel approach
    Jiang Wang
    Stefan Chmiela
    Frank Noé
    Cecilia Clementi
    The Journal of Chemical Physics, 152(2020), pp. 194106
    Preview abstract Gradient-domain machine learning (GDML) is an accurate and efficient approach to learn a molecular potential and associated force field based on the kernel ridge regression algorithm. Here, we demonstrate its application to learn an effective coarse-grained (CG) model from all-atom simulation data in a sample efficient manner. The CG force field is learned by following the thermodynamic consistency principle, here by minimizing the error between the predicted CG force and the all-atom mean force in the CG coordinates. Solving this problem by GDML directly is impossible because coarse-graining requires averaging over many training data points, resulting in impractical memory requirements for storing the kernel matrices. In this work, we propose a data-efficient and memory-saving alternative. Using ensemble learning and stratified sampling, we propose a 2-layer training scheme that enables GDML to learn an effective CG model. We illustrate our method on a simple biomolecular system, alanine dipeptide, by reconstructing the free energy landscape of a CG variant of this molecule. Our novel GDML training scheme yields a smaller free energy error than neural networks when the training set is small, and a comparably high accuracy when the training set is sufficiently large. View details
    Fairwashing Explanations with Off-Manifold Detergent
    Christopher J. Anders
    Plamen Pasliev
    Ann-Kathrin Dombrowski
    Pan Kessel
    International Conference on Machine Learning, PMLR(2020), pp. 314-323
    Preview abstract Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can act as proof for a sensible, fair and trustworthy decision-making process of the algorithm and thereby increase its acceptance by the end-users. In this paper, we show both theoretically and experimentally that these hopes are presently unfounded. Specifically, we show that, for any classifier g, one can always construct another classifier g̃ which has the same behavior on the data (same train, validation, and test error) but has arbitrarily manipulated explanation maps. We derive this statement theoretically using differential geometry and demonstrate it experimentally for various explanation methods, architectures, and datasets. Motivated by our theoretical insights, we then propose a modification of existing explanation methods which makes them significantly more robust. View details
    Enhanced Performance of a Brain Switch by Simultaneous Use of EEG and NIRS Data for Asynchronous Brain-Computer Interface
    Chang-Hee Han
    Han-Jeong Hwang
    IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28(10)(2020), pp. 2102-2112
    Preview abstract Previous studies have shown the superior performance of hybrid electroencephalography (EEG)/near-infrared spectroscopy (NIRS) brain-computer interfaces (BCIs). However, it has remained unclear whether the use of a hybrid EEG/NIRS modality can provide better performance for a brain switch that can detect the onset of the intention to turn on a BCI. In this study, we developed such a hybrid EEG/NIRS brain switch and compared its performance with single-modality EEG- and NIRS-based brain switches, respectively, in terms of true positive rate (TPR), false positive rate (FPR), onset detection time (ODT), and information transfer rate (ITR). In an offline analysis, the performance of a hybrid EEG/NIRS brain switch was significantly improved over that of EEG- and NIRS-based brain switches in general, and in particular a significantly lower FPR was observed for the hybrid EEG/NIRS brain switch. A pseudo-online analysis was additionally performed to confirm the feasibility of implementing an online BCI system with our hybrid EEG/NIRS brain switch. The overall trend of pseudo-online analysis results generally coincided with that of the offline analysis results. In addition, no significant difference in any performance measure was found between the offline and pseudo-online analysis schemes when the amount of training data was the same, with one exception for the ITRs of an EEG brain switch. These offline and pseudo-online results demonstrate that a hybrid EEG/NIRS brain switch can be used to provide a better onset detection performance than that of a single neuroimaging modality. View details
    An adaptive deep reinforcement learning framework enables curling robots with human-like performance in real-world conditions
    Dong-Ok Won
    Seong-Whan Lee
    Science Robotics, 5(46)(2020), eabb9764
    Preview abstract The game of curling can be considered a good test bed for studying the interaction between artificial intelligence systems and the real world. In curling, the environmental characteristics change at every moment, and every throw has an impact on the outcome of the match. Furthermore, there is no time for relearning during a curling match due to the timing rules of the game. Here, we report a curling robot that can achieve human-level performance in the game of curling using an adaptive deep reinforcement learning framework. Our proposed adaptation framework extends standard deep reinforcement learning using temporal features, which learn to compensate for the uncertainties and nonstationarities that are an unavoidable part of curling. Our curling robot, Curly, was able to win three of four official matches against expert human teams [top-ranked women’s curling teams and Korea national wheelchair curling team (reserve team)]. These results indicate that the gap between physics-based simulators and the real world can be narrowed. View details
    Exploring chemical compound space with quantum-based machine learning
    O. Anatole von Lilienfeld
    Alexandre Tkatchenko
    Nature Reviews Chemistry, 4(2020), 347–358
    Preview abstract Rational design of compounds with specific properties requires understanding and fast evaluation of molecular properties throughout chemical compound space — the huge set of all potentially stable molecules. Recent advances in combining quantum-mechanical calculations with machine learning provide powerful tools for exploring wide swathes of chemical compound space. We present our perspective on this exciting and quickly developing field by discussing key advances in the development and applications of quantum-mechanics-based machine-learning methods to diverse compounds and properties, and outlining the challenges ahead. We argue that significant progress in the exploration and understanding of chemical compound space can be made through a systematic combination of rigorous physical theories, comprehensive synthetic data sets of microscopic and macroscopic properties, and modern machine-learning methods that account for physical and chemical knowledge. View details
    Quantum chemical accuracy from density functional approximations via machine learning
    Mihail Bogojeski
    Leslie Vogt-Maranto
    Mark E. Tuckerman
    Kieron Burke
    Nature Communications, 11(2020), pp. 5223
    Preview abstract Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal·mol⁻¹ with presently available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal·mol⁻¹) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails. View details
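The Δ-learning idea mentioned in this abstract, learning only the correction to a cheaper baseline, can be illustrated with a minimal sketch. Everything below is a toy assumption rather than the paper's setup: the one-dimensional "geometry" coordinate, the stand-in functions `cheap_energy` and `accurate_energy`, and the small kernel ridge regression helper `fit_krr` are hypothetical, and no DFT densities or coupled-cluster data are involved.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    # Gaussian kernel matrix between two sets of 1-D inputs.
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def fit_krr(x, y, length_scale=0.5, ridge=1e-6):
    # Kernel ridge regression: solve (K + ridge * I) alpha = y.
    K = rbf_kernel(x, x, length_scale)
    alpha = np.linalg.solve(K + ridge * np.eye(len(x)), y)
    return lambda x_query: rbf_kernel(x_query, x, length_scale) @ alpha

# Hypothetical stand-ins for a cheap baseline and an expensive reference method.
def cheap_energy(x):
    return x**2

def accurate_energy(x):
    return x**2 + 0.1 * np.sin(3.0 * x)

x_train = np.linspace(-1.0, 1.0, 8)    # few expensive reference calculations
x_test = np.linspace(-0.9, 0.9, 50)

# Delta-learning: fit only the difference between reference and baseline.
delta_model = fit_krr(x_train, accurate_energy(x_train) - cheap_energy(x_train))

# Prediction = cheap baseline evaluated everywhere + learned correction.
corrected = cheap_energy(x_test) + delta_model(x_test)

baseline_err = np.max(np.abs(cheap_energy(x_test) - accurate_energy(x_test)))
delta_err = np.max(np.abs(corrected - accurate_energy(x_test)))
```

The design choice is that the correction is usually smoother and smaller than the full target, so fewer reference calculations are needed to learn it, which matches the abstract's claim about reduced training data.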
    Machine Learning Meets Quantum Physics
    Kristof T. Schütt
    Stefan Chmiela
    O. Anatole von Lilienfeld
    Alexandre Tkatchenko
    Koji Tsuda
    Springer(2020)
    Preview abstract A book that connects the fields of machine learning and quantum chemistry. View details