Viren Jain

Authored Publications
    CURIE: Evaluating LLMs on multitask long context scientific understanding and reasoning
    Matthew Abraham
    Haining Pan
    Zahra Shamsi
    Muqthar Mohammad
    Chenfei Jiang
    Ruth Alcantara
    Gowoon Cheon
    Xuejian Ma
    Michael Statt
    Jackson Cui
    Nayantara Mudur
    Eun-Ah Kim
    Paul Raccuglia
    Victor V. Albert
    Lizzie Dorfman
    Brian Rohr
    Shutong Li
    Maria Tikhanovskaya
    Drew Purves
    Elise Kleeman
    Philippe Faist
    Ean Phing VanLee
    International Conference on Learning Representations (ICLR) (2025)
    The core of the scientific problem-solving process involves synthesizing information while applying expert knowledge. Large Language Models (LLMs) have the potential to accelerate this process due to their extensive knowledge across a variety of domains. Recent advancements have also made it possible for LLMs to handle very long "in-context" content. However, existing evaluations of long-context LLMs have focused on assessing their ability to summarize or retrieve information within the given context, primarily in generalist tasks that do not require deep scientific expertise. To facilitate analogous assessments of domain-specific tasks, we introduce the scientific long-Context Understanding and Reasoning Inference Evaluations (CURIE) benchmark. This benchmark provides a set of 8 challenging tasks, derived from around 250 scientific research papers, requiring domain expertise, comprehension of long in-context information, and multi-step reasoning that tests the ability of LLMs to assist scientists in realistic workflows. Tasks in CURIE have been collected from experts in six disciplines (materials science, theoretical condensed matter physics, quantum computing, geospatial analysis, biodiversity, and protein sequencing), covering both experimental and theoretical workflows in science. We evaluate a range of closed and open LLMs on these tasks. Additionally, we propose strategies for task decomposition, which allow for a more nuanced evaluation of the models and facilitate staged multi-step assessments. We hope that insights gained from CURIE can guide the future development of LLMs.
    Light-microscopy-based dense connectomic reconstruction of mammalian brain tissue
    Mojtaba R. Tavakoli
    Julia Lyudchik
    Vitali Vistunou
    Nathalie Agudelo Duenas
    Jakob Vorlaufer
    Christoph Sommer
    Caroline Kreuzinger
    Barbara de Souza Oliveira
    Alban Cenameri
    Gaia Novarino
    Johann Danzl
    Nature (2025)
    The information-processing capability of the brain’s cellular network depends on the physical wiring pattern between neurons and their molecular and functional characteristics. Charting neurons and resolving the individual synaptic connections requires volumetric imaging at nanoscale resolution and comprehensive cellular contrast. Light microscopy is uniquely positioned to visualize specific molecules, but dense, synapse-level circuit reconstruction by light microscopy has been out of reach due to limitations in resolution, contrast, and volumetric imaging capability. Here we developed light-microscopy-based connectomics (LICONN). We integrated hydrogel embedding and expansion with comprehensive deep-learning-based segmentation and analysis of connectivity, thus directly incorporating molecular information in synapse-level brain tissue reconstructions. LICONN will allow synapse-level brain tissue phenotyping in biological experiments in a readily adoptable manner.
    ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
    Alexander Immer
    Alex Bo-Yuan Chen
    Mariela D. Petkova
    Nirmala A. Iyer
    Luuk Willem Hesselink
    Aparna Dev
    Gudrun Ihrke
    Woohyun Park
    Alyson Petruncio
    Aubrey Weigel
    Wyatt Korff
    Florian Engert
    Jeff W. Lichtman
    Misha B. Ahrens
    International Conference on Learning Representations (ICLR) (2025)
    Data-driven benchmarks have led to significant progress in key scientific modeling domains including weather and structural biology. Here, we present the Zebrafish Activity Prediction Benchmark (ZAPBench), which quantitatively measures progress on the problem of predicting cellular-resolution neural activity throughout an entire vertebrate brain. The benchmark is based on a novel dataset containing 4D light-sheet microscopy recordings of more than 70,000 neurons in a larval zebrafish brain, along with motion-stabilized and voxel-level cell segmentations of these data that facilitate development of a variety of forecasting methods. Initial results from a selection of time series and volumetric video modeling approaches achieve better performance than naive baseline methods, but also show room for further improvement. The specific brain used in the activity recording is also undergoing synaptic-level anatomical mapping, which will enable future integration of detailed structural information into ZAP forecasting methods.
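The abstract reports that learned forecasters beat "naive baseline methods". As a rough illustration of what such a baseline can look like, here is a minimal sketch assuming per-neuron activity traces, a fixed context window followed by a forecast horizon, and mean absolute error as the score. The function names, the context/horizon split, and the metric are illustrative assumptions, not ZAPBench's published evaluation protocol.

```python
from typing import Callable, List, Sequence

# Hypothetical layout: each trace is one neuron's activity over time.
Trace = Sequence[float]
Forecaster = Callable[[Trace, int], List[float]]


def persistence_forecast(context: Trace, horizon: int) -> List[float]:
    """Naive baseline: repeat the last observed value for every future step."""
    if not context:
        raise ValueError("context must contain at least one time step")
    return [context[-1]] * horizon


def mean_forecast(context: Trace, horizon: int) -> List[float]:
    """Naive baseline: predict the mean of the context window."""
    if not context:
        raise ValueError("context must contain at least one time step")
    return [sum(context) / len(context)] * horizon


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between a forecast and the held-out activity."""
    if len(pred) != len(target) or not target:
        raise ValueError("prediction and target must be non-empty and equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def evaluate(traces: List[Trace], context_len: int, horizon: int,
             forecaster: Forecaster) -> float:
    """Average MAE over neurons: first context_len steps in, next horizon steps out."""
    errors = []
    for trace in traces:
        context = trace[:context_len]
        target = trace[context_len:context_len + horizon]
        errors.append(mae(forecaster(context, len(target)), target))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Two toy "neurons": one slowly drifting, one constant.
    toy = [[0.0, 0.1, 0.2, 0.3, 0.4, 0.5], [1.0] * 6]
    print(evaluate(toy, context_len=4, horizon=2, forecaster=persistence_forecast))
    print(evaluate(toy, context_len=4, horizon=2, forecaster=mean_forecast))
```

A learned model would be plugged in as another `forecaster` with the same signature and compared on the same split.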
    A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution
    Alex Shapson-Coe
    Daniel R. Berger
    Yuelong Wu
    Richard L. Schalek
    Shuohong Wang
    Neha Karlupia
    Sven Dorkenwald
    Evelina Sjostedt
    Dongil Lee
    Luke Bailey
    Angerica Fitzmaurice
    Rohin Kar
    Benjamin Field
    Hank Wu
    Julian Wagner-Carena
    David Aley
    Joanna Lau
    Zudi Lin
    Donglai Wei
    Hanspeter Pfister
    Adi Peleg
    Jeff W. Lichtman
    Science (2024)
    To fully understand how the human brain works, knowledge of its structure at high resolution is needed. Presented here is a computationally intensive reconstruction of the ultrastructure of a cubic millimeter of human temporal cortex that was surgically removed to gain access to an underlying epileptic focus. It contains about 57,000 cells, about 230 millimeters of blood vessels, and about 150 million synapses and comprises 1.4 petabytes. Our analysis showed that glia outnumber neurons 2:1, oligodendrocytes were the most common cell, deep layer excitatory neurons could be classified on the basis of dendritic orientation, and among thousands of weak connections to each neuron, there exist rare powerful axonal inputs of up to 50 synapses. Further studies using this resource may bring valuable insights into the mysteries of the human brain.
    Early machine-learning systems were inspired by neural networks; now AI might allow neuroscientists to get to grips with the brain’s unique complexities.
    Multi-Layered Maps of Neuropil with Segmentation Guided Contrastive Learning
    Sven Dorkenwald
    Daniel R. Berger
    Agnes L. Bodor
    Forrest Collman
    Casey M. Schneider-Mizell
    Nuno Maçarico da Costa
    Jeff W. Lichtman
    Nature Methods (2023)
    Maps of the nervous system that identify individual cells along with their type, subcellular components and connectivity have the potential to elucidate fundamental organizational principles of neural circuits. Nanometer-resolution imaging of brain tissue provides the necessary raw data, but inferring cellular and subcellular annotation layers is challenging. We present segmentation-guided contrastive learning of representations (SegCLR), a self-supervised machine learning technique that produces representations of cells directly from 3D imagery and segmentations. When applied to volumes of human and mouse cortex, SegCLR enables accurate classification of cellular subcompartments and achieves performance equivalent to a supervised approach while requiring 400-fold fewer labeled examples. SegCLR also enables inference of cell types from fragments as small as 10 μm, which enhances the utility of volumes in which many neurites are truncated at boundaries. Finally, SegCLR enables exploration of layer 5 pyramidal cell subtypes and automated large-scale analysis of synaptic partners in mouse visual cortex.
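SegCLR is described as segmentation-guided contrastive learning. To show what a contrastive objective computes, here is a minimal sketch of a generic SimCLR-style NT-Xent loss, assuming each positive pair is two embeddings of cutouts from the same segmented cell and every other embedding in the batch is a negative. The pairing rule, the temperature of 0.1, and the function names are assumptions for illustration; this is not the SegCLR implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def nt_xent_loss(view_a: List[Vector], view_b: List[Vector],
                 temperature: float = 0.1) -> float:
    """SimCLR-style NT-Xent loss over a batch of positive pairs.

    view_a[i] and view_b[i] are embeddings of two cutouts assumed (hypothetically)
    to come from the same segmented cell; all other embeddings act as negatives.
    """
    if len(view_a) != len(view_b) or not view_a:
        raise ValueError("views must be non-empty and have equal length")
    n = len(view_a)
    z = [_normalize(v) for v in list(view_a) + list(view_b)]  # 2n unit vectors
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same cell
        sims = {j: _dot(z[i], z[j]) / temperature for j in range(2 * n) if j != i}
        peak = max(sims.values())  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += log_denominator - sims[positive]  # cross-entropy, positive as target
    return total / (2 * n)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    print(nt_xent_loss(anchors, [[0.9, 0.1], [0.1, 0.9]]))  # matched pairs: lower loss
    print(nt_xent_loss(anchors, [[0.1, 0.9], [0.9, 0.1]]))  # swapped pairs: higher loss
```

Minimizing this loss pulls embeddings of the same cell together and pushes different cells apart. This is why representations learned without labels can later support classifiers trained on few labeled examples.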
    Structured sampling of olfactory input by the fly mushroom body
    Zhihao Zheng
    Feng Li
    Corey Fisher
    Iqbal J. Ali
    Nadiya Sharifi
    Steven Calle-Schuler
    Joseph Hsu
    Najla Masoodpanah
    Lucia Kmecova
    Tom Kazimiers
    Eric Perlman
    Matthew Nichols
    Davi Bock
    Current Biology, 32 (2022), pp. 3334-3349
    Associative memory formation and recall in the fruit fly Drosophila melanogaster is subserved by the mushroom body (MB). Upon arrival in the MB, sensory information undergoes a profound transformation from broadly tuned and stereotyped odorant responses in the olfactory projection neuron (PN) layer to narrowly tuned and nonstereotyped responses in the Kenyon cells (KCs). Theory and experiment suggest that this transformation is implemented by random connectivity between KCs and PNs. However, this hypothesis has been challenging to test, given the difficulty of mapping synaptic connections between large numbers of brain-spanning neurons. Here, we used a recent whole-brain electron microscopy volume of the adult fruit fly to map PN-to-KC connectivity at synaptic resolution. The PN-KC connectome revealed unexpected structure, with preponderantly food-responsive PN types converging at above-chance levels on downstream KCs. Axons of the overconvergent PN types tended to arborize near one another in the MB main calyx, making local KC dendrites more likely to receive input from those types. Overconvergent PN types preferentially co-arborize and connect with dendrites of αβ and α′β′ KC subtypes. Computational simulation of the observed network showed degraded discrimination performance compared with a random network, except when all signal flowed through the overconvergent, primarily food-responsive PN types. Additional theory and experiment will be needed to fully characterize the impact of the observed non-random network structure on associative memory formation and recall.
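"Converging at above-chance levels" means observed co-convergence is compared against what random wiring would produce. Here is a minimal sketch of such a comparison, assuming a toy list in which each KC is represented by the set of PN types it receives input from, and a hypothetical null model that keeps each KC's number of input types but draws those types uniformly at random. The data layout and shuffle scheme are assumptions for illustration, not the statistical analysis used in the paper.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Pair = Tuple[str, str]


def convergence_counts(kc_inputs: Sequence[FrozenSet[str]]) -> Dict[Pair, int]:
    """Count, for each pair of PN types, how many KCs receive input from both."""
    counts: Dict[Pair, int] = {}
    for inputs in kc_inputs:
        for pair in combinations(sorted(inputs), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def shuffled_null(kc_inputs: Sequence[FrozenSet[str]], pn_types: List[str],
                  n_shuffles: int = 1000, seed: int = 0) -> Dict[Pair, float]:
    """Mean co-convergence per PN-type pair under a hypothetical random-wiring null.

    Each KC keeps its number of distinct input types; the types themselves are
    drawn uniformly without replacement from pn_types.
    """
    rng = random.Random(seed)
    totals: Dict[Pair, int] = {}
    for _ in range(n_shuffles):
        shuffled = [frozenset(rng.sample(pn_types, len(inputs))) for inputs in kc_inputs]
        for pair, count in convergence_counts(shuffled).items():
            totals[pair] = totals.get(pair, 0) + count
    return {pair: count / n_shuffles for pair, count in totals.items()}


def overconvergence(kc_inputs: Sequence[FrozenSet[str]], pn_types: List[str],
                    n_shuffles: int = 1000, seed: int = 0) -> Dict[Pair, float]:
    """Observed minus expected co-convergence; positive means above-chance pairing."""
    observed = convergence_counts(kc_inputs)
    expected = shuffled_null(kc_inputs, pn_types, n_shuffles, seed)
    pairs = set(observed) | set(expected)
    return {p: observed.get(p, 0) - expected.get(p, 0.0) for p in pairs}


if __name__ == "__main__":
    types = ["A", "B", "C", "D", "E"]
    kcs = [frozenset({"A", "B"})] * 6 + [frozenset({"C", "D"}), frozenset({"D", "E"})]
    scores = overconvergence(kcs, types, n_shuffles=200)
    print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])  # most overconvergent pairs
```

A real analysis would also need a significance test, for example comparing the observed count against the full distribution of shuffled counts rather than only its mean.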
    SyConn2: dense synaptic connectivity inference for volume electron microscopy
    Philipp J. Schubert
    Sven Dorkenwald
    Jonathan Klimesch
    Fabian Svara
    Andrei Mancu
    Hashir Ahmad
    Michale S. Fee
    Joergen Kornfeld
    Nature Methods, 19 (2022), pp. 1367-1370
    The ability to acquire ever larger datasets of brain tissue using volume electron microscopy leads to an increasing demand for the automated extraction of connectomic information. We introduce SyConn2, an open-source connectome analysis toolkit, which works with both on-site high-performance compute environments and rentable cloud computing clusters. SyConn2 was tested on connectomic datasets with more than 10 million synapses, provides a web-based visualization interface and makes these data amenable to complex anatomical and neuronal connectivity queries.
    Denoising-based Image Compression for Connectomics
    Alex Shapson-Coe
    Richard L. Schalek
    Johannes Ballé
    Jeff W. Lichtman
    bioRxiv (2021)
    Connectomic reconstruction of neural circuits relies on nanometer resolution microscopy which produces on the order of a petabyte of imagery for each cubic millimeter of brain tissue. The cost of storing such data is a significant barrier to broadening the use of connectomic approaches and scaling to even larger volumes. We present an image compression approach that uses machine learning-based denoising and standard image codecs to compress raw electron microscopy imagery of neuropil up to 17-fold with negligible loss of reconstruction accuracy.
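The idea behind denoising before compression is that sensor noise is nearly incompressible, so removing it first lets a standard codec exploit the remaining structure. Here is a minimal sketch of that effect on a synthetic 8-bit row. It assumes a hypothetical piecewise-constant "tissue/membrane" signal plus uniform noise, uses a moving average snapped to a coarse intensity grid as a stand-in for the paper's learned denoiser, and uses zlib only because it ships with Python. The codecs, data, and numbers in the paper differ.

```python
import random
import zlib
from typing import List


def make_toy_row(length: int = 4096, seed: int = 0) -> List[int]:
    """Hypothetical 8-bit row: alternating runs of two intensity levels plus noise."""
    rng = random.Random(seed)
    row = []
    for i in range(length):
        level = 208 if (i // 64) % 2 == 0 else 48  # levels chosen on the 16-step grid
        row.append(min(255, max(0, level + rng.randint(-10, 10))))
    return row


def toy_denoise(row: List[int], window: int = 9, step: int = 16) -> List[int]:
    """Stand-in denoiser (not the paper's ML model): moving average, then snap to a grid."""
    half = window // 2
    out = []
    for i in range(len(row)):
        lo, hi = max(0, i - half), min(len(row), i + half + 1)
        mean = sum(row[lo:hi]) / (hi - lo)
        out.append(min(255, int(round(mean / step)) * step))
    return out


def compressed_size(row: List[int]) -> int:
    """Bytes after a standard lossless codec (DEFLATE via zlib, level 9)."""
    return len(zlib.compress(bytes(row), 9))


if __name__ == "__main__":
    noisy = make_toy_row()
    clean = toy_denoise(noisy)
    print("raw bytes:", len(noisy))
    print("noisy compressed:", compressed_size(noisy))
    print("denoised compressed:", compressed_size(clean))
```

The trade-off the paper evaluates is whether the denoised, more compressible data still supports accurate reconstruction. A toy like this can only show the compression side.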