Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 10467 publications
A Recipe for Improving Remote Sensing Zero Shot Generalization
Aviad Barzilai
Yotam Gigi
Vered Silverman
Yehonathan Refael
Bolous Jaber
Amr Helmy
3rd ML4RS Workshop at ICLR 2025
Preview abstract
Foundation models have had a significant impact across various AI applications, enabling applications for use cases that were previously impossible. Visual language models (VLMs), in particular, have outperformed other techniques in many tasks. In remote sensing (RS), foundation models have shown improvements across various applications. However, unlike other fields, the use of VLMs with large-scale remote sensing image-text datasets remains limited.
In this work, we first introduce two novel image-caption datasets for training of remote sensing foundation models. The first dataset pairs aerial and satellite imagery, aligned with Google-Maps data, with high-quality captions generated using Gemini. The second utilizes public web images and their corresponding alt-text, filtered for only remote sensing domain, resulting in a highly diverse dataset.
We show that using these datasets to pre-train the Mammut [], a VLM architecture, results in state-of-the-art generalization performance in a zero-shot classification and cross-modal retrieval on well-known public benchmarks. Secondly, we leverage this newly pre-trained VLM to generate inference attention maps for a novel class query (i.e., a class unseen during training). We subsequently propose an iterative self-supervised fine-tuning approach where samples aligned with these attention maps are iteratively pseudo-labeled and utilized for model training.
View details
Participatory AI Considerations for Advancing Racial Health Equity
Andrea G. Parker
Jatin Alla
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI) (2025) (to appear)
Preview abstract
Julia's strength in mathematical computation and high performance makes it a popular choice across scientific fields, mostly due to its focus on mathematics in a broad sense and execution performance. It is a language of choice to implement new numerical algorithms, but it really shines in modelling for optimisation thanks to JuMP.jl and MathOptInterface.jl.
These libraries are, first and foremost, made for mathematical optimisation (linear, mixed-integer, conic, etc.), yet they are now generic enough to support more paradigms, such as constraint programming. This talk will introduce the basic principles behind the current implementation of JuMP.jl and explain why and how they are very good matches for modelling using constraint programming… and solving using any kind of mixed-integer-programming solver.
Constraint-programming solvers can also be implemented using linear programming, in a great collaboration between discrete and continuous optimisation. This talk will briefly explain the connection and its implementation in Google’s CP-SAT, a leading, award-winning constraint solver that uses linear programs in its solving process — a solver that will soon be available in Julia too.
View details
Preview abstract
Mainstream artificial neural network models, such as Deep Neural Networks (DNNs) are computation-heavy and energy-hungry. Weightless Neural Networks (WNNs) are natively built with RAM-based neurons and represent an entirely distinct type of neural network computing compared to DNNs. WNNs are extremely low-latency, low-energy, and suitable for efficient, accurate, edge inference. The WNN approach derives an implicit inspiration from the decoding process observed in the dendritic trees of biological neurons, making neurons based on Random Access Memories (RAMs) and/or Lookup Tables (LUTs) ready-to-deploy neuromorphic digital circuits. Since FPGAs are abundant in LUTs, LUT based WNNs are a natural fit for implementing edge inference in FPGAs.
WNNs has been demonstrated to be an energetically efficient AI model, both in software, as well as in hardware. For instance, the most recent DWN – Differential Weightless Neural Network – model demonstrates up to 135× reduction in energy costs in FPGA implementations compared to other multiplication-free approaches, such as binary neural networks (BNNs) and DiffLogicNet, up to 9% higher accuracy in deployments on constrained devices, and culminate in up to 42.8× reduction in circuit area for ultra-low-cost chip implementations. This tutorial will help participants understand how WNNs work, why WNNs were underdogs for such a long time, and be introduced to the most recent members of the WNN family, such as BTHOWeN , LogicWiSARD, COIN, ULEEN and DWN, and contrast to BNNs and LogicNets.
View details
Oculomics: Current Concepts and Evidence
Zhuoting Zhu
Yueye Wang
Ziyi Qi
Wenyi Hu
Xiayin Zhang
Siegfried Wagner
Yujie Wang
An Ran Ran
Joshua Ong
Ethan Waisberg
Mouayad Masalkhi
Alex Suh
Yih Chung Tham
Carol Y. Cheung
Xiaohong Yang
Honghua Yu
Zongyuan Ge
Wei Wang
Bin Sheng
Andrew G. Lee
Alastair Denniston
Peter van Wijngaarden
Pearse Keane
Ching-Yu Cheng
Mingguang He
Tien Yin Wong
Progress in Retinal and Eye Research (2025)
Preview abstract
The eye provides novel insights into general health, as well as pathogenesis and development of systemic diseases. In the past decade, growing evidence has demonstrated that the eye's structure and function mirror multiple systemic health conditions, especially in cardiovascular diseases, neurodegenerative disorders, and kidney impairments. This has given rise to the field of oculomics- the application of ophthalmic biomarkers to understand mechanisms, detect and predict disease. The development of this field has been accelerated by three major advances: 1) the availability and widespread clinical adoption of high-resolution and non-invasive ophthalmic imaging (“hardware”); 2) the availability of large studies to interrogate associations (“big data”); 3) the development of novel analytical methods, including artificial intelligence (AI) (“software”). Oculomics offers an opportunity to enhance our understanding of the interplay between the eye and the body, while supporting development of innovative diagnostic, prognostic, and therapeutic tools. These advances have been further accelerated by developments in AI, coupled with large-scale linkage datasets linking ocular imaging data with systemic health data. Oculomics also enables the detection, screening, diagnosis, and monitoring of many systemic health conditions. Furthermore, oculomics with AI allows prediction of the risk of systemic diseases, enabling risk stratification, opening up new avenues for prevention or individualized risk prediction and prevention, facilitating personalized medicine. In this review, we summarise current concepts and evidence in the field of oculomics, highlighting the progress that has been made, remaining challenges, and the opportunities for future research.
View details
ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
Alexander Immer
Alex Bo-Yuan Chen
Mariela D. Petkova
Nirmala A. Iyer
Luuk Willem Hesselink
Aparna Dev
Gudrun Ihrke
Woohyun Park
Alyson Petruncio
Aubrey Weigel
Wyatt Korff
Florian Engert
Jeff W. Lichtman
Misha B. Ahrens
International Conference on Learning Representations (ICLR) (2025)
Preview abstract
Data-driven benchmarks have led to significant progress in key scientific modeling domains including weather and structural biology. Here, we present the Zebrafish Activity Prediction Benchmark (ZAPBench), which quantitatively measures progress on the problem of predicting cellular-resolution neural activity throughout an entire vertebrate brain. The benchmark is based on a novel dataset containing 4d light-sheet microscopy recordings of more than 70,000 neurons in a larval zebrafish brain, along with motion stabilized and voxel-level cell segmentations of these data that facilitate development of a variety of forecasting methods. Initial results from a selection of time series and volumetric video modeling approaches achieve better performance than naive baseline methods, but also show room for further improvement. The specific brain used in the activity recording is also undergoing synaptic-level anatomical mapping, which will enable future integration of detailed structural information into ZAP forecasting methods.
View details
Heterogeneous graph neural networks for species distribution modeling
Christine Kaeser-Chen
Keith Anderson
Michelangelo Conserva
Elise Kleeman
Maxim Neumann
Matt Overlan
Millie Chapman
Drew Purves
arxiv (2025)
Preview abstract
Species distribution models (SDMs) are necessary for measuring and predicting occurrences and habitat suitability of species and their relationship with environmental factors. We introduce a novel presence-only SDM with graph neural networks (GNN). In our model, species and locations are treated as two distinct node sets, and the learning task is predicting detection records as the edges that connect locations to species. Using GNN for SDM allows us to model fine-grained interactions between species and the environment. We evaluate the potential of this methodology on the six-region dataset compiled by National Center for Ecological Analysis and Synthesis (NCEAS) for benchmarking SDMs. For each of the regions, the heterogeneous GNN model is comparable to or outperforms previously-benchmarked single-species SDMs as well as a feed-forward neural network baseline model.
View details
StreetViewAI: Making Street View Accessible Using Context-Aware Multimodal AI
Alex Fiannaca
Nimer Jaber
Victor Tsaran
Proceedings of the 2025 ACM Symposium on User Interface Software and Technology (UIST'25) (to appear)
Preview abstract
Interactive streetscape mapping tools such as Google Street View (GSV) and Meta Mapillary enable users to virtually navigate and experience real-world environments via immersive 360° imagery but remain fundamentally inaccessible to blind users. We introduce StreetViewAI, the first-ever accessible street view tool, which combines context-aware, multimodal AI, accessible navigation controls, and conversational speech. With StreetViewAI, blind users can virtually examine destinations, engage in open-world exploration, or virtually tour any of the over 220 billion images and 100+ countries where GSV is deployed. We iteratively designed StreetViewAI with a mixed-visual ability team and performed an evaluation with eleven blind users. Our findings demonstrate the value of an accessible street view in supporting POI investigations and remote route planning. We close by enumerating key guidelines for future work.
View details
Preview abstract
Recent work suggested utilizing inference compute, showing that scaling of number of samples consistently improves the fractions of problems solved by any attempt, namely the coverage. In this work, we suggest that inference scaling gains should be compared with proper baselines, as some datasets become degenerate when allowing a large number of attempts. We focus on two domains - mathematical reasoning and factual knowledge, showing that for the MATH and Entity Questions datasets, informed answer enumeration obtains similar or even better results than repeated model sampling, with a much lower sample budget. While we believe that inference scaling is a promising approach for unlocking the potential of language models, we recommend carefully selecting models and datasets when applying this method. Otherwise, the results of inference scaling should be interpreted with caution.
View details
Global earthquake detection and warning using Android phones
Marc Stogaitis
Youngmin Cho
Richard Allen
Boone Spooner
Patrick Robertson
Micah Berman
Greg Wimpey
Robert Bosch
Nivetha Thiruverahan
Steve Malkos
Alexei Barski
Science, 389 (2025), pp. 254-259
Preview abstract
Earthquake early-warning systems are increasingly being deployed as a strategy to reduce losses in earthquakes, but the regional seismic networks they require do not exist in many earthquake-prone countries. We use the global Android smartphone network to develop an earthquake detection capability, an alert delivery system, and a user feedback framework. Over 3 years of operation, the system detected an average of 312 earthquakes per month with magnitudes from M 1.9 to M 7.8 in Türkiye. Alerts were delivered in 98 countries for earthquakes with M ≥4.5, corresponding to ~60 events and 18 million alerts per month. User feedback shows that 85% of people receiving an alert felt shaking, and 36, 28, and 23% received the alert before, during, and after shaking, respectively. We show how smartphone-based earthquake detection algorithms can be implemented at scale and improved through postevent analysis.
View details
H2E: Hand, Head, Eye: A Multimodal Cascade of Natural Inputs
Khushman Patel
Hans Gellersen
Ken Pfeuffer
IEEE VR (2025)
Preview abstract
Eye-based interaction techniques for extended reality, such as gaze and pinch, are simple to use however suffer from input precision issues. We present H2E, a fine and coarse-grained pointing technique that cascades Hand, Head, and Eye inputs. As users initiate a pinch gesture, a cursor appears at the gaze point that can be dragged by head pointing before pinch confirmation. This has the potential advantage that it can add a precision component without changing the semantics of the technique. In this paper, we describe the design and implementation of the technique. Furthermore, we present an evaluation of our method in a Fitts-based user study, exploring the speed-accuracy trade-offs against a gaze and pinch interaction baseline.
View details
Passive Heart Rate Monitoring During Smartphone Use in Everyday Life
Shun Liao
Paolo Di Achille
Jiang Wu
Silviu Borac
Jonathan Wang
Eric Teasley
Lawrence Cai
Daniel McDuff
Hao-Wei Su
Brent Winslow
Anupam Pathak
Shwetak Patel
Jim Taylor
Jamie Rogers
(2025)
Preview abstract
Resting heart rate (RHR) is an important biomarker of cardiovascular health and mortality, but tracking it longitudinally generally requires a wearable device, limiting its availability. We present PHRM, a deep learning system for passive heart rate (HR) and RHR measurements during ordinary smartphone use, using facial video-based photoplethysmography. Our system was developed using 225,773 videos from 495 participants and validated on 185,970 videos from 205 participants in laboratory and free-living conditions – the largest validation study of its kind. Compared to reference electrocardiogram, PHRM achieved a mean absolute percentage error (MAPE) <10% for HR measurements across three skin tone groups of light, medium and dark pigmentation; MAPE for each skin tone group was non-inferior versus the others. Daily RHR measured by PHRM had a mean absolute error <5 bpm compared to a wearable HR tracker, and was associated with known risk factors. These results highlight the potential of smartphones to enable passive and equitable heart health monitoring.
View details
Preview abstract
Measuring productivity is equivalent to building a model. All models are wrong, but some are useful. Productivity models are often “worryingly selective” (wrong because of omissions). Worrying selectivity can be combated by taking a holistic approach that includes multiple measurements of multiple outcomes. Productivity models should include multiple outcomes, metrics, and methods.
View details
Origin-destination travel demand estimation: an approach that scales worldwide, and its application to five metropolitan highway networks
Christopher Bian
Yechen Li
Willa Ng
Bin Yan
Janny Zhang
Transportation Research Part B: Methodological (2025) (to appear)
Preview abstract
Estimating Origin-Destination (OD) travel demand is vital for effective urban planning
and traffic management. Developing universally applicable OD estimation
methodologies is significantly challenged by the pervasive scarcity of high-fidelity traffic
data and the difficulty in obtaining city-specific prior OD estimates (or seed ODs), which
are often prerequisite for traditional approaches. Our proposed method directly
estimates OD travel demand by systematically leveraging aggregated, anonymized
statistics from Google Maps Traffic Trends, obviating the need for conventional census
or city-provided OD data. The OD demand is estimated by formulating a single-level,
one-dimensional, continuous nonlinear optimization problem with nonlinear equality
and bound constraints to replicate highway path travel times. The method achieves
efficiency and scalability by employing a differentiable analytical macroscopic network
model. This model by design is computationally lightweight, distinguished by its
parsimonious parameterization that requires minimal calibration effort and its capacity
for instantaneous evaluation. These attributes ensure the method's broad applicability
and practical utility across diverse cities globally. Using segment sensor counts from
Los Angeles and San Diego highway networks, we validate our proposed approach,
demonstrating a two-thirds to three-quarters improvement in the fit to segment count
data over a baseline. Beyond validation, we establish the method's scalability and
robust performance in replicating path travel times across diverse highway networks,
including Seattle, Orlando, Denver, Philadelphia, and Boston. In these expanded
evaluations, our method not only aligns with simulation-based benchmarks but also
achieves an average 13% improvement in it's ability to fit travel time data compared to
the baseline during afternoon peak hours.
View details
Scalable Private Partition Selection via Adaptive Weighting
Justin Y. Chen
Forty-second International Conference on Machine Learning (2025)
Preview abstract
In the differentially private partition selection problem (a.k.a. private set union, private key discovery), users hold subsets of items from an unbounded universe. The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy. Solutions to this problem are a core building block for many privacy-preserving ML applications including vocabulary extraction in a private corpus, computing statistics over categorical data and learning embeddings over user-provided items.
We propose an algorithm for this problem, MaxAdaptiveDegree(MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight, thereby increasing the probability that less frequent items are output. Our algorithm can be efficiently implemented in massively parallel computation systems allowing scalability to very large datasets. We prove that our algorithm stochastically dominates the standard parallel algorithm for this problem. We also develop a two-round version of our algorithm, MAD2R, where results of the computation in the first round are used to bias the weighting in the second round to maximize the number of items output. In experiments, our algorithms provide the best results across the board among parallel algorithms and scale to datasets with hundreds of billions of items, up to three orders of magnitude larger than those analyzed by prior sequential algorithms.
View details