Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Understanding metric-related pitfalls in image analysis validation
Annika Reinke
Lena Maier-Hein
Paul Jager
Shravya Shetty
Understanding Metrics Workgroup
Nature Methods (2024)
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
View details
Multimodal Modeling for Spoken Language Identification
Shikhar Bharadwaj
Sriram (Sri) Ganapathy
Sid Dalmia
Wei Han
Yu Zhang
Proceedings of 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024) (2024)
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance. Conventionally, it is modeled as a speech-based language identification task. Prior techniques have been constrained to a single modality; however, in the case of video data, there is a wealth of other metadata that may be beneficial for this task. In this work, we propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification. Our study reveals that metadata such as video title, description and geographic location provide substantial information to identify the spoken language of the multimedia recording. We conduct experiments using two diverse public datasets of YouTube videos, and obtain state-of-the-art results on the language identification task. We additionally conduct an ablation study that describes the distinct contribution of each modality for language recognition.
View details
Background. Wildfire research uses ensemble methods to analyze fire behaviors and assess uncertainties. Nonetheless, current research methods are confined to either simple models or complex simulations with limits. Modern computing tools could allow for efficient, high-fidelity ensemble simulations. Aims. This study proposes a high-fidelity ensemble wildfire simulation framework for studying wildfire behavior, ML tasks, fire-risk assessment, and uncertainty analysis. Methods. In this research, we present a simulation framework that integrates the Swirl-Fire large-eddy simulation tool for wildfire predictions with the Vizier optimization platform for automated run-time management of ensemble simulations and large-scale batch processing. All simulations are executed on tensor-processing units to enhance computational efficiency. Key results. A dataset of 117 simulations is created, each with 1.35 billion mesh points. The simulations are compared to existing experimental data and show good agreement in terms of fire rate of spread. Computations are done for fire acceleration, mean rate of spread, and fireline intensity. Conclusions. Strong coupling between these 2 parameters is observed for the fire spread and intermittency. A critical Froude number that delineates fires from plume-driven to convection-driven is identified and confirmed with literature observations. Implications. The ensemble simulation framework is efficient in facilitating parametric wildfire studies.
View details
HyperAttention: Large-scale Attention in Linear Time
Amin Karbasi
Amir Zandieh
Insu Han
David Woodruff
HyperAttention: Long-context Attention in Near-Linear Time (2024) (to appear)
In this paper, we introduce a novel approximate attention mechanism dubbed "HyperAttention". Despite the rapidly increasing size and complexity of contexts used with Large Language Models (LLMs), there is still a dire lack of computationally efficient attention mechanisms scaling better than the naive quadratic time. HyperAttention addresses this gap: it delivers provably linear time complexity with respect to the size of the context, while only incurring a negligible loss in downstream quality. Distinctively, it integrates the principles of Locality Sensitive Hashing (LSH), for efficient detection of heavy elements, along with uniform column sampling, allowing for a good approximation both of the heavy and light components of the attention matrix. HyperAttention provably approximates the attention layer in linear time, making it the first practical linear time approximate attention mechanism. Crucially, HyperAttention has a highly modular design, allowing seamless integration of other rapid low-level implementations, most notably FlashAttention. Empirical evaluations indicate that HyperAttention surpasses the existing methods, achieving orders of magnitude speed-up when compared to prevalent state-of-the-art solutions such as FlashAttention. This breakthrough presents significant implications for enabling the scalability of LLMs to significantly larger contexts.
View details
Predicting Cardiovascular Disease Risk using Photoplethysmography and Deep Learning
Sebastien Baur
Christina Chen
Mariam Jabara
Babak Behsaz
Shravya Shetty
Goodarz Danaei
Diego Ardila
PLOS Glob Public Health, 4(6) (2024), e0003204
Cardiovascular diseases (CVDs) are responsible for a large proportion of premature deaths in low- and middle-income countries. Early CVD detection and intervention is critical in these populations, yet many existing CVD risk scores require a physical examination or lab measurements, which can be challenging in such health systems due to limited accessibility. We investigated the potential to use photoplethysmography (PPG), a sensing technology available on most smartphones that can potentially enable large-scale screening at low cost, for CVD risk prediction. We developed a deep learning PPG-based CVD risk score (DLS) to predict the probability of having major adverse cardiovascular events (MACE: non-fatal myocardial infarction, stroke, and cardiovascular death) within ten years, given only age, sex, smoking status and PPG as predictors. We compare the DLS with the office-based refit-WHO score, which adopts the shared predictors from WHO and Globorisk scores (age, sex, smoking status, height, weight and systolic blood pressure) but refitted on the UK Biobank (UKB) cohort. All models were trained on a development dataset (141,509 participants) and evaluated on a geographically separate test (54,856 participants) dataset, both from UKB. DLS’s C-statistic (71.1%, 95% CI 69.9–72.4) is non-inferior to office-based refit-WHO score (70.9%, 95% CI 69.7–72.2; non-inferiority margin of 2.5%, p<0.01) in the test dataset. The calibration of the DLS is satisfactory, with a 1.8% mean absolute calibration error. Adding DLS features to the office-based score increases the C-statistic by 1.0% (95% CI 0.6–1.4). DLS predicts ten-year MACE risk comparable with the office-based refit-WHO score. Interpretability analyses suggest that the DLS-extracted features are related to PPG waveform morphology and are independent of heart rate. 
Our study provides a proof of concept and suggests the potential of PPG-based strategies for community-based primary prevention in resource-limited regions.
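For readers unfamiliar with the C-statistic reported above, the following is a minimal sketch for the simplified case of a binary outcome, where it is the fraction of (event, non-event) pairs in which the event received the higher risk score. The abstract does not describe how the study computed it, so the function name `c_statistic` and the data below are hypothetical and purely illustrative.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance (C) statistic for a binary outcome.

    Fraction of (event, non-event) pairs in which the event received
    the higher risk score; tied scores count as half a concordant pair.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical risk scores and observed outcomes (1 = event occurred).
risk = [0.90, 0.40, 0.35, 0.80, 0.10]
observed = [1, 0, 1, 0, 0]
print(f"C-statistic: {c_statistic(risk, observed):.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values around 0.71 for both scores in the abstract indicate similar discrimination.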
View details
DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems
Yair Schiff
Jeff Parker
Volodymyr Kuleshov
International Conference on Machine Learning (ICML) (2024)
Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite time, that supports an invariant measure, i.e., a probability distribution that is invariant under the action of the dynamics, which dictates the long-term statistical behavior of the system. In this work, we leverage this structure to propose a new framework that targets learning the invariant measure as well as the dynamics, in contrast with typical methods that only target the misfit between trajectories, which often leads to divergence as the trajectories’ length increases. We use our framework to propose a tractable and sample-efficient objective that can be used with any existing learning objective. Our Dynamics Stable Learning by Invariant Measure (DySLIM) objective enables model training that achieves better point-wise tracking and long-term statistical accuracy relative to other learning objectives. By targeting the distribution with a scalable regularization term, we hope that this approach can be extended to more complex systems exhibiting slowly-variant distributions, such as weather and climate models. Code to reproduce our experiments is available here: https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/ergodic.
View details
This is the seventh installment of the Developer Productivity for Humans column. This installment focuses on software quality: what it means, how developers see it, how we break it down into 4 types of quality, and the impact these have on each other.
View details
Advances in deep learning systems have allowed large models to match or surpass human accuracy on a number of skills such as image classification, basic programming, and standardized test taking. As the performance of the most capable models begins to saturate on tasks where humans already achieve high accuracy, it becomes necessary to benchmark models on increasingly complex abilities. One such task is forecasting the future outcome of events. In this work we describe experiments using a novel dataset of real-world events and associated human predictions, an evaluation metric to measure forecasting ability, and the accuracy of a number of different LLM-based forecasting designs on the provided dataset. Additionally, we analyze the performance of the LLM forecasters against human predictions and find that models still struggle to make accurate predictions about the future. Our follow-up experiments indicate this is likely due to models' tendency to guess that most events are unlikely to occur (which tends to be true for many prediction datasets, but does not reflect actual forecasting abilities). We reflect on next steps for developing a systematic and reliable approach to studying LLM forecasting.
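The point about guessing "unlikely" can be illustrated with a small sketch. The abstract does not name its evaluation metric, so the Brier score is used here only as a stand-in, and the dataset and forecasters are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("length mismatch")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical dataset: 10 questions, only 1 event actually occurred.
outcomes = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# A forecaster that always answers "unlikely", regardless of the question.
always_unlikely = [0.1] * len(outcomes)

# A forecaster with no information that always answers 50/50.
coin_flip = [0.5] * len(outcomes)

print("always unlikely:", round(brier_score(always_unlikely, outcomes), 4))
print("coin flip:      ", round(brier_score(coin_flip, outcomes), 4))
```

Lower is better for this score. The constant "unlikely" forecaster scores well on this skewed dataset even though it does not distinguish one question from another, which is the kind of effect the abstract cautions against reading as forecasting ability.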
View details
Geographical accessibility to emergency obstetric care in urban Nigeria using closer-to-reality travel time estimates
Aduragbemi Banke-Thomas
Kerry L. M. Wong
Tope Olubodun
Peter M. Macharia
Narayanan Sundararajan
Yash Shah
Mansi Kansal
Swapnil Vispute
Olakunmi Ogunyemi
Uchenna Gwacham-Anisiobi
Jia Wang
Ibukun-Oluwa Omolade Abejirinde
Prestige Tatenda Makanga
Ngozi Azodoh
Charles Nzelu, PhD
Charlotte Stanton
Bosede B. Afolabi
Lenka Beňová
Lancet Global Health (2024)
Background
Better accessibility of comprehensive emergency obstetric care (CEmOC) facilities can significantly reduce maternal and perinatal deaths. However, pregnant women living in urban settings face additional complex challenges travelling to facilities. We estimated geographical accessibility to, and coverage of, the nearest, second nearest, and third nearest public and private CEmOC facilities in the 15 largest Nigerian cities.
Methods
We mapped city boundaries, verified and geocoded functional CEmOC facilities, and assembled population distribution for women of childbearing age (WoCBA). We used Google Maps Platform’s internal Directions Application Programming Interface (API) to derive driving times to public, private, or either facility-type. Median travel time (MTT) and percentage of WoCBA able to reach care were summarised for eight traffic scenarios (peak and non-peak hours on weekdays and weekends) by city and within-city (wards) under different travel time thresholds (<15, <30, <60 min).
Findings
City-level MTT to the nearest CEmOC facility ranged from 18 min (Maiduguri) to 46 min (Kaduna). Within cities, MTT varied by location, with informal settlements and peripheral areas being the worst off. Nearly all WoCBA were within 60 min of their nearest public CEmOC facility, whereas the percentage of WoCBA within 30 min of their nearest public CEmOC facility ranged from 33% in Aba to over 95% in Ilorin and Maiduguri. During peak traffic times, the median number of public CEmOC facilities reachable by WoCBA in under 30 min was zero in eight of 15 cities.
Interpretation
This approach provides more context-specific, finer, and policy-relevant evidence to support improving CEmOC service accessibility in urban Africa.
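The two summary measures named in the Methods, median travel time and the percentage of WoCBA reaching care under a threshold, can be sketched as follows. This is a minimal illustration under stated assumptions: the ward-level travel times and population counts are hypothetical, the median is taken over wards without population weighting, and the study's actual aggregation procedure is not described in the abstract.

```python
from statistics import median


def coverage_within(travel_times, populations, threshold_min):
    """Percentage of the population whose travel time is under the threshold."""
    total = sum(populations)
    if total == 0:
        raise ValueError("total population is zero")
    reached = sum(p for t, p in zip(travel_times, populations) if t < threshold_min)
    return 100.0 * reached / total


# Hypothetical wards: driving time to the nearest facility (minutes)
# and number of women of childbearing age living in each ward.
times = [12, 25, 28, 45, 70]
wocba = [1000, 3000, 2000, 3000, 1000]

print("median travel time (min):", median(times))
for threshold in (15, 30, 60):
    print(f"coverage < {threshold} min: {coverage_within(times, wocba, threshold):.1f}%")
```

Repeating the calculation with travel times from each traffic scenario would give one coverage figure per scenario and threshold.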
View details
SAC125 - SSAC Report on Registrar Nameserver Management
Gautam Akiwate
Tim April
kc claffy
Internet Corporation for Assigned Names and Numbers (ICANN), ICANN Security and Stability Advisory Committee (SSAC) Reports and Advisories (2024), pp. 56
During domain registration, a minimum of two nameservers are typically required, and this remains a requirement for any future updates to the domain. Often, domains are delegated to nameservers that are subordinate to some other domains, creating inter-domain dependencies. This network of dependencies creates a scenario where the functionality of a domain depends on the operational status of another domain. This setup lacks contractual or procedural safeguards against disruption or misuse, especially when the nameserver parent domain expires. Most registries forbid deleting an expired domain if other domains depend on it for name resolution. These constraints aim to prevent disruptions in DNS resolution for the dependent domains. However, this also means that the expired domain remains in a liminal state, neither fully operational nor completely removed. When registrars cannot delete expired domains with dependents, they are forced to bear the burden of sponsoring the domain without remuneration from the registrant. A peer-reviewed study, "Risky BIZness: Risks derived from Registrar Name Management," observed that some registrars have found and utilized a loophole to these constraints by renaming the host objects that are subordinate to the expiring domain. Once renamed, the host objects are what Akiwate et al., and subsequently the SSAC, refer to as sacrificial nameservers.
This report focuses on a specific type of sacrificial nameserver where the parent domains of the renamed host objects are considered to be unsafe because they are registrable. Registrable parent domains of sacrificial nameservers introduce a new attack surface for domain resolution hijacking, as malicious actors can exploit unsafe sacrificial nameservers to gain unauthorized control over the dependent domains, leading to manipulation or disruption. Unlike traditional domain hijacking techniques that exploit compromised account credentials or manipulate the resolution protocol, this report focuses on this unforeseen risk arising from a longstanding practice employed by some registrars.
In this report, the SSAC explores potential solutions to remediate exposed domains and prevent the creation of new unsafe sacrificial nameservers. The SSAC examines each proposed solution for its feasibility, effectiveness, and potential to reduce the attack surface without introducing undue complexity or new vulnerabilities into the DNS ecosystem.
View details
PROMPT: A Fast and Extensible Memory Profiling Framework
Ziyang Xu
Yebin Chon
Yian Su
Zujun Tan
Simone Campanoni
David I. August
Proceedings of the ACM on Programming Languages, 8, Issue OOPSLA (2024)
Memory profiling captures programs' dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique program trace summary, various memory profiler types have been developed. Yet, designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort. This often results in a void where aspirations for fast and robust profilers remain unfulfilled. To bridge this gap, this paper presents PROMPT, a framework for streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported with PROMPT with all features preserved. By focusing on the core profiling logic, the code was reduced by more than 65% and the profiling overhead was improved by 5.3× and 7.1×, respectively. To further underscore PROMPT's impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client’s memory profiling needs while achieving more than 90% reduction in profiling overhead and improved robustness compared to the original profilers.
View details
CodecLM: Aligning Language Models with Tailored Synthetic Data
Chun-Liang Li
Jin Miao
NAACL 2024
Instruction tuning has emerged as the key in aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor and time cost to collect or annotate data by humans, researchers have started to explore the use of LLMs to generate instruction-aligned synthetic data. Recent works focus on generating diverse instructions and applying LLMs to increase instruction complexity, often neglecting downstream use cases. It remains unclear how to tailor high-quality data to elicit better instruction-following abilities in different target instruction distributions and LLMs. To this end, we introduce CodecLM, a general framework for adaptively generating high-quality synthetic data for LLM alignment with different downstream instruction distributions and LLMs. Drawing on the Encode-Decode principles, we use LLMs as codecs to guide the data generation process. We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution, and then decode metadata to create tailored instructions. We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples. Extensive experiments on four open-domain instruction-following benchmarks validate the effectiveness of CodecLM over the current state of the art.
View details
SkipWriter: LLM-Powered Abbreviated Writing on Tablets
Zheer Xu
Mukund Varma T
Proceedings of UIST 2024 (2024)
Large Language Models (LLMs) may offer transformative opportunities for text input, especially for physically demanding modalities like handwriting. We studied a form of abbreviated handwriting by designing, developing and evaluating a prototype, named SkipWriter, that converts handwritten strokes of a variable-length, prefix-based abbreviation (e.g., “ho a y” as handwritten strokes) into the intended full phrase (e.g., “how are you” in the digital format) based on preceding context. SkipWriter consists of an in-production handwriting recognizer and an LLM fine-tuned on this skip-writing task. With flexible pen input, SkipWriter allows the user to add and revise prefix strokes when predictions don’t match the user’s intent. A user evaluation demonstrated a 60% reduction in motor movements with an average speed of 25.78 WPM. We also showed that this reduction is close to the ceiling of our model in an offline simulation.
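The prefix-based abbreviation scheme can be made concrete with a small sketch. It only checks the constraint that each abbreviation token is a prefix of the corresponding word; the candidate phrases and function names are hypothetical, and the context-aware expansion performed by the fine-tuned LLM in the described system is not modeled here.

```python
def matches_abbreviation(abbreviation, phrase):
    """True if each abbreviation token is a prefix of the corresponding word."""
    prefixes = abbreviation.lower().split()
    words = phrase.lower().split()
    return len(prefixes) == len(words) and all(
        word.startswith(prefix) for prefix, word in zip(prefixes, words)
    )


def filter_candidates(abbreviation, candidates):
    """Keep only the candidate phrases consistent with the abbreviation."""
    return [c for c in candidates if matches_abbreviation(abbreviation, c)]


# Hypothetical candidate phrases for the abbreviation used in the abstract.
candidates = ["how are you", "hold a yard", "how is it", "hope all yours"]
print(filter_candidates("ho a y", candidates))
```

Several phrases satisfy the same abbreviation, which is why the expansion has to rely on preceding context and why the user may need to add or revise prefix strokes.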
View details
(In)Security of File Uploads in Node.js
Harun Oz
Abbas Acar
Ahmet Aris
Amin Kharraz
Selcuk Uluagac
The Web Conference (WWW) (2024)
File upload is a critical feature incorporated by a myriad of web applications to enable users to share and manage their files conveniently. It has been used in many useful services such as file-sharing and social media. While file upload is an essential component of web applications, the lack of rigorous checks on the file name, type, and content of the uploaded files can result in security issues, often referred to as Unrestricted File Upload (UFU). In this study, we analyze the (in)security of popular file upload libraries and real-world applications in the Node.js ecosystem. To automate our analysis, we propose NodeSec, a tool designed to analyze file upload insecurities in Node.js applications and libraries. NodeSec generates unique payloads and thoroughly evaluates the application’s file upload security against 13 distinct UFU-type attacks. Utilizing NodeSec, we analyze the most popular file upload libraries and real-world applications in the Node.js ecosystem. Our results reveal that some real-world web applications are vulnerable to UFU attacks and disclose serious security bugs in file upload libraries. As of this writing, we received 19 CVEs and two US-CERT cases for the security issues that we reported. Our findings provide strong evidence that the dynamic features of Node.js applications introduce security shortcomings and that web developers should be cautious when implementing file upload features in their applications.
View details
Understanding and Designing for Trust in AI Powered Developer Tooling
Ugam Kumar
Quinn Madison
IEEE Software (2024)
Trust is central to how developers engage with AI. In this article, we discuss what we learned from developers about their level of trust in AI-enhanced developer tooling, how we translated those findings into product design recommendations to support customization, and the challenges we encountered along the way.
View details