Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 11329 publications
Preview abstract The management of a hybrid workforce comprising human and autonomous computational agents may be challenged by the use of separate systems for human capital and software assets, which can create a governance gap. A system can provide a unified framework for managing a hybrid workforce. For example, the system may utilize a labor service mesh to analyze and route tasks to either a human intent tier or an agentic execution tier. A potential principle of the system is structural symmetry, where computational agents can be assigned digital identities and managed through a lifecycle process that may parallel human resource functions, such as onboarding, performance evaluation, and structured offboarding. This integrated approach can facilitate a unified system of record and governance model for an organization's intelligence capacity. View details
Preview abstract Generative AI is reshaping software development, yet its psychological impact remains under-researched. During May and August 2025 we conducted reflexive thematic analysis of interviews with 12 senior engineers (≥5 years experience) recruited from Western technology hubs to explore shifts in professional identity. We identify a central transition from "coder to conductor," where AI acts as a cognitive partner. Key findings include: (1) a re-architecting of focus from implementation to strategy; (2) a shift in productivity metrics from output to impact; and (3) a dual-impact on agency, where AI empowers autonomy but threatens competence through de-skilling anxieties. These findings suggest that as implementation becomes commoditised, organisational training and career progression must prioritise architectural mastery and metacognitive oversight to ensure sustained developer motivation and system integrity. View details
Preview abstract Responsive user interfaces enable dynamically adjusting user interfaces based on device-specific aspects such as screen size, aspect ratio, display resolution, etc. However, traditional responsive design fails to account for different types of constraints of a user and task criticality of the task being performed via the UI. Misalignment between the UI design, user context and task criticality can lead to user error. This disclosure describes techniques, implemented with user permission, for dynamically modifying the layout, information density, and/or interactive physics of a user interface based on a dual-factor analysis of user cognitive state and task criticality. The user's cognitive state can be inferred from behavioral telematics. Task criticality can be inferred from semantic analysis. The information density and other parameters of a user interface are automatically adjusted based on such analyses. Such adjustments include applying or relaxing restrictions on interactivity and adjusting visual prominence of various UI elements to adjust the information density of the user interface. The adjustments can also include adjusting friction as appropriate, hiding certain aspects of the user interface, or other types of adjustments. View details
Preview abstract We introduce a new context-enriched time series forecasting benchmark, TimesX. TimesX contains a wide selection of high-quality real-world time series and diverse textual contexts from an automated generation pipeline, which helps address three main issues of existing benchmarks: (1) poor generalization due to low data volume and data being synthetic, (2) restricted forms of context, and (3) an inability to mitigate data leakage. We conduct a thorough empirical study of current multimodal solutions on TimesX. Our results suggest that most multimodal solutions that work well on existing benchmarks may fail on TimesX. In contrast, simple ensemble methods that leverage the rich textual context can outperform strong unimodal baselines and other multimodal baselines. ** Below this is what was submitted to ITP. ** We create a real-world multimodal time-series forecasting benchmark that encompasses diverse domains and regions. Each time series is annotated with various kinds of context, such as metadata, date and holiday information, and dynamic events related to the time series. This is significantly more advanced than other available benchmarks, which rely either on static metadata alone or on synthetic examples. This forms a test bed for multimodal forecasting. We also present some baseline results showing that ensembles of publicly available LLMs and time-series foundation models can demonstrate non-trivial performance on this benchmark. View details
Preview abstract We prove the following asymptotically tight lower bound for k-color discrepancy: For any k ≥ 2, there exists a hypergraph with n vertices such that its k-color discrepancy is at least Ω(√n). This improves on the previously known lower bound of Ω(√n/ log k) due to Caragiannis et al. [CLS25]. As an application, we show that our result implies improved lower bounds for group fair division. View details
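For readers unfamiliar with the quantity in the abstract above, the following restates it under the usual definition of multicolor discrepancy. The notation (H = (V, E), the coloring χ, the color set [k]) is an assumption introduced here for illustration; the paper may normalize differently.

```latex
% Assumed standard definition: H = (V, E) is a hypergraph with |V| = n,
% and \chi : V \to [k] assigns one of k colors to each vertex.
\[
\operatorname{disc}(H,\chi)
  = \max_{S \in E}\, \max_{c \in [k]}
    \left|\, \bigl|\chi^{-1}(c) \cap S\bigr| - \frac{|S|}{k} \,\right|,
\qquad
\operatorname{disc}(H,k)
  = \min_{\chi : V \to [k]} \operatorname{disc}(H,\chi).
\]
% Result stated in the abstract: for every k \ge 2 there is a hypergraph H
% on n vertices with
\[
\operatorname{disc}(H,k) = \Omega\!\left(\sqrt{n}\right).
\]
```

Under this reading, a perfectly balanced coloring would place exactly |S|/k vertices of each color in every edge S, and the bound says that some hypergraph forces a deviation on the order of √n regardless of the number of colors.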
Preview abstract This writeup defines the Hydration Proxy Pattern, a framework for building stateful conversational data systems over stateless LLM APIs. It describes a platform-agnostic approach to decoupling persistence from the AI provider through secure server-side intermediation and hybrid storage tiers. The abstract provides a blueprint for managing the "Persistence Gap" in enterprise AI integrations, detailing high-level strategies for session history management, streaming, and multi-stage semantic grounding without disclosing specific internal implementation details. View details
Pixel Watch: Robust Heart Rate Sensing from Multipath PPG and On-Device Deep Learning Trained on 10,000 hours of Free-Living and Fitness Data
Megan Walker
Yojan Patel
Shyam Tailor
Matt Wimmer
Brennan Garrett
Dan Howe
Hamed Vavadi
Tien Le
Steve Diamond
Oleksiy Vyalov
Vik Sharma
Pete Richards
Tracy Giest
Erika Siegel
Tuan Phan
Sam Mravca
Derrick Vickers
Benjamin Stone
Katarina Vukosavljević
Justin Phillips
YongSuk Cho
Stefanie Hollidge
Antony Siahaan
Soren Brage
Shwetak Patel
Robert Harle
IEEE Sensors Letters (2026)
Preview abstract The Pixel Watch 2 (PW2) is the first Google smartwatch to combine multipath photoplethysmography (PPG) with deep learning-based heart rate inference, designed to significantly improve sensing accuracy during motion-heavy activities. The device processes 10 optical channels using an on-device, 15-layer temporally dilated convolutional neural network (~300K parameters) to yield a 1 Hz heart rate output. Crucial to this model's performance was its training on a massive dataset comprising 10,000 hours of data from 962 participants, curated from a broader corpus of controlled and free-living activities. We evaluated the PW2's sensing performance across two independent validation sets: an in-house fitness dataset (229 participants, 250 hours) and an external free-living dataset (27 participants, 1000+ hours). The system achieved 95% Limits of Agreement of -10.34 to 8.66 BPM during exercise and -6.57 to 7.48 BPM during free-living activities, demonstrating substantially tighter error margins than previous Google devices. Finally, we discuss key design lessons, emphasizing that large-scale deep learning was instrumental in fully leveraging multipath PPG hardware over traditional signal processing approaches. View details
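The 95% Limits of Agreement quoted above can be illustrated with a minimal sketch. It assumes the conventional Bland-Altman form (mean difference ± 1.96 standard deviations of the paired differences); the abstract does not say exactly how the authors computed theirs, and the function name and sample values are hypothetical.

```python
import statistics


def limits_of_agreement(device_bpm, reference_bpm):
    """Return (lower, upper) 95% limits of agreement for paired readings.

    Assumes the conventional Bland-Altman definition:
    bias +/- 1.96 * sample standard deviation of the differences.
    """
    if len(device_bpm) != len(reference_bpm):
        raise ValueError("readings must be paired")
    # Signed error of the device relative to the reference, per sample.
    diffs = [d - r for d, r in zip(device_bpm, reference_bpm)]
    bias = statistics.mean(diffs)
    spread = statistics.stdev(diffs)
    return bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical paired heart-rate readings in BPM (not data from the paper).
lower, upper = limits_of_agreement([60, 62, 64, 66], [61, 61, 65, 65])
print(f"95% LoA: {lower:.2f} to {upper:.2f} BPM")
```

A narrower interval means the device and the reference disagree by less on most samples, which is the sense in which the abstract reports tighter error margins.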
Preview abstract Voice activity detection (VAD) plays a vital role in enabling applications such as speech recognition. We analyze the impact of window size on the accuracy of three VAD algorithms: Silero, WebRTC, and Root Mean Square (RMS) across a set of diverse real-world digital audio streams. We additionally explore the use of hysteresis on top of each VAD output. Our results offer practical references for optimizing VAD systems. Silero significantly outperforms WebRTC and RMS, and hysteresis provides a benefit for WebRTC. View details
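As a rough illustration of the simplest of the three detectors named above, here is a sketch of a windowed RMS detector with hysteresis applied to its output. The abstract does not specify the form of hysteresis or any thresholds, so the consecutive-frame counter used here, the function names, and all parameter values are assumptions.

```python
import math


def rms_vad(samples, window_size, threshold):
    """Return one decision per non-overlapping window: True if RMS >= threshold."""
    decisions = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        rms = math.sqrt(sum(x * x for x in window) / window_size)
        decisions.append(rms >= threshold)
    return decisions


def apply_hysteresis(decisions, on_count, off_count):
    """Smooth raw decisions (assumed counter-based hysteresis).

    The state flips to speech only after on_count consecutive speech frames,
    and back to silence only after off_count consecutive silence frames.
    """
    state = False
    run = 0  # consecutive frames that disagree with the current state
    smoothed = []
    for decision in decisions:
        if decision != state:
            run += 1
            needed = off_count if state else on_count
            if run >= needed:
                state = decision
                run = 0
        else:
            run = 0
        smoothed.append(state)
    return smoothed


# Hypothetical signal: one silent window followed by one loud window.
raw = rms_vad([0, 0, 0, 0, 1, -1, 1, -1], window_size=4, threshold=0.5)
print(raw)
```

The window size trades latency against stability: short windows react quickly but flicker, which is where a smoothing stage such as hysteresis can help.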
Preview abstract In "Elephants, Goldfish and the New Golden Age of Software Engineering," the author discusses how AI is changing knowledge work, especially software development. Written from the perspective of April 2026, the article points out that while AI speeds up coding, it can also quickly generate a lot of mistakes and messy code if it isn't carefully managed by human oversight and clear processes. The paper outlines a practical approach to working with AI, broken down into three main sections:
* **Using AI as a Tool, Not a Toy:** The author notes that people often get poor results by asking AI to do everything in a single prompt. Instead, users should have back-and-forth conversations with AI to question assumptions, set clear grading rules, and guide the research. The main point is that humans must still provide the final judgment; AI is simply a way to speed up and record that thinking.
* **The Elephant-Goldfish Model:** As AI creates more code than humans can easily read, written design documents become more important than the code itself. To keep AI on track, the author suggests a two-part method:
  * **The Elephant:** A long chat session where the human and AI discuss ideas and write a detailed design document *before* any code is written. This session holds all of the project's background information and decisions.
  * **The Goldfish:** A brand-new AI chat session with no memory. The human asks this "goldfish" to read the design document. If the goldfish cannot understand the plan based only on that document, the document needs more details.
  * Only after the design document is clear enough for the goldfish to understand does the human ask the AI to write the code based on those strict instructions.
* **Managing AI and the Future of Work:** The author expects that regular employees will soon act like managers, overseeing multiple AI helpers. Because of this, workers need to learn basic management skills, like how to delegate tasks and set clear boundaries. Also, since AI will handle routine chores, humans will need to practice focusing for longer periods to do deeper, harder thinking. Ultimately, a worker's value will come from their planning and decision-making skills, rather than their ability to type code.
View details
Fair Allocation of Indivisible Goods with Variable Groups
Paul Golz
Warut Suksompong
Ayumi Igarashi
AAAI (2026)
Preview abstract We study the fair allocation of indivisible goods with variable groups. In this model, the goal is to partition the agents into groups of given sizes and allocate the goods to the groups in a fair manner. We show that for any number of groups and corresponding sizes, there always exists an envy-free up to one good (EF1) outcome, thereby generalizing an important result from the individual setting. Our result holds for arbitrary monotonic utilities and comes with an efficient algorithm. We also prove that the EF1 existence can be guaranteed even when the goods lie on a path and each group must receive a connected bundle. In addition, we consider a probabilistic model where the utilities are additive and drawn randomly from a distribution. We show that if there are n agents and the number of goods m is divisible by the number of groups k, then an envy-free outcome exists with high probability if m = ω(log n), and this bound is tight. On the other hand, if m is not divisible by k, then an envy-free outcome is unlikely to exist as long as m = o(√n). View details
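To make the EF1 notion concrete, the sketch below covers only the individual setting with additive utilities, which the abstract describes as the result being generalized. It is not the paper's group algorithm, which handles arbitrary monotonic utilities; round-robin picking is swapped in here because it is a well-known way to obtain EF1 for additive utilities. The function names and the utility matrix are hypothetical.

```python
def round_robin(utilities, num_goods):
    """Agents take turns picking their favourite remaining good.

    utilities[i][g] is agent i's (additive) value for good g.
    Ties are broken in favour of the lower-numbered good.
    """
    n = len(utilities)
    remaining = set(range(num_goods))
    bundles = [[] for _ in range(n)]
    turn = 0
    while remaining:
        agent = turn % n
        best = max(remaining, key=lambda g: (utilities[agent][g], -g))
        bundles[agent].append(best)
        remaining.remove(best)
        turn += 1
    return bundles


def is_ef1(utilities, bundles):
    """Check envy-freeness up to one good under additive utilities."""
    for i, u in enumerate(utilities):
        own = sum(u[g] for g in bundles[i])
        for j, other in enumerate(bundles):
            if i == j or not other:
                continue
            total = sum(u[g] for g in other)
            # Envy must vanish after removing agent i's favourite good from j.
            if own < total - max(u[g] for g in other):
                return False
    return True


# Hypothetical instance: 3 agents, 4 goods.
utilities = [[5, 3, 1, 1], [1, 5, 3, 1], [2, 2, 2, 6]]
bundles = round_robin(utilities, 4)
print(bundles, is_ef1(utilities, bundles))
```

In the group model of the abstract, bundles are assigned to groups of agents rather than to individuals, so both the picking procedure and the envy check would need to be restated over groups.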
Preview abstract Artificial intelligence is rapidly evolving, marked by the emergence of Large Language Model (LLM) agents – systems capable of complex reasoning, planning, and interaction with digital and physical environments. These agents, powered by advancements in LLMs, demonstrate remarkable capabilities across diverse domains, including finance, healthcare, web navigation, software development, and daily task assistance. Unlike traditional AI systems, LLM agents can perceive their surroundings, formulate multi-step plans, utilize external tools and APIs, access memory or knowledge bases, and execute actions to achieve specified goals. This ability to act upon the world, however, introduces significant safety and security challenges. The safety paradigms developed for traditional LLMs, primarily focused on mitigating harmful textual outputs (e.g., toxicity, bias), are insufficient for safeguarding LLM agents. Agents interacting with dynamic environments and executing actions present a broader attack surface and new categories of risk. These include performing unsafe operations, violating privacy constraints through improper data handling or access control failures, deviating from user objectives (task misalignment), and susceptibility to novel manipulation techniques like indirect prompt injection and memory poisoning. Ensuring the trustworthy operation of these powerful agents is paramount, especially as they are integrated into high-stakes applications. To address this critical challenge, we introduce VeriGuard, a novel framework designed to enhance the safety and reliability of LLM agents by interactively verifying their policies and actions. VeriGuard integrates a verification module that intercepts code-based actions proposed by the agent. In the first step, VeriGuard generates and verifies the policies. The policies are rigorously checked against a set of predefined safety and security specifications. Then each action is verified to make sure it aligns with the agent specification. This interactive verification loop ensures that the agent's behavior remains within safe operational bounds, effectively preventing the execution of harmful or unintended operations. By verifying each step, VeriGuard provides a robust safeguard, substantially improving the trustworthiness of LLM agents in complex, real-world environments. View details
Preview abstract The accelerated integration of generative AI technologies and agentic AI tools, particularly those like ChatGPT, into workplace settings has introduced complex challenges concerning data governance, regulatory compliance, and organizational privacy (GDPR 2016; CCPA/CPRA). This study introduces the Digital Shadow AI Risk Theoretical Framework (DART)—a novel theoretical framework designed to systematically identify, classify, and address the latent risks arising from the widespread, and often unregulated, use of AI systems in professional environments (NIST, 2023; OECD AI Policy Observatory, 2023). DART introduces six original, interrelated constructs developed in this study: Unintentional Disclosure Risk, Trust-Dependence Paradox, Data Sovereignty Conflict, Knowledge Dilution Phenomenon, Ethical Black Box Problem, and Organizational Feedback Loops. Each construct reflects a unique dimension of risk that emerges as organizations increasingly rely on AI-driven tools for knowledge work and decision-making. The framework is empirically tested through a mixed-methods research design involving hypothesis testing and statistical analysis of behavioral data gathered from cross-sectional surveys of industry professionals. Two cross-industry surveys (Survey-1: 416 responses, 374 analyzed; Survey-2: 203 responses, 179 analyzed) and CB-SEM tests supported seven of eight hypotheses; H4 (sovereignty) was not significant; H7 (knowledge dilution) was confirmed in replication. The findings highlight critical gaps in employee training, policy awareness, and risk mitigation strategies—underscoring the urgent need for updated governance frameworks, comprehensive AI-use policies, and targeted educational interventions. 
This paper contributes to emerging scholarship by offering a robust model for understanding and mitigating digital risks in AI-enabled workplaces, providing practical implications for compliance officers, risk managers, and organizational leaders aiming to harness the benefits of generative AI responsibly and securely. The novelty of DART lies in its explicit theorization of workplace-level behavioral risks—especially Shadow AI, which unlike Shadow IT externalizes organizational knowledge into adaptive systems—thereby offering a unified framework that bridges fragmented literatures and grounds them in empirical evidence. View details
Sexual dimorphism in the complete connectome of the Drosophila male central nervous system
Stuart Berg
Isabella R Beckett
Marta Costa
Philipp Schlegel
Elizabeth C Marin
Aljoscha Nern
Stephan Preibisch
Wei Qiu
Shin-ya Takemura
Andrew Champion
Reed A. George
Gary Huang
William Katz
Christopher Ordish
Ken Hayworth
Eric Trautman
Vivek Jayaraman
Wyatt Korff
Geoffrey W Meissner
Sandro Romani
Jan Funke
Christopher Knecht
Stephan Saalfeld
Louis Scheffer
Scott Waddell
Gwyneth Card
Carlos Ribeiro
Michael B. Reiser
Harald Hess
Gerry Rubin
Gregory S.X.E. Jefferis
bioRxiv (2026)
Preview abstract Sex differences in behaviour exist across all animals, typically under strong genetic regulation. In Drosophila, fruitless/doublesex transcription factors can identify dimorphic neurons but their organisation into functional circuits remains unclear. We present the connectome of the entire Drosophila male central nervous system. This contains 166,691 neurons spanning the brain and nerve cord, fully proofread and annotated including fruitless/doublesex expression and 11,691 types. We provide the first comprehensive comparison between male and female brain connectomes to synaptic resolution, finding 7,205 isomorphic, 114 dimorphic, 262 male-specific and 69 female-specific types. This resource enables analysis of full sensory-to-motor circuits underlying complex behaviours and the impact of dimorphic elements. Sex-specific/dimorphic neurons are concentrated in higher brain centres while the sensory and motor periphery are largely isomorphic. Within higher centres, male-specific connections are organised into hotspots defined by male-specific neurons or arbours. Numerous circuit switches reroute sensory information to form antagonistic circuits controlling opposing behaviours. (Full author list included with the paper.) View details
Preview abstract This disclosure describes systems and methods for a multi-agent framework that can automate and scale cognitive work. The framework can, for example, use a cognitive assembly line of specialized computational agents to perform tasks such as research and drafting. A beneficial component could be an adversarial review panel (ARP), which is a multi-agent review system where distinct agent personas critique a generated draft from varied perspectives. The structured feedback from the ARP can be used to automatically iterate on and refine the work product. This approach can improve the intellectual rigor of generated content and reduce the time required for production, which may allow human operators to focus on activities such as strategic oversight and final validation. View details
Preview abstract Optical health sensing algorithms, such as SpO2, sleep monitoring, and metabolic health sensing, critically depend on the accurate measurement of optical emission from Light Emitting Diodes (LEDs) transmitted through user tissue and detected by a photodiode (PD). A significant challenge to the reliability of these measurements is the inherent degradation of LED optical emission intensity over time due to device aging. This degradation can confound the physiological changes being monitored. Our work quantifies the impact of LED aging on sensor signal integrity, specifically examining the Current Transfer Ratio (CTR), which is a key metric defining the ratio of received photocurrent to the LED drive current used for transmission in various health sensing algorithms. We investigate the degradation characteristics across LEDs of different wavelengths. Our findings indicate a relative CTR change due to degradation ranging from 1% to 8% within 100 hours of continuous operation, which translates to approximately 3.5 to 7 years of device lifetime. Furthermore, we explore the non-linearity of this degradation and the observed initial "overshoot" phenomenon in the CTR during aging. We discuss how understanding these dynamics could inform the development of robust specifications for different physiological sensing algorithms. Finally, we present several potential solutions to mitigate the effects of LED aging. During the product design phase, integrating a calibrating photodiode or compensating circuitry around the LED can help preemptively address degradation. In the application space, run-time calibration strategies employing two differently degraded optical paths offer a promising approach to maintain measurement accuracy. View details
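The CTR metric defined in the abstract above (received photocurrent divided by LED drive current) and its relative change can be worked through in a few lines. This is a sketch only: the function names and the current values are hypothetical and are not measurements from the paper.

```python
def current_transfer_ratio(photocurrent_a, led_drive_current_a):
    """CTR as defined in the abstract: received photocurrent / LED drive current."""
    if led_drive_current_a <= 0:
        raise ValueError("LED drive current must be positive")
    return photocurrent_a / led_drive_current_a


def relative_ctr_change(ctr_initial, ctr_aged):
    """Fractional change of CTR relative to its initial value."""
    return (ctr_aged - ctr_initial) / ctr_initial


# Hypothetical values: 20 mA drive current, photocurrent falling from 2.0 uA to 1.9 uA.
ctr_new = current_transfer_ratio(2.0e-6, 20e-3)
ctr_old = current_transfer_ratio(1.9e-6, 20e-3)
print(f"relative CTR change: {relative_ctr_change(ctr_new, ctr_old):.1%}")
```

Because the drive current is held fixed in this example, the whole change in CTR comes from reduced optical output, which is the effect the abstract attributes to LED aging.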