Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11268 publications
    Fair Allocation of Indivisible Goods with Variable Groups
    Paul Golz
    Warut Suksompong
    Ayumi Igarashi
    AAAI (2026)
    Preview abstract We study the fair allocation of indivisible goods with variable groups. In this model, the goal is to partition the agents into groups of given sizes and allocate the goods to the groups in a fair manner. We show that for any number of groups and corresponding sizes, there always exists an envy-free up to one good (EF1) outcome, thereby generalizing an important result from the individual setting. Our result holds for arbitrary monotonic utilities and comes with an efficient algorithm. We also prove that the EF1 existence can be guaranteed even when the goods lie on a path and each group must receive a connected bundle. In addition, we consider a probabilistic model where the utilities are additive and drawn randomly from a distribution. We show that if there are n agents and the number of goods m is divisible by the number of groups k, then an envy-free outcome exists with high probability if m = ω(log n), and this bound is tight. On the other hand, if m is not divisible by k, then an envy-free outcome is unlikely to exist as long as m = o(√n). View details
    Preview abstract Contrail microphysical simulations and climate simulations have indicated that contrail cirrus cause a substantial fraction of aviation’s climate impact. While the approximations and parameter selections in these simulations have been well-validated over the past two decades, the heat trapping of contrails has not been observed using satellite data beyond a few hours. This is because contrails lose their linear shape after a few hours, making them difficult to distinguish from natural cirrus clouds. Here we provide satellite-driven analysis of long-lived heat trapping by contrails over North and South America. We aggregate a dataset of GOES-16 estimated outgoing longwave radiation and advected trace density of flight paths, and apply causal inference to discern the effect of contrails while controlling for radiative and cloud confounders. As a means of validation, we also generate synthetic datasets with known ground truth, and confirm that applying the causal inference method is able to recover the synthetic ground truth. Since this method yields an estimate which has some differences from both “instantaneous radiative forcing” (iRF) and “effective radiative forcing” (ERF) estimates which have been reported in the literature so far, we introduce the new term “observational radiative forcing, 12 hours” (oRF12). Our analysis estimates the longwave oRF12 from contrails over the Americas averaged 47.9 gigajoules per flight kilometer (95% CI: 31 to 52 GJ/km) during April 2019 to April 2020. View details
    CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors
    Juro Gottweis
    CJ Park
    Salman Rahman
    Ahmed Metwally
    Hong Yu
    Ivor Rendulic
    Yuzhe Yang
    Petar Sirkovic
    Daniel McDuff
    Shwetak Patel
    Nicolas Stroppa
    Yubin Kim
    Mark Malhotra
    Orson Xu
    Sam Schmidgall
    Tim Althoff
    Elahe Vedadi
    Cynthia Breazeal
    Hae Won Park
    (2026)
    Progressive Photorealistic Simplification
    Adi Rosenthal
    Yedid Hoshen
    Arik Shamir
    2026
    Preview abstract Existing image simplification techniques often rely on Non-Photorealistic Rendering (NPR), transforming photographs into stylized sketches, cartoons, or paintings. While effective at reducing visual complexity, such approaches typically sacrifice photographic realism. In this work, we explore a complementary direction: simplifying images while preserving their photorealistic appearance. We introduce progressive semantic image simplification, a framework that iteratively reduces scene complexity by removing and inpainting elements in a controlled manner. At each step, the resulting image remains a plausible natural photograph. Our method combines semantic understanding with generative editing, leveraging Vision-Language Models (VLMs) to identify and prioritize elements for removal, and a learned verifier to ensure photorealism and coherence throughout the process. This is implemented via an iterative \emph{Select–Remove–Verify} pipeline that produces high-quality simplification trajectories. To improve efficiency, we further distill this process into an image-to-video generation model that directly predicts coherent simplification sequences from a single input image. Beyond generating cleaner and more focused compositions, our approach enables applications such as content-aware decluttering, semantic layer decomposition, and interactive editing. More broadly, our work suggests that simplification through structured content removal can serve as a practical mechanism for guiding visual interpretation within the photorealistic domain, complementing traditional abstraction methods. View details
    Preview abstract High-volume enterprise service organizations face a persistent challenge in transitioning from reactive support models to proactive, preventative ones. This paper introduces the Agentic Trend-to-Knowledge (ATK) methodology, a novel, autonomous framework designed to address this gap. The ATK methodology employs an AI agent that operates in a recurring, closed loop. It first uses a two-stage process for the autonomous thematic analysis of recent support cases to identify the most significant recurring issue. It then leverages Retrieval-Augmented Generation (RAG) to source relevant institutional knowledge. A key innovation is the agent's adaptive, bimodal response: if relevant knowledge is found, it drafts a proactive communication for human review; if a knowledge gap is detected, it autonomously creates a content creation task for the appropriate team. This transforms the agent from an automation tool into a proactive process owner that creates a virtuous cycle of continuous improvement for both case deflection and knowledge base quality. By automating the entire workflow from insight to action, the ATK framework provides a concrete methodology for shifting from a "human-in-the-loop" to a more strategic "human-on-the-loop" operational paradigm. View details
    Preview abstract We study the d-dimensional knapsack problem. We are given a set of items, each with a d-dimensional cost vector and a profit, along with a d-dimensional budget vector. The goal is to select a set of items that do not exceed the budget in all dimensions and maximize the total profit. A polynomial-time approximation scheme (PTAS) with running time n^{Θ(d/{ε})} has long been known for this problem, where {ε} is the error parameter and n is the encoding size. Despite decades of active research, the best running time of a PTAS has remained O(n^{⌈ d/{ε} ⌉ - d}). Unfortunately, existing lower bounds only cover the special case with two dimensions d = 2, and do not answer whether there is a n^{o(d/({ε)})}-time PTAS for larger values of d. In this work, we show that the running times of the best-known PTAS cannot be improved up to a polylogarithmic factor assuming the Exponential Time Hypothesis (ETH). Our techniques are based on a robust reduction from 2-CSP, which embeds 2-CSP constraints into a desired number of dimensions. Then, using a recent result of [Bafna Karthik and Minzer, STOC'25], we succeed in exhibiting tight trade-off between d and {ε} for all regimes of the parameters assuming d is sufficiently large. Informally, our result also shows that under ETH, for any function f there is no f(d/({ε)}) ⋅ n^{õ(d/({ε)})}-time (1-{ε})-approximation for d-dimensional knapsack, where n is the number of items and õ hides polylogarithmic factors in d/({ε)}. View details
    Phoenix: Rowhammer Attacks on DDR5 with Self-Correcting Synchronization
    Michele Marazzi
    Kaveh Razavi
    Salman Qazi
    Diego Meyer
    Patrick Jattke
    IEEE Security & Privacy (S&P) (2026)
    Preview abstract Object-Counting for remote-sensing (RS) imagery is raising increasing research interest due to its crucial role in a wide and diverse set of applications. While several promising methods for RS object-counting have been proposed, existing methods focus on a closed, pre-defined set of object classes. This limitation necessitates costly re-annotation and model re-training to adapt current approaches for counting of novel objects that have not been seen during training, and severely inhibits their application in dynamic, real-world monitoring scenarios. To address this gap, in this work we propose RS-OVC - an adaptation of existing work for Open Vocabulary Counting (OVC) approach from general computer vision to the RS domain. We show that our model is capable of accurate counting of novel object classes, that are unseen during training, based solely on textual and/or visual conditioning. View details
    Reasoning-Driven Synthetic Data Generation and Evaluation
    Tim R. Davidson
    Benoit Seguin
    Transactions on Machine Learning Research (2026)
    Preview abstract Although many AI applications of interest require specialized multi-modal models, relevant data to train such models is inherently scarce or inaccessible. Filling these gaps with human annotators is prohibitively expensive, error-prone, and time-consuming, leading model builders to increasingly consider synthetic data as a scalable alternative. However, existing synthetic data generation methods often rely on manual prompts, evolutionary algorithms, or extensive seed data from the target distribution — limiting their scalability, explainability, and control. In this paper, we introduce Simula: a novel reasoning-driven framework for data generation and evaluation. It employs a seedless, agentic approach to generate synthetic datasets at scale, allowing users to define desired dataset characteristics through an explainable and controllable process that enables fine-grained resource allocation. We show the efficacy of our approach on a variety of datasets, rigorously testing both intrinsic and downstream properties. Our work (1) offers guidelines for synthetic data mechanism design, (2) provides insights into generating and evaluating synthetic data at scale, and (3) unlocks new opportunities for developing and deploying AI in domains where data scarcity or privacy concerns are paramount. View details
    Preview abstract Generative AI (GenAI) is evolving from standalone tools to interconnected ecosystems that integrate chatbots, cloud platforms, and third-party services. While this ecosystem model enables personalization and extended services, it also introduces complex information flows and amplifies privacy risks. Existing solutions focus on system-level protections, offering little support for users to make meaningful privacy choices. To address this gap, we conducted two vignette-based survey studies with 486 participants and a followup interview study with 16 participants. We also explored users’ needs and preferences for privacy choice design across both GenAI personalization and data-sharing. Our results reveal paradoxical patterns: participants sometimes trusted third-party ecosystems more for personalization but perceived greater control in first-party ecosystems when data was shared externally. We discuss design implications for privacy choice interfaces that enhance transparency, control, and trust in GenAI ecosystems. View details
    TDXRay: Microarchitectural Side-Channel Analysis of Intel TDX for Real-World Workloads
    Tristan Hornetz
    Hosein Yavarzadeh
    Albert Cheu
    Adria Gascon
    Lukas Gerlach
    Michael Schwarz
    Ruiyi Zhang
    IEEE Security & Privacy (S&P) (2026)
    Preview abstract Confidential computing with VM-based trusted execution environments (TEEs) promises to protect code and data from a privileged cloud operator, enabling privacy-preserving workloads ranging from medical analytics to AI inference. However, most deployments exclude microarchitectural side channels from their threat model, shifting the burden to application developers who lack practical, general-purpose tools to assess (let alone mitigate) leakage. This gap is problematic: host-observable effects such as page-fault patterns, shared-cache contention, performance-counter surrogates (where available), and fine-grained timing primitives (e.g., MWAIT) can still reveal high-level secrets even when memory remains encrypted. We present TDXRay, an open-source framework that systematizes the evaluation of side-channel risk for confidential VMs in Intel TDX. TDXRay exposes unified interfaces to exercise and measure several attack primitives—including controlled-channel attacks via page tables, cache-based contention/occupancy probes, performance-counter–derived signals, and timing channels—against unmodified guest workloads. Using TDXRay, we build two end-to-end case studies: (1) a classic AES T-table attack in which a malicious hypervisor recovers the secret key from access-pattern leakage, and (2) an LLaMA inference attack in which the host infers user prompts by monitoring memory accesses during tokenization and embedding lookups. Across both, we show that a host with no direct access to guest memory can reconstruct sensitive information by observing only externalized microarchitectural signals. View details
    DeduBB: Binary Code Size Reduction via Post-Link Basic Block De-duplication
    Chaitanya Mamatha Ananda
    Rajiv Gupta
    Mahbod Afarin
    Han Shen
    LCTES (Languages, Compilers, Tools and Theory of Embedded Systems) (2026) (to appear)
    Preview abstract Binary sizes of newer versions of software applications tend to be larger, primarily due to feature bloat. This poses various challenges, particularly for mobile applications. It affects upgrade rates directly impacting revenues, increases maintenance costs of supporting multiple versions, and prevents some users from getting critical security fixes. Code bloat also poses a problem for large warehouse-scale applications. Such applications experience performance degradation when their code size exceeds what smaller and more efficient code models can handle. In this paper, we introduce a post-link optimization tech nique called DeduBB, which deduplicates basic blocks of an application across procedure boundaries. While prior tech- niques used function outlining to de-duplicate redundant code sequences, it missed out on many opportunities as it cannot handle code that manipulates the program stack. In addition, previous techniques were either limited to the scope of a module or lacked scalable implementations required to handle large warehouse-scale applications. Our technique, DeduBB, handles all types of code duplication as we use a novel save-and-jump code pattern to execute de-duplicated code blocks. In addition, DeduBB has been designed to work on scalable post-link optimizers and can even be applied to large warehouse-scale datacenter applications. Finally, DeduBB is profile-guided and can be applied selectively to infrequently executed cold basic blocks to not affect application performance. In fact, in several cases, the performance of the smaller application binary improves due to reductions in its hot working set size. We have implemented our technique on the state-of-the-art post link optimizers, BOLT and Propeller. Experiments show that we can significantly reduce the code size of several benchmarks by 1.55% to 18.63%, on both Arm and x86 platforms, and on binaries that have already been heavily optimized for size using existing code size reduction features. Furthermore, aided by profiles, our technique can retain more than 80% of the maximal code size savings without affecting performance. View details
    Preview abstract Browser fingerprinting is the practice of tracking users across the Web by collecting attributes from their devices and combining them to create unique identifiers. This practice poses major privacy risks to users, and more than a decade of research has quantified fingerprinting risks due to various attributes, leading browser developers to implement many privacy-enhancing changes. Early work used Shannon entropy to quantify risks. However, Shannon entropy can grow with dataset size, limiting the ability to compare datasets and results. Researchers then introduced normalized entropy as a measure for comparing browser fingerprinting datasets of different sizes and numerous works followed using normalized entropy for this purpose. We identify and address a resulting problem in the fingerprinting literature. We show normalized entropy is ill-suited to compare datasets of different sizes — it decreases as dataset size increases. We show this both analytically and empirically, leveraging a recently published dataset of browser attributes commonly used for fingerprinting. Given the unmet need for a better fingerprinting risk measure, we define a minimal set of desired properties for such a measure: scale-invariance, monotonicity and estimability. We then propose to use Tsallis entropy as a more interpretable fingerprinting risk measure. We evaluate Shannon, normalized, and Tsallis entropy with respect to the properties, and prove that only Tsallis entropy satisfies all of them. View details
    Preview abstract The rapid adoption of agentic systems powered by large language models (LLMs) introduces significant security challenges distinct from plain conversational models, particularly concerning prompt injection and tool misuse due to their dynamic personas and real- world tool interactions. This paper investigates the effectiveness of hardened security prompting in a task-oriented multi-agent framework, using a coding assistant as a representative case study. We com- pare a baseline ”unhardened” agent against a ”hard- ened” version equipped with explicit security guide- lines applied across all sub-agents. Our evaluation across 150+ single-turn and 32 multi-turn attack sce- narios demonstrates that prompt hardening dramat- ically improves resilience. With a simple, approxi- mately 500-token security hardener, single-turn fail- ure rates dropped from 19.48% to 2.60%, while multi- turn failure rates decreased from 75.00% to 46.88%. Furthermore, we show that successfully bypassing the hardened agent requires significantly more adversar- ial effort and a greater number of chat turns. How- ever, the analysis also reveals a critical shift in vul- nerability taxonomy: as direct attacks fail, adver- saries exploit the agent’s core functionality via ”Func- tional Wrappers” (Intent Obfuscation), highlighting a residual risk that necessitates a shift in the defen- sive paradigm from static filters to dynamic runtime state and intent analysis. View details
    ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders
    Guy Tennenholtz
    Jihwan Jeong
    The 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL-26), Rabat, Morocco (2026)
    Preview abstract LLM-based user simulators are a scalable solution for improving conversational AI, but a critical realism gap undermines their effectiveness. To close this gap, we introduce a framework for building and validating high-fidelity simulators. We present a novel dataset of human-AI shopping conversations designed to capture a wide spectrum of user experiences. To measure fidelity, we propose a hybrid evaluation protocol that combines statistical alignment with a learned, discriminator-based Human-Likeness Score. Our most sophisticated simulator, trained via reinforcement learning with iterative critique, achieves a significant leap in realism. Critically, we demonstrate through counterfactual validation that our simulator—trained exclusively on optimal interactions—realistically adapts its behavior to suboptimal system responses, mirroring real user reactions and marking a key advance in creating reliable simulators for robust AI development. View details
    ×