Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11319 publications
    Preview abstract We study the d-dimensional knapsack problem. We are given a set of items, each with a d-dimensional cost vector and a profit, along with a d-dimensional budget vector. The goal is to select a set of items that do not exceed the budget in all dimensions and maximize the total profit. A polynomial-time approximation scheme (PTAS) with running time n^{Θ(d/{ε})} has long been known for this problem, where {ε} is the error parameter and n is the encoding size. Despite decades of active research, the best running time of a PTAS has remained O(n^{⌈ d/{ε} ⌉ - d}). Unfortunately, existing lower bounds only cover the special case with two dimensions d = 2, and do not answer whether there is a n^{o(d/({ε)})}-time PTAS for larger values of d. In this work, we show that the running times of the best-known PTAS cannot be improved up to a polylogarithmic factor assuming the Exponential Time Hypothesis (ETH). Our techniques are based on a robust reduction from 2-CSP, which embeds 2-CSP constraints into a desired number of dimensions. Then, using a recent result of [Bafna Karthik and Minzer, STOC'25], we succeed in exhibiting tight trade-off between d and {ε} for all regimes of the parameters assuming d is sufficiently large. Informally, our result also shows that under ETH, for any function f there is no f(d/({ε)}) ⋅ n^{õ(d/({ε)})}-time (1-{ε})-approximation for d-dimensional knapsack, where n is the number of items and õ hides polylogarithmic factors in d/({ε)}. View details
    Preview abstract Here’s a thought experiment. Say I wave a magic wand across a codebase and an entire class of technical debt, poof, goes away and immediately evaporates if introduced in the future. For example, maybe I make it so that dead feature flags are simply no longer a problem: they just delete themselves as soon as the engineer wills it. Or maybe large-scale migrations just migrate themselves. Maybe we magically have 100% test coverage, without an engineer lifting a finger. What will happen to developer productivity? Surely, developer productivity increases overall. But will the productivity metrics that we all use as a proxy for “developer productivity” move up and to the right. Let’s explore this idea. View details
    Preview abstract Online video platforms face an exponential challenge in detecting and mitigating the flood of AI-generated "slop" and synthetic spam perpetuated by coordinated malicious actors. This content is increasingly designed to exploit the limitations of traditional media forensics, often utilizing generative AI to produce unique, localized variations of harmful or low-quality material at scale. Traditional content-centric moderation fails against this coordinated, adversarial generation strategy. This paper presents a novel, scalable defense system deployed at a major Online Video Platform (OVP) to identify and terminate clusters of coordinated accounts exhibiting a prevalence of adversarial synthetic content. The approach leverages a multi-faceted architecture incorporating two core machine learning components: a robust Coordinated Bot-Net Detector (via Account Relatedness) and a Synthetic Pattern Classifier (formerly BT Classifier). Crucially, we introduce an advanced AI enhancement layer utilizing Large Language Models (LLMs), specialized via Low-Rank Adaptation (LoRA) and Automatic Prompt Optimization (APO), to achieve rapid, high-precision semantic understanding of emerging synthetic spam trends. Operational data spanning a six-month period demonstrates the system's significant impact, resulting in the successful termination of 50K clusters comprising 130K channels of synthetic spam generators. Furthermore, the LLM-driven automation significantly improves operational efficiency, saving approximately 83 human review hours to cut down human reviews by 50%. This work details a critical, deployed solution that provides essential scalability and adversarial resilience against sophisticated generative attacks. View details
    Usability Hasn’t Peaked: Exploring How Expressive Design Overcomes the Usability Plateau
    Alyssa Sheehan
    Bianca Gallardo
    Ying Wang
    Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26), April 13–17, 2026, Barcelona, Spain (2026)
    Preview abstract Critics have argued that mobile usability has largely been optimized, and that only incremental gains are possible. We set out to explore if the newest generation of design systems, which promote greater flexibility and a return to design basics, could produce substantially more usable designs while maintaining or increasing aesthetic judgments. Through a study with 48 diverse participants completing tasks in 10 different applications, we found that in designs created following Material 3 Expressive guidelines, users fixated on the correct screen element for a task 33% faster, completed tasks 20% faster, and rated experiences more positively compared to versions designed using the previous Material design system. These improvements in performance and aesthetic ratings challenge the premise of a usability plateau and show that mobile usability has not peaked. We illustrate specific opportunities to make mobile experiences more usable by returning to design fundamentals while highlighting risks of added flexibility. View details
    Preview abstract Generative AI’s humanlike qualities are driving its rapid adoption in professional domains. However, this anthropomorphic appeal raises concerns from HCI and responsible AI scholars about potential hazards and harms, such as overtrust in system outputs. To investigate how technology workers navigate these humanlike qualities and anticipate emergent harms, we conducted focus groups with 30 professionals across six job functions (ML engineering, product policy, UX research and design, product management, technology writing, and communications). Our findings reveal an unsettled knowledge environment surrounding humanlike generative AI, where workers’ varying perspectives illuminate a range of potential risks for individuals, knowledge work fields, and society. We argue that workers require comprehensive support, including clearer conceptions of “humanlikeness” to effectively mitigate these risks. To aid in mitigation strategies, we provide a conceptual map articulating the identified hazards and their connection to conflated notions of “humanlikeness.” View details
    ALF: Advertiser Large Foundation Model for Multi-Modal Advertiser Understanding
    Sunny Rajagopalan
    Alireza Golestaneh
    Shubhra Chandra
    Min Zhou
    Jonathan Vronsky
    Songbai Yan
    2026
    Preview abstract We present ALF (Advertiser Large Foundation model), a multi-modal transformer architecture for understanding advertiser behavior and intent across text, image, video and structured data modalities. Through contrastive learning and multi-task optimization, ALF creates unified advertiser representations that capture both content and behavioral patterns. Our model achieves state-of-the-art performance on critical tasks including fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, ALF reduces false positives by 90\% while maintaining 99.8\% precision on abuse detection tasks. The architecture's effectiveness stems from its novel combination of multi-modal transformations, intersample attention mechanism, spectrally normalized projections, and calibrated probabilistic outputs. View details
    Compact Conformal Subgraphs
    Kamesh Munagala
    Aravindan Vijayaraghavan
    ICML (2026)
    Preview abstract Conformal prediction provides rigorous uncertainty guarantees for model outputs but can produce prohibitively large prediction sets in structured domains such as routing, planning, or sequential recommendation. We introduce \emph{graph-based conformal compression}, a framework for constructing compact subgraphs that preserve the statistical validity of conformal prediction while reducing structural complexity. We study a formulation that selects a smallest subgraph capturing a prescribed fraction of conditional probability mass, and reduce to a weighted version of densest $k$-subgraphs in hypergraphs, in the regime where the subgraph has a large fraction of edges. We design efficient approximation algorithms that achieve constant factor coverage and size trade-offs. Our results highlight an algorithmic regime, distinct from classical densest-$k$-subgraph hardness settings, where the problem can be approximated efficiently, bridging conformal prediction with combinatorial graph compression. We finally validate our algorithmic approach on synthetic and real-world instances of trip planning and navigation, showing in each case that our approach handily beats natural baselines. View details
    Robust Wireless Resource Allocation Against Adversarial Jamming
    Christos Tsoufis
    Dionysia Triantafyllopoulou
    Klaus Moessner
    ICC (2026)
    Preview abstract We study the problem of allocating access point bandwidth to users of a wireless network in the presence of adversarial jamming. Specifically, we consider a setting in which the network designer acts first and allocates access point bandwidth to the users of the network, before an adversary applies a jamming strategy to reduce the bandwidth of a subset (or all) of the access points. We consider a strong adversary who has complete information and can optimize the jamming strategy, subject to power budget constraints. In turn, the network designer must allocate the resources in anticipation of the adversary's actions. We explain that our model gives rise to a special network interdiction model, which differs from the standard setting in two ways: The first is that the interdictor is given the benefit of responding, rather than leading the game. The second is that the interdiction is fractional and performed at the node level of the network. The interdiction then propagates to all edges incident to the access point. In terms of technical results, we provide an allocation algorithm that is based on linear programming duality and show that the algorithm can solve the problem optimally, assuming knowledge of the adversary's budget constraints. We conduct experiments on synthetic data to show the extent to which the algorithm improves the total utilized bandwidth over the algorithm that optimizes bandwidth allocation while being oblivious to the adversary's existence. View details
    Preview abstract While non-verbal behaviors and expressive movements are essential for natural human-robot interaction, existing methods often overlook a crucial element: the human’s internal cognitive state. Consequently, proactive multi-agent systems frequently interrupt humans at inopportune moments, leading to cognitive overload and decreased task performance. This paper introduces a framework for generating “cognitively aligned” multi-agent interactions, enhancing the ability of robotic systems to contextually defer communications during moments of high human mental workload. We present the design and implementation of a closed-loop architecture that explores the interplay between autonomous task execution and real-time neurophysiological focus. Utilizing a consumer-grade Brain-Computer Interface (BCI), our approach continuously monitors Electroencephalography (EEG) spectral band powers while a human performs a cognitive-load-inducing task. We propose a workload-driven pipeline where an HTTP-based signaling mechanism places a primary agent’s sensory inputs and audio outputs into a holding state upon detecting high cognitive load. This allows secondary agents to seamlessly process complex, delegated tasks in the background. Once the human’s cognitive state returns to a baseline, the primary agent releases the queued agent message. Our preliminary results demonstrate the feasibility of leveraging real-time signal processing, Large Language Models (LLMs), and physical robotic embodiments to create interrupt-aware, non-intrusive multi-agent systems. View details
    Preview abstract This defensive publication describes a framework for multi-artificial intelligence (AI) orchestration that can be used to address potential limitations associated with reliance on single AI models, such as correlated systemic failures or cognitive blind spots. The described system is a cognitive orchestration framework that can function as a middleware layer to manage tasks across a heterogeneous ensemble of AI models. An orchestrator node can decompose a user request into a sequence of sub-tasks, which an arbitrage engine may then dynamically assign to suitable AI models based on certain factors, such as capability, cost, and latency. For certain tasks, such as those designated as high-risk, a byzantine consensus layer can route the task to multiple diverse models in parallel and may trigger a process, for example a 'cognitive debate,' which could be adjudicated by a third-party judge model to help resolve conflicting outputs. This framework can facilitate a more resilient system that may improve the accuracy and reliability of outputs when compared to some single-model architectures. View details
    Preview abstract Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning smaller models, often sacrifices interpretability while introducing significant resource and operational overhead. To address these limitations, we introduce Prompt-Level Distillation (PLD). We extract explicit reasoning patterns from a Teacher model and organize them into a structured list of expressive instructions for the Student model's System Prompt. Evaluated on the StereoSet and Contract-NLI datasets using Gemma-3 4B, PLD improved Macro F1 scores from 57\% to 90.0\% and 67\% to 83\% respectively, enabling this compact model to match frontier performance with negligible latency overhead. These expressive instructions render the decision-making process transparent, allowing for full human verification of logic, making this approach ideal for regulated industries such as law, finance, and content moderation, as well as high-volume use cases and edge devices. View details
    Preview abstract Audio Description ( AD) provides essential access to visual media for blind and low vision ( BLV) audiences. Yet current AD production tools remain largely inaccessible to BLV video creators, who possess valuable expertise but face barriers due to visually- driven interfaces. We present ADCanvas, a multimodal authoring system that supports non- visual control over audio description ( AD) creation. ADCanvas combines conversational interaction with keyboard- based playback control and a plain- text, screen reader– accessible editor to support end- to- end AD authoring and visual question answering ( VQA). Combining screen- reader- friendly controls with a multimodal LLM agent, ADCanvas supports live VQA, script generation, and AD modification. Through a user study with 12 BLV video creators, we find that users adopt the conversational agent as an informational aide and drafting assistant, while maintaining agency through verification and editing. For example, participants saw themselves as curators who received information from the model and filtered it down for their audience. Our findings offer design implications for accessible media tools, including precise editing controls, accessibility support for creative ideation, and configurable rules for human- AI collaboration. View details
    Preview abstract The field of Human-Computer Interaction is approaching a critical inflection point, moving beyond the era of static, deterministic systems into a new age of self-evolving systems. We introduce the concept of Adaptive generative interfaces that move beyond static artifacts to autonomously expand their own feature sets at runtime. Rather than relying on fixed layouts, these systems utilize generative methods to morph and grow in real-time based on a user’s immediate intent. The system operates through three core mechanisms: Directed synthesis (generating new features from direct commands), Inferred synthesis (generating new features for unmet needs via inferred commands), and Real-time adaptation (dynamically restructuring the interface's visual and functional properties at runtime). To empirically validate this paradigm, we executed a within-subject (repeated measures) comparative study (N=72) utilizing 'Penny,' a digital banking prototype. The experimental design employed a counterbalanced Latin Square approach to mitigate order effects, such as learning bias and fatigue, while comparing Deterministic interfaces baseline against an Adaptive generative interfaces. Participant performance was verified through objective screen-capture evidence, with perceived usability quantified using the industry-standard System Usability Scale (SUS). The results demonstrated a profound shift in user experience: the Adaptive generative version achieved a System Usability Scale (SUS) score of 84.38 ('Excellent'), significantly outperforming the Deterministic version’s score of 53.96 ('Poor'). With a statistically significant mean difference of 30.42 points (p < 0.0001) and a large effect size (d=1.04), these findings confirm that reducing 'navigation tax' through adaptive generative interfaces directly correlates with a substantial increase in perceived usability. We conclude that deterministic interfaces are no longer sufficient to manage the complexity of modern workflows. The future of software lies not in a fixed set of pre-shipped features, but in dynamic capability sets that grow, adapt, and restructure themselves in real-time to meet the specific intent of the user. This paradigm shift necessitates a fundamental transformation in product development, requiring designers to transcend traditional, linear workflows and evolve into 'System Builders'—architects of the design principles and rules that facilitate this new age of self-evolving software. View details
    Preview abstract Validating conversational artificial intelligence (AI) for regulated medical software applications may present challenges, as static test datasets and manual review may be limited in identifying emergent, conversational anomalies. A multi-agent AI system may be configured in a closed-loop for automated validation. The system can, for example, utilize an end user persona simulator agent to generate prompts for a target model and a domain /regulatory expert adjudicator agent to evaluate the target model’s responses against a configurable rubric. A meta-analysis agent can analyze anomalies to identify underlying vulnerabilities, which may then be used to programmatically synthesize new adversarial personas. This adaptive process can generate evidence to support regulatory compliance and continuous performance monitoring for medical software algorithms systems. View details
    Progressive Photorealistic Simplification
    Adi Rosenthal
    Yedid Hoshen
    Arik Shamir
    2026
    Preview abstract Existing image simplification techniques often rely on Non-Photorealistic Rendering (NPR), transforming photographs into stylized sketches, cartoons, or paintings. While effective at reducing visual complexity, such approaches typically sacrifice photographic realism. In this work, we explore a complementary direction: simplifying images while preserving their photorealistic appearance. We introduce progressive semantic image simplification, a framework that iteratively reduces scene complexity by removing and inpainting elements in a controlled manner. At each step, the resulting image remains a plausible natural photograph. Our method combines semantic understanding with generative editing, leveraging Vision-Language Models (VLMs) to identify and prioritize elements for removal, and a learned verifier to ensure photorealism and coherence throughout the process. This is implemented via an iterative \emph{Select–Remove–Verify} pipeline that produces high-quality simplification trajectories. To improve efficiency, we further distill this process into an image-to-video generation model that directly predicts coherent simplification sequences from a single input image. Beyond generating cleaner and more focused compositions, our approach enables applications such as content-aware decluttering, semantic layer decomposition, and interactive editing. More broadly, our work suggests that simplification through structured content removal can serve as a practical mechanism for guiding visual interpretation within the photorealistic domain, complementing traditional abstraction methods. View details
    ×