Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 10467 publications
    Preview abstract While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic system optimised for clinical management and dialogue, incorporating reasoning over the evolution of disease and multiple patient visit encounters, response to therapy, and professional competence in medication prescription. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini's long-context capabilities, combining in-context retrieval with structured reasoning to align its output with relevant and up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialist physicians and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding of management plans in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. While AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE's strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management. View details
    The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
    Pratik Fegade
    Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (2025), pp. 5-17
    Preview abstract A chief enabler of large-scale deep learning is the distribution of computation across multiple interconnected hardware accelerators. In order to unlock the maximum possible performance, a compiler must first select a reasonable strategy to parallelize a model's operations. Since neural network architectures admit multiple flavors of parallelism, determining the proper strategy for each instruction is a critical (albeit non-trivial) task. To solicit new ideas toward solving this challenging combinatorial optimization problem, we organized the ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning, a multi-month competition focused on advancing the state-of-the-art for model partitioning algorithms. In this paper, we offer a retrospective of this event, including the basic problem formulation, key challenges & opportunities, our new benchmark suite, and the quality of submissions received. View details
    Scalability of Generative AI Models: Challenges and Opportunities in Large-Scale Data Generation and Training
    International Journal of Computer Science and Information Technology Research (IJCSITR) (2025)
    Preview abstract Scalability of Generative AI Models: Challenges and Opportunities in Large-Scale Data Generation and Training View details
    Preview abstract Cloud platforms have been virtualizing storage devices like flash-based solid-state drives (SSDs) to make effective use of storage resources. They enable either software-isolated instance or hardware-isolated instance for facilitating the storage sharing between multi-tenant applications. However, for decades, they have to combat the fundamental tussle between the performance isolation and resource utilization. They suffer from either long tail latency caused by weak isolation or low storage utilization caused by strong isolation. In this paper, we present FleetIO, a learning-based storage virtualization framework that employs reinforcement learning (RL) for managing virtualized SSDs. FleetIO explores the unique features of RL to handle the dynamic changes of application workloads and storage states, and integrates the storage scheduling into the RL decision-making process. It achieves both performance isolation and improved storage utilization by enabling dynamic fine-grained storage harvesting across co-located application instances, while minimizing its negative impact on their service-level objectives (SLOs). FleetIO clusters workloads into different types (e.g., latency-sensitive and bandwidth-intensive) based on the collected I/O traces at runtime, and fine-tunes the RL reward functions for each type of workloads. We implement FleetIO on a real programmable SSD board and evaluate it with diverse cloud applications. We show that FleetIO improves the overall storage utilization of the shared SSD by up to 1.4×, and decreases the tail latency of I/O requests by 1.5× on average, compared to the state-of-the-art storage sharing approaches. View details
    Preview abstract Delay monitoring is a commonly arising problem in applications such as queue management systems, scheduling, and traffic monitoring. Motivated by such applications, we formulate a queue monitoring problem, where there is a FIFO queue with arbitrary arrivals and departures, and a server needs to monitor the length of a queue by using (decentralized) pings from packets in the queue. Packets can send pings informing the server about the number of packets ahead of them in the queue. Via novel online algorithms and lower bounds, we tightly characterize the trade-off between the number of pings sent and the accuracy of the server's real time estimates. Further, our approximate estimates can be made to be accurate to an arbitrary precision. View details
    Preview abstract We discuss the challenges posed by growing machine learning workloads on datacenter networks and present how Google’s Jupiter network fabrics effectively support diverse traffic. View details
    Preview abstract Many AI applications of interest require specialized multi-modal models. Yet, relevant data for training these models is inherently scarce. Human annotation is prohibitively expensive, error-prone, and time-consuming. Meanwhile, existing synthetic data generation methods often rely on manual prompts, evolutionary algorithms, or extensive seed data from the target distribution - limiting scalability and control. In this paper, we introduce Simula, a novel, seedless framework that balances global and local reasoning to generate synthetic datasets. We utilize taxonomies to capture a global coverage space and use a series of agentic refinements to promote local diversity and complexity. Our approach allows users to define desired dataset characteristics through an explainable and controllable process, without relying on seed data. This unlocks new opportunities for developing and deploying AI in domains where data scarcity or privacy concerns are paramount. View details
    Preview abstract We investigate Learning from Label Proportions (LLP), a partial information setting where examples in a training set are grouped into bags, and only aggregate label values in each bag are available. Despite the partial observability, the goal is still to achieve small regret at the level of individual examples. We give results on the sample complexity of LLP under square loss, showing that our sample complexity is essentially optimal. From an algorithmic viewpoint, we rely on carefully designed variants of Empirical Risk Minimization, and Stochastic Gradient Descent algorithms, combined with ad hoc variance reduction techniques. On one hand, our theoretical results improve in important ways on the existing literature on LLP, specifically in the way the sample complexity depends on the bag size. On the other hand, we validate our algorithmic solutions on several datasets, demonstrating improved empirical performance (better accuracy for less samples) against recent baselines. View details
    AI as a Catalyst for Educational Equity: Addressing Global Teacher Shortages and Learning Disparities
    International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCERT) (2025)
    Preview abstract The global education system is grappling with a critical shortage of teachers, threatening the achievement of universal quality education. This article examines how artificial intelligence (AI) technologies can revolutionize educational access and equity by addressing these systemic challenges. Through a comprehensive article analysis of AI-enabled solutions, including personalized learning mechanisms, virtual tutoring systems, and intelligent content distribution platforms, the article explores the transformative potential of these technologies in democratizing education. The article investigates the implementation of AI across established educational platforms, examining their effectiveness in providing adaptive learning experiences, breaking down language barriers, and ensuring cultural relevance. The article demonstrates that strategic AI integration can significantly impact learning outcomes while helping to bridge the global teacher shortage gap. The article also addresses critical implementation challenges, providing policy recommendations and resource allocation frameworks for successful AI adoption in education systems worldwide. This article analysis contributes to the growing body of knowledge on educational technology by offering practical insights into how AI can be leveraged to create more inclusive, effective, and accessible learning environments, ultimately advancing the goal of quality education for all. View details
    Preview abstract Generative AI's potential for hallucinations and inaccuracies are by far the most discussed limitation in AI-assisted software development. But, whether developers have other concerns about using generative AI in their coding practice has not been thoroughly explored. This article describes the results of in-depth interviews with developers about their other concerns about generative AI in coding, beyond the tools accuracy, and discusses related policy implications for organizations developing software. View details
    Preview abstract Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), have demonstrated significant potential in clinical reasoning skills such as history-taking and differential diagnosis generation—critical aspects of medical education. This work explores how LLMs can augment medical curricula through interactive learning. We conducted a participatory design process with medical students, residents and medical education experts to co-create an AI-powered tutor prototype for clinical reasoning. As part of the co-design process, we conducted a qualitative user study, investigating learning needs and practices via interviews, and conducting concept evaluations through interactions with the prototype. Findings highlight the challenges learners face in transitioning from theoretical knowledge to practical application, and how an AI tutor can provide personalized practice and feedback. We conclude with design considerations, emphasizing the importance of context-specific knowledge and emulating positive preceptor traits, to guide the development of AI tools for medical education. View details
    Confidence Improves Self-Consistency in LLMs
    Tom Sheffer
    Eran Ofek
    Ariel Goldstein
    Zorik Gekhman
    ACL 2025, Findings (2025)
    Preview abstract Self-consistency decoding enhances LLMs’ performance on reasoning tasks by sampling diverse reasoning paths and selecting the most frequent answer. However, it is computationally expensive, as sampling many of these (lengthy) paths is required to increase the chances that the correct answer emerges as the most frequent one. To address this, we introduce Confidence-Informed Self-Consistency (CISC). CISC performs a weighted majority vote based on confidence scores obtained directly from the model. By prioritizing high-confidence paths, it can identify the correct answer with a significantly smaller sample size. When tested on nine models and four datasets, CISC outperforms self-consistency in nearly all configurations, reducing the required number of reasoning paths by over 40% on average. In addition, we introduce the notion of within-question confidence evaluation, after showing that standard evaluation methods are poor predictors of success in distinguishing correct and incorrect answers to the same question. In fact, the most calibrated confidence method proved to be the least effective for CISC. Lastly, beyond these practical implications, our results and analyses show that LLMs can effectively judge the correctness of their own outputs, contributing to the ongoing debate on this topic. View details
    Binamix -- A Python Library for Generating Binaural Audio Datasets
    Dan Barry
    Davoud Shariat Panah
    Alessandro Ragano
    Andrew Hines
    AES 158th Audio Engineering Society Convention (2025)
    Preview abstract The increasing demand for spatial audio in applications such as virtual reality, immersive media, and spatial audio research necessitates robust solutions to generate binaural audio data sets for use in testing and validation. Binamix is an open-source Python library designed to facilitate programmatic binaural mixing using the extensive SADIE II Database, which provides Head Related Impulse Response (HRIR) and Binaural Room Impulse Response (BRIR) data for 20 subjects. The Binamix library provides a flexible and repeatable framework for creating large-scale spatial audio datasets, making it an invaluable resource for codec evaluation, audio quality metric development, and machine learning model training. A range of pre-built example scripts, utility functions, and visualization plots further streamline the process of custom pipeline creation. This paper presents an overview of the library’s capabilities, including binaural rendering, impulse response interpolation, and multi-track mixing for various speaker layouts. The tools utilize a modified Delaunay triangulation technique to achieve accurate HRIR/BRIR interpolation where desired angles are not present in the data. By supporting a wide range of parameters such as azimuth, elevation, subject Impulse Responses (IRs), speaker layouts, mixing controls, and more, the library enables researchers to create large binaural datasets for any downstream purpose. Binamix empowers researchers and developers to advance spatial audio applications with reproducible methodologies by offering an open-source solution for binaural rendering and dataset generation. We release the library under the Apache 2.0 License at https://github.com/QxLabIreland/Binamix/ View details
    Reasoning-SQL: Reinforcement Learning with Partial Rewards for Reasoning-Enhanced Text-to-SQL
    Mohammadreza Pourreza
    Shayan Talaei
    Hailong Li
    Azalia Mirhoseini
    Amin Saberi
    Conference on Language Modeling (COLM) (2025) (to appear)
    Preview abstract Text-to-SQL is a challenging task involving multiple reasoning-intensive subtasks, including natural language understanding, database schema comprehension, and precise SQL query formulation. Existing approaches often rely on handcrafted reasoning paths with inductive biases that can limit their overall effectiveness. Motivated by the recent success of reasoning-enhanced models such as DeepSeek R1 and OpenAI o1, which effectively leverage reward-driven self-exploration to enhance reasoning capabilities and generalization, we propose a novel set of partial rewards tailored specifically for the Text-to-SQL task. Our reward set includes schema-linking, AI feedback, n-gram similarity, and syntax check, explicitly designed to address the reward sparsity issue prevalent in reinforcement learning (RL). Leveraging group relative policy optimization (GRPO), our approach explicitly encourages large language models (LLMs) to develop intrinsic reasoning skills necessary for accurate SQL query generation. With models of different sizes, we demonstrate that RL-only training with our proposed rewards consistently achieves higher accuracy and superior generalization compared to supervised fine-tuning (SFT). Remarkably, our RL-trained 14B-parameter model significantly outperforms larger proprietary models, e.g. o3-mini by 4% and Gemini-1.5-Pro-002 by 3% on the BIRD benchmark. These highlight the efficacy of our proposed RL-training framework with partial rewards for enhancing both accuracy and reasoning capabilities in Text-to-SQL tasks. View details
    Society-Centric Product Innovation in the Era of Customer Obsession
    International Journal of Science and Research Archive (IJSRA), Volume 14 - Issue 1 (2025)
    Preview abstract This article provides a comprehensive analysis of the evolving landscape of innovation in the technology sector, with a focus on the intersection of technological progress and social responsibility. The article explores key challenges facing the industry, including public trust erosion, digital privacy concerns, and the impact of automation on workforce dynamics. It investigates responsible innovation frameworks' emergence and implementation across various organizations, highlighting the transformation from traditional development approaches to more society-centric models. The article demonstrates how companies balance innovation speed with social responsibility, incorporate ethical considerations into their development processes, and address digital disparities across different demographics. By examining how companies balance the pace of innovation with ethical responsibilities, integrate social considerations into their processes, and address digital inequities across diverse demographics, the article underscores the transformative potential of these frameworks. Through insights into cross-functional teams, impact assessment tools, and stakeholder engagement strategies, it demonstrates how responsible innovation drives both sustainable business value and societal progress. View details