Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
A data-centric perspective on the information needed for hydrological uncertainty predictions
Andreas Auer
Martin Gauch
Frederik Kratzert
Sepp Hochreiter
Daniel Klotz
Hydrology and Earth System Sciences (2024)
Preview abstract
Uncertainty estimates are fundamental to assess the reliability of predictive models in hydrology. We use the framework of conformal prediction to investigate the impact of temporal and spatial information on uncertainty estimates within hydrological predictions. Integrating recent information significantly enhances overall uncertainty predictions, even with substantial gaps between updates. While local information yields good results on average, it proves to be insufficient for peak-flow predictions. Incorporating global information improves the accuracy of peak-flow bounds, corroborating findings from related studies. Overall, the study underscores the importance of continuous data updates and the integration of global information for robust and efficient uncertainty estimation.
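As a rough illustration of the conformal prediction framework mentioned in this abstract, the sketch below shows generic split conformal prediction for a regression interval. It is not the authors' setup: the calibration residuals, the function names, and the choice of absolute residuals as the nonconformity score are all assumptions made for the example.

```python
import math


def conformal_quantile(residuals, alpha):
    """Return the split-conformal quantile of the absolute calibration residuals."""
    n = len(residuals)
    scores = sorted(abs(r) for r in residuals)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to guarantee coverage at this alpha.
        return math.inf
    return scores[rank - 1]


def predict_interval(point_prediction, q):
    """Symmetric interval around a point prediction (hypothetical helper)."""
    return (point_prediction - q, point_prediction + q)


# Hypothetical residuals (observed minus predicted) on a held-out calibration set.
calibration_residuals = [0.5, -1.0, 0.2, 1.5, -0.3, 0.8, -0.6, 0.1, -1.2]
q = conformal_quantile(calibration_residuals, alpha=0.2)
lower, upper = predict_interval(10.0, q)
```

Under the usual exchangeability assumption, an interval built this way covers the true value with probability at least 1 - alpha. Recalibrating on more recent residuals would be one way to read the abstract's point about integrating recent information.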
Open Se Cura: First Silicon Results of an Auditable and Transparent Hardware Root of Trust System using Open EDA in 16-nm
Guanchen Tao
Ming-Hung Chen
Bangfei Pan
Kai Yick
Dennis Sylvester
Mehdi Saligane
IEEE Solid-State Circuits Magazine, 16(2024), pp. 58-66
Preview abstract
Hardware root of trust (HRoT) is essential for IoT devices as it provides critical user data protection. However, each novel use case significantly lengthens the development time for an HRoT system. Furthermore, most HRoT solutions are proprietary, and users lack permission to inspect and audit such systems [1], [2]. This article introduces Open Se Cura, which is an open source framework designed to expedite the implementation of secure and transparent HRoT systems. The platform grants designers the flexibility to choose their preferred electronic design automation (EDA) tools. They can opt for proprietary EDA solutions or select from open source alternatives, including OpenROAD [3], [4], using the OpenFASOC framework [5], [6]. Additionally, the platform supports the use of open source process design kits (PDKs) to present a transparent and auditable approach to hardware–software co-design. This approach enables fast and trustworthy HRoT system implementation and is openly available to reproduce its results and security efficacy [7]. The extended version of the Open Se Cura reference design is showcased through FPGA emulation and its 22-nm ASIC implementation. We finally present the first measurement results of a 16-nm silicon implementation of selected components from OpenTitan, the security RoT hardware building block of Open Se Cura. This work was integrated using OpenFASOC’s modular flow, which allows one to call for open tools, such as OpenROAD, for physical design and closed tools for the missing open source EDA in 16 nm.
USM-SCD: USM-Based Multilingual Speaker Change Detection
Yongqiang Wang
Jason Pelecanos
Yu Zhang
Yiling Huang
Han Lu
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 11801-11805
Preview abstract
We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model is adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, demonstrating the utility of fine-tuning from a large generic foundation model for a downstream task. We analyze the performance of this multilingual speaker change detection model through a series of ablation studies. We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages. On American English, the USM-SCD model can achieve an 85.8% speaker change detection F1 score across various public and internal test sets, beating the previous monolingual baseline model by 21% relative. We also show that we only need to fine-tune one-quarter of the trainable model parameters to achieve the best model performance. The USM-SCD model exhibits state-of-the-art ASR quality compared with a strong public ASR baseline, making it suitable to handle both tasks with negligible additional computational cost.
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation
Bradley Kim
Alonso Martinez
Yu-Chuan Su
Agrim Gupta
Lu Jiang
Jacob Walker
Neural Information Processing Systems (NeurIPS) (2024) (to appear)
Preview abstract
Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task, which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process. Instead of the standard fixed diffusion timestep, we propose applying variable diffusion timesteps across the temporal dimension and across modalities of the inputs. This formulation offers flexibility to introduce variable noise levels for various portions of the input, hence the term mixture of noise levels. We propose a transformer-based audiovisual latent diffusion model and show that it can be trained in a task-agnostic fashion using our approach to enable a variety of audiovisual generation tasks at inference time. Experiments demonstrate the versatility of our method in tackling cross-modal and multimodal interpolation tasks in the audiovisual space. Notably, our proposed approach surpasses baselines in generating temporally and perceptually consistent samples conditioned on the input.
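The idea of variable diffusion timesteps across time and modalities can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: the linear beta schedule, the independent uniform sampling of timesteps, the scalar "latents", and all function names are assumptions chosen to keep the example small.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction after t steps of an assumed linear beta schedule."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def sample_timesteps(num_modalities, num_segments, num_steps, rng):
    """Draw one timestep per (modality, time segment) instead of one shared value."""
    return [[rng.randint(0, num_steps) for _ in range(num_segments)]
            for _ in range(num_modalities)]


def add_noise(latents, timesteps, num_steps, rng):
    """Apply x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps with a per-element timestep."""
    noisy = []
    for row, t_row in zip(latents, timesteps):
        out = []
        for x0, t in zip(row, t_row):
            a = alpha_bar(t, num_steps)
            eps = rng.gauss(0.0, 1.0)
            out.append(math.sqrt(a) * x0 + math.sqrt(1.0 - a) * eps)
        noisy.append(out)
    return noisy


rng = random.Random(0)
# Two modalities (say, video and audio), three time segments, scalar toy latents.
latents = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
random_steps = sample_timesteps(2, 3, 1000, rng)
# Timestep 0 leaves the first modality clean while the second is noised.
mixed_steps = [[0, 0, 0], [10, 500, 1000]]
noisy = add_noise(latents, mixed_steps, 1000, rng)
```

In this toy setup, giving one modality timestep 0 keeps it un-noised while the other is corrupted, which is one plausible way to picture how a single model could be conditioned on one modality to generate the other.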
Preview abstract
We present a new task and dataset, ScreenQA, for screen content understanding via question answering. The existing screen datasets are focused either on structure and component-level understanding, or on a much higher-level composite task such as navigation and task completion. We attempt to bridge the gap between these two by annotating 86K question-answer pairs over the RICO dataset, in the hope of benchmarking screen reading comprehension capacity.
KATch: A Fast Symbolic Verifier for NetKAT
Mark Moeller
Jules Jacobs
Olivier Savary Belanger
David Darais
Cole Schlesinger
Nate Foster
Alexandra Silva
Programming Languages and Implementation (PLDI) (2024) (to appear)
Preview abstract
We develop new data structures and algorithms for checking verification queries in NetKAT, a domain-specific language for specifying the behavior of network data planes. Our results extend the techniques obtained in prior work on symbolic automata and provide a framework for building efficient and scalable verification tools. We present KATch, an implementation of these ideas in Scala, including extended logical operators that are useful for expressing network-wide specifications and optimizations that construct a bisimulation quickly or generate a counter-example showing that none exists. We evaluate the performance of our implementation on real-world and synthetic benchmarks, verifying properties such as reachability and slice isolation, typically returning a result in well under a second, which is orders of magnitude faster than previous approaches.
Density-based User Representation through Gaussian Process Regression for Multi-interest Personalized Retrieval
Haolun Wu
Xue Liu
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
Preview abstract
Personalized recommendation systems are increasingly essential in our information-rich society, aiding users in navigating the expansive online realm. However, accurately modeling the diverse and dynamic interests of the users remains a formidable challenge. Existing user modeling methods, like Single-point User Representation (SUR) and Multi-point User Representation (MUR), have their limitations in terms of accuracy, diversity, computation cost, and adaptability. To overcome these challenges, we introduce a novel model, the Density-based User Representation (DUR), leveraging Gaussian Process Regression (GPR), which has not been extensively explored in multi-interest recommendation and retrieval. Our approach inherently captures user interest dynamics without manual tuning, provides uncertainty-awareness, and is more efficient than point-based representation methods. This paper outlines the development and implementation of GPR4DUR, details its evaluation protocols, and presents extensive analysis demonstrating its effectiveness and efficiency. Experiments on real-world offline datasets confirm our method’s adaptability and efficiency. Further online experiments simulating user behavior illuminate the benefits of our method in the exploration-exploitation trade-off by effectively utilizing model uncertainty.
SAC124 - SSAC Advice on Name Collision Analysis
Internet Corporation for Assigned Names and Numbers (ICANN), ICANN Security and Stability Advisory Committee (SSAC) Reports and Advisories (2024), pp. 15
Preview abstract
In this document the Security and Stability Advisory Committee (SSAC) provides its analysis of the findings and recommendations presented within the Name Collision Analysis Project (NCAP) Study Two and the proposed Name Collision Risk Assessment Framework. The SSAC also provides additional commentary on several aspects of the NCAP Study Two Report and makes recommendations to the ICANN Board.
Preview abstract
On the Boolean domain, there is a class of symmetric signatures called “Fibonacci gates” for which a beautiful P-time combinatorial algorithm has been designed for the corresponding Holant problems.
In this work, I give a combinatorial view for Holant(F) problems on a domain of size 3 where F is a set of arity 3 functions with inputs taking values on the domain of size 3 and the functions share some common properties. The combinatorial view can also be extended to the domain of size 4.
Specifically, I extend the definition of “Fibonacci gates” to the domain of size 3 and the domain of size 4. Moreover, I give the corresponding combinatorial algorithms.
Preview abstract
Stereotypes are oversimplified beliefs and ideas about particular groups of people. These cognitive biases are omnipresent in our language, reflected in human-generated datasets and potentially learned and perpetuated by language technologies. Although mitigating stereotypes in language technologies is necessary for preventing harms, stereotypes can impose varying levels of risks for targeted individuals and social groups by appearing in various contexts. Technical challenges in detecting stereotypes are rooted in the societal nuances of stereotyping, making it impossible to capture all intertwined interactions of social groups in diverse cultural contexts in one generic benchmark. This paper delves into the nuances of detecting stereotypes in an annotation task with humans from various regions of the world. We iteratively disambiguate our definition of the task, refining it as detecting "generalizing language", and contribute a multilingual, annotated dataset consisting of sentences mentioning a wide range of social identities in 9 languages and labeled on whether they make broad statements and assumptions about those groups. We experiment with training generalizing language detection models, which provide insight about the linguistic context in which stereotypes can appear, facilitating future research in addressing the dynamic, social aspects of stereotypes.
Preview abstract
WindowMirror is a framework for using XR headsets in productivity scenarios. The toolkit provides users with simulated, extended screen real estate. It allows users to interact with multiple desktop applications in real time within an XR environment. Our architecture has two main modules, a Unity package and a Python backend, which makes it easy to use and extend. WindowMirror supports traditional desktop interaction methods such as mouse, keyboard, and hand tracking. Furthermore, it features a Cylindrical Window Layout, an emerging design pattern which is particularly effective for single-user, egocentric perspectives. The introduction of WindowMirror aims to set a foundation for future research in XR screen-focused productivity scenarios.
Computational Methodologies for Understanding, Automating, and Evaluating User Interfaces
Yuwen Lu
Yue Jiang
Christof Lutteroth
Toby Jia-Jun Li
Jeffery Nichols
Wolfgang Stuerzlinger
Preview abstract
Building on the success of the first two workshops on user interfaces (UIs) at CHI 2022 and CHI 2023, this workshop aims to advance the research field by further exploring current research trends, such as applying large language models and visual language models. Previous work has explored computational approaches to understanding and adapting UIs using constraint-based optimization models and machine learning-based data-driven approaches. In addition to further delving into these established UI research areas, we aim to trigger the exploration into the application of the latest advancements in general-purpose large language and vision-language models within the UI domain. We will encourage participants to explore novel methods for understanding, automating, and evaluating UIs. The proposed workshop seeks to bring together academic researchers and industry practitioners interested in computational approaches for UIs to discuss the needs and opportunities for future user interface algorithms, models, and applications.
SAC125 - SSAC Report on Registrar Nameserver Management
Gautam Akiwate
Tim April
kc claffy
Internet Corporation for Assigned Names and Numbers (ICANN), ICANN Security and Stability Advisory Committee (SSAC) Reports and Advisories (2024), pp. 56
Preview abstract
During domain registration, a minimum of two nameservers are typically required, and this remains a requirement for any future updates to the domain. Often, domains are delegated to nameservers that are subordinate to some other domains, creating inter-domain dependencies. This network of dependencies creates a scenario where the functionality of a domain depends on the operational status of another domain. This setup lacks contractual or procedural safeguards against disruption or misuse, especially when the nameserver parent domain expires. Most registries forbid deleting an expired domain if other domains depend on it for name resolution. These constraints aim to prevent disruptions in DNS resolution for the dependent domains. However, this also means that the expired domain remains in a liminal state, neither fully operational nor completely removed. When registrars cannot delete expired domains with dependents, they are forced to bear the burden of sponsoring the domain without remuneration from the registrant. A peer-reviewed study, "Risky BIZness: Risks derived from Registrar Name Management," observed that some registrars have found and utilized a loophole in these constraints by renaming the host objects that are subordinate to the expiring domain. Once renamed, the host objects are what Akiwate et al., and subsequently the SSAC, refer to as sacrificial nameservers.
This report focuses on a specific type of sacrificial nameserver where the parent domains of the renamed host objects are considered to be unsafe because they are registrable. Registrable parent domains of sacrificial nameservers introduce a new attack surface for domain resolution hijacking, as malicious actors can exploit unsafe sacrificial nameservers to gain unauthorized control over the dependent domains, leading to manipulation or disruption. Unlike traditional domain hijacking techniques that exploit compromised account credentials or manipulate the resolution protocol, this report focuses on this unforeseen risk arising from a longstanding practice employed by some registrars.
In this report, the SSAC explores potential solutions to remediate exposed domains and prevent the creation of new unsafe sacrificial nameservers. The SSAC examines each proposed solution for its feasibility, effectiveness, and potential to reduce the attack surface without introducing undue complexity or new vulnerabilities into the DNS ecosystem.
Embedding-Aligned Language Models
Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS-24), Vancouver (2024)
Preview abstract
We propose a novel approach for training large language models (LLMs) to adhere to objectives imposed by a latent embedding space. Our method leverages reinforcement learning (RL), treating a pre-trained LLM as an environment. An Embedding-Aligned Guided LanguagE (EAGLE) agent is trained using a significantly smaller language model to iteratively steer the LLM's generation towards optimal regions of a latent embedding space, given some predefined criteria. We demonstrate the effectiveness of the EAGLE agent using the MovieLens 25M dataset, on extrapolation tasks for content gap to satisfy latent user demand, and multi-attribute satisfaction for generating creative variations of entities. Our work paves the way for controlled and grounded text generation using LLMs, ensuring consistency with domain-specific knowledge and data representations.
Multimodal Modeling for Spoken Language Identification
Shikhar Bharadwaj
Sriram (Sri) Ganapathy
Sid Dalmia
Wei Han
Yu Zhang
Proceedings of 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024) (2024)
Preview abstract
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance. Conventionally, it is modeled as a speech-based language identification task. Prior techniques have been constrained to a single modality; however, in the case of video data, there is a wealth of other metadata that may be beneficial for this task. In this work, we propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification. Our study reveals that metadata such as video title, description and geographic location provide substantial information to identify the spoken language of the multimedia recording. We conduct experiments using two diverse public datasets of YouTube videos, and obtain state-of-the-art results on the language identification task. We additionally conduct an ablation study that describes the distinct contribution of each modality for language recognition.