Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
The evolution of AI is a pivotal moment in history, but it’s not the first time we have experienced technological advances that have changed how humans work. By looking at the advances in automobiles, we are reminded of the importance of focusing on our developers' needs and goals.
Graphs are a powerful tool for representing and analyzing complex relationships in real-world applications such as social networks, recommender systems, and computational finance. Reasoning on graphs is essential for drawing inferences about the relationships between entities in a complex system, and to identify hidden patterns and trends. Despite the remarkable progress in automated reasoning with natural text, reasoning on graphs with large language models (LLMs) remains an understudied problem. In this work, we perform the first comprehensive study of encoding graph-structured data as text for consumption by LLMs. We show that LLM performance on graph reasoning tasks varies on three fundamental levels: (1) the graph encoding method, (2) the nature of the graph task itself, and (3) interestingly, the very structure of the graph considered. These novel results provide valuable insight on strategies for encoding graphs as text. Using these insights we illustrate how the correct choice of encoders can boost performance on graph reasoning tasks inside LLMs by 4.8% to 61.8%, depending on the task.
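As a rough illustration of what "encoding graph-structured data as text" can mean, the sketch below renders the same small graph in two different textual styles. The function names and sentence templates are hypothetical and chosen only for illustration; they are not taken from the paper's encoders.

```python
def encode_edge_list(num_nodes, edges):
    """Describe a graph with integer node ids and an explicit edge list."""
    node_text = ", ".join(str(i) for i in range(num_nodes))
    edge_text = ", ".join(f"({u}, {v})" for u, v in edges)
    return (
        f"G describes a graph among nodes {node_text}. "
        f"The edges in G are: {edge_text}."
    )


def encode_friendship(names, edges):
    """Describe the same graph as friendships between named people."""
    return " ".join(f"{names[u]} and {names[v]} are friends." for u, v in edges)


edges = [(0, 1), (1, 2)]
print(encode_edge_list(3, edges))
print(encode_friendship(["Ana", "Bo", "Cy"], edges))
```

Either string could then be placed in a prompt ahead of a question about the graph; the abstract's point is that the choice between such encodings can change LLM accuracy.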
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao
Samuel Schulter
Zhixing Zhang
Vijay Kumar B G
Yumin Suh
Manmohan Chandraker
Dimitris N. Metaxas
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD. This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distribution changes of PLs. To address these challenges, we propose SAS-Det that tames self-training for OVD from two key perspectives. First, we present a split-and-fusion (SAF) head that splits a standard detection into an open-branch and a closed-branch. This design can reduce noisy supervision from pseudo boxes. Moreover, the two branches learn complementary knowledge from different training data, significantly enhancing performance when fused together. Second, in our view, unlike in closed-set tasks, the PL distributions in OVD are solely determined by the teacher model. We introduce a periodic update strategy to decrease the number of updates to the teacher, thereby decreasing the frequency of changes in PL distributions, which stabilizes the training process. Extensive experiments demonstrate SAS-Det is both efficient and effective. SAS-Det outperforms recent models of the same scale by a clear margin and achieves 37.4 AP50 and 29.1 APr on novel categories of the COCO and LVIS benchmarks, respectively.
Fixing Insecure Cellular System Information Broadcasts For Good
Alex Ross
Bradley Reaves
Yomna Nasser
Gil Cukierman
Roger Piqueras Jover
Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses, Association for Computing Machinery (2024), 693–708
Cellular networks are essential everywhere, and securing them is increasingly important as attacks against them become more prevalent and powerful. All cellular network generations bootstrap new radio connections with unauthenticated System Information Blocks (SIBs), which provide critical parameters needed to identify and connect to the network. Many cellular network attacks require exploiting SIBs. Authenticating these messages would eliminate whole classes of attack, from spoofed emergency alerts to fake base stations. This paper presents Broadcast But Verify, an efficient backwards-compatible mechanism for SIB authentication. Broadcast But Verify specifies a new signing SIB that encodes authentication signatures and hashes for all other SIBs while building on a standard cellular PKI. We identify the security and functional requirements for such a system, define a scalable and flexible mechanism to meet those requirements, and demonstrate negligible common-case connection latency overhead of 3.220 ms in a 4G LTE testbed. We also demonstrate that unmodified mobile devices successfully connect to networks deploying Broadcast But Verify. In contrast to prior proposals, Broadcast But Verify authenticates every SIB broadcast by a cell. By demonstrating that even 4G LTE has the capacity to authenticate SIBs, we argue that future network generations can and should mandate authenticated SIBs.
Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?
Minghan Li
Jimmy Lin
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24) (2024)
Query expansion has been widely used to improve the search results of first-stage retrievers, yet its influence on second-stage, cross-encoder rankers remains under-explored. A recent study shows that current expansion techniques benefit weaker models but harm stronger rankers. In this paper, we re-examine this conclusion and raise the following question: Can query expansion improve generalization of strong cross-encoder rankers? To answer this question, we first apply popular query expansion methods to different cross-encoder rankers and verify the deteriorated zero-shot effectiveness. We identify two vital steps in the experiment: high-quality keyword generation and minimally-disruptive query modification. We show that it is possible to improve the generalization of a strong neural ranker, by generating keywords through a reasoning chain and aggregating the ranking results of each expanded query via self-consistency, reciprocal rank weighting, and fusion. Experiments on BEIR and TREC Deep Learning 2019/2020 show that the nDCG@10 scores of both MonoT5 and RankT5 following these steps are improved, which points out a direction for applying query expansion to strong cross-encoder rankers.
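One common way to aggregate the rankings produced by several expanded queries is standard reciprocal rank fusion (RRF), sketched below. This is the generic RRF formula, not necessarily the exact weighting used in the paper; the function name, the document ids, and the smoothing constant `k=60` (a conventional default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document id so that the
    output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# One ranked list per expanded query (hypothetical document ids).
per_query_rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
print(reciprocal_rank_fusion(per_query_rankings))
```

Documents that rank consistently high across the expanded queries rise to the top, which limits the damage a single poor expansion can do.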
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios such as Markov Decision Processes (MDPs). In this paper, we further explore this line of research by investigating how reward shaping under budget constraints can improve the principal's utility. We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. The principal offers an additional reward to the agent, and the agent picks their policy selfishly to maximize their reward, which is the sum of the original and the offered reward. Our results establish the NP-hardness of the problem and offer polynomial approximation algorithms for two classes of instances: Stochastic trees and deterministic decision processes with a finite horizon.
FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces
Safa C. Medin
Gengyan Li
Stephan Garbin
Philip Davidson
Gregory W. Wornell
Thabo Beeler
Abhimitra Meka
Proceedings of the ACM on Computer Graphics and Interactive Techniques, 7 (2024), pp. 1-17
3D rendering of dynamic face captures is a challenging problem, and it demands improvements on several fronts: photorealism, efficiency, compatibility, and configurability. We present a novel representation that enables high-quality volumetric rendering of an actor's dynamic facial performances with minimal compute and memory footprint. It runs natively on commodity graphics soft- and hardware, and allows for a graceful trade-off between quality and efficiency. Our method utilizes recent advances in neural rendering, particularly learning discrete radiance manifolds to sparsely sample the scene to model volumetric effects. We achieve efficient modeling by learning a single set of manifolds for the entire dynamic sequence, while implicitly modeling appearance changes as temporal canonical texture. We export a single layered mesh and view-independent RGBA texture video that is compatible with legacy graphics renderers without additional ML integration. We demonstrate our method by rendering dynamic face captures of real actors in a game engine, at comparable photorealism to state-of-the-art neural rendering techniques at previously unseen frame rates.
Neural general circulation models for weather and climate
Dmitrii Kochkov
Janni Yuval
Jamie Smith
Griffin Mooers
Milan Kloewer
James Lottes
Peter Dueben
Samuel Hatfield
Peter Battaglia
Alvaro Sanchez
Matthew Willson
Nature, 632 (2024), pp. 1060-1066
General circulation models (GCMs) are the foundation of weather and climate prediction. GCMs are physics-based simulators that combine a numerical solver for large-scale dynamics with tuned representations for small-scale processes such as cloud formation. Recently, machine-learning models trained on reanalysis data have achieved comparable or better skill than GCMs for deterministic weather forecasting. However, these models have not demonstrated improved ensemble forecasts, or shown sufficient stability for long-term weather and climate simulations. Here we present a GCM that combines a differentiable solver for atmospheric dynamics with machine-learning components and show that it can generate forecasts of deterministic weather, ensemble weather and climate on par with the best machine-learning and physics-based methods. NeuralGCM is competitive with machine-learning models for one- to ten-day forecasts, and with the European Centre for Medium-Range Weather Forecasts ensemble prediction for one- to fifteen-day forecasts. With prescribed sea surface temperature, NeuralGCM can accurately track climate metrics for multiple decades, and climate forecasts with 140-kilometre resolution show emergent phenomena such as realistic frequency and trajectories of tropical cyclones. For both weather and climate, our approach offers orders of magnitude computational savings over conventional GCMs, although our model does not extrapolate to substantially different future climates. Our results show that end-to-end deep learning is compatible with tasks performed by conventional GCMs and can enhance the large-scale physical simulations that are essential for understanding and predicting the Earth system.
The latent space of diffusion models remains largely unexplored, despite their great success and potential in the field of generative modeling. In fact, the latent spaces of existing diffusion models are entangled, with a distorted mapping from latent space to image space. To tackle this problem, we present Isometric Diffusion, equipping a diffusion model with a geometric regularizer to guide the model to learn a geometrically sound latent space. Our approach allows diffusion models to learn a more disentangled latent space, which enables smoother interpolation, more accurate inversion, and more precise control over attributes directly in the latent space. Extensive experiments illustrate advantages of the proposed method in image interpolation, image inversion, and linear editing.
Sequence labeling is a core task in text understanding for IE/IR systems. Text generation models have increasingly become the go-to solution for such tasks (e.g., entity extraction and dialog slot filling). While most research has focused on the labeling accuracy, a key aspect -- of vital practical importance -- has slipped through the cracks: understanding model confidence. More specifically, we lack a principled understanding of how to reliably gauge the confidence of a model in its predictions for each labeled span. This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling. Most notably, we find that simply using the decoder's output probabilities is not the best in realizing well-calibrated confidence estimates. As verified over six public datasets of different tasks, we show that our proposed approach -- which leverages statistics from top-k predictions by a beam search -- significantly reduces calibration errors of the predictions of a generative sequence labeling model.
Understanding metric-related pitfalls in image analysis validation
Annika Reinke
Lena Maier-Hein
Paul Jager
Shravya Shetty
Understanding Metrics Workgroup
Nature Methods (2024)
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
On the Robustness of Image-based Malware Detection against Adversarial Attacks
Yassine Mekdad
Harun Oz
Ahmet Aris
Leonardo Babun
Faraz Naseem
Selcuk Uluagac
Nasir Ghani
Abbas Acar
Network Security Empowered by Artificial Intelligence, Springer (2024)
Machine and deep learning models are now among the most valuable tools in the arsenal of computer security practitioners. Their success has been demonstrated in various network-security-oriented applications such as intrusion detection, cyber threat intelligence, vulnerability discovery, and malware detection. Nevertheless, recent research studies have shown that crafted adversarial samples can be used to evade malware detection models. Even though several defense mechanisms such as adversarial training have been proposed in the malware detection domain to address this issue, they unfortunately suffer from model poisoning and low detection accuracy. In this chapter, we assess the robustness of an image-based malware classifier against four different adversarial attacks: (a) random and benign brute-force byte append attacks for black-box settings and (b) random and benign Fast Gradient Sign Method (FGSM) attacks for white-box settings. To this end, we implement a Convolutional Neural Network (CNN) to classify the image representations of Windows Portable Executable (PE) malware with a detection accuracy of 95.05%. Then, we evaluate its robustness along with MalConv, a state-of-the-art malware classifier, by applying a set of functionality-preserving adversarial attacks. Our experimental results demonstrate that the image-based classifier exhibits a lower evasion rate of 5% compared to MalConv, which shows an evasion rate ranging between 44% and 54% in black-box settings. However, in white-box settings, both models fail against random byte and benign byte FGSM attacks, with an evasion rate of more than 46%.
We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an O(1/n) factor. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.
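A minimal sketch of the calibration step for the false negative rate case follows. It assumes prediction sets of the form {k : score_k >= 1 - lam}, so the loss is non-increasing in lam, and it picks the smallest lam on a finite grid whose inflated empirical risk, (n / (n + 1)) * mean loss + B / (n + 1), is at most alpha. The function names, the grid search, and the toy calibration data are assumptions made for illustration.

```python
def false_negative_rate(scores, true_labels, lam):
    """Fraction of true labels missing from the set {k : scores[k] >= 1 - lam}."""
    predicted = {k for k, s in enumerate(scores) if s >= 1.0 - lam}
    missed = [k for k in true_labels if k not in predicted]
    return len(missed) / len(true_labels)


def calibrate_lambda(calibration, alpha, grid, loss_bound=1.0):
    """Smallest lam in grid with (n/(n+1)) * risk + loss_bound/(n+1) <= alpha."""
    n = len(calibration)
    for lam in sorted(grid):
        risk = sum(false_negative_rate(s, y, lam) for s, y in calibration) / n
        if (n / (n + 1)) * risk + loss_bound / (n + 1) <= alpha:
            return lam
    # No grid point qualifies: fall back to the largest (most conservative) lam.
    return max(grid)


# Toy calibration set: (per-class scores, indices of the true labels).
calibration = [
    ([0.92, 0.15, 0.63], [0, 2]),
    ([0.11, 0.84, 0.33], [1]),
    ([0.72, 0.41, 0.55], [0, 1]),
    ([0.31, 0.97, 0.22], [1]),
]
grid = [i / 10 for i in range(11)]
print(calibrate_lambda(calibration, alpha=0.35, grid=grid))
```

The `loss_bound / (n + 1)` term accounts for the unseen test point, which is why a very small calibration set cannot certify a very small alpha.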
Model-Free Preference Elicitation
Carlos Martin
Tuomas Sandholm
Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI-24), Jeju, South Korea (2024), pp. 3493-3503
Elicitation of user preferences is becoming an important approach for improving the quality of recommendations, especially when there is little or no user history. In this setting, a recommender system interacts with the user by iteratively presenting elicitation questions and recording their responses. Various criteria have been proposed for optimizing the sequence of queries in order to improve user understanding and thereby the quality of downstream recommendations. A compelling approach for preference elicitation is the Expected Value of Information (EVOI), a Bayesian approach which computes the expected gain in user utility for possible queries. Previous work on EVOI has focused on probabilistic models of users for computing posterior utilities. In contrast, in this work we explore model-free variants of EVOI which rely on function approximations in order to avoid strong modeling assumptions. Specifically, we propose to learn a user response model and a user utility model from data which is often available in real-world systems, and to use these models in EVOI in place of the probabilistic models. We show that our approach leads to improved elicitation performance.
The Inside Story of Google’s Quiet Nuclear R&D Quest
IEEE Spectrum (2024)
Examines how a Google R&D programme sought to accelerate a future of safer, cheaper and more ubiquitous fusion and other nuclear energy. Discusses how the programme was started and its major components: fusion, edge-of-technology, and policy advocacy supporting innovation. Shows successful exits for each part. Beyond telling the story, the intent is to show how to move the needle, to get people to think about how they might also help, and to show that Google has made a difference. The timing of publication marks the 10th anniversary of the programme's start.