Otilia Stretcu

I am a Research Scientist at Google AI in Mountain View, California, working on machine learning research. Previously, I was a PhD student in the Machine Learning Department at Carnegie Mellon University, co-advised by Tom Mitchell and Barnabás Póczos. My PhD research focused on developing machine learning algorithms, mainly for semi-supervised learning, curriculum learning, multitask learning, and graph-based problems. I am also passionate about applying machine learning methods in neuroscience to study how the brain understands language and controls speech. Earlier, I did research in computer vision, with the goal of detecting and tracking objects in videos.
Authored Publications
    Visual Program Tuning: Training Large Multimodal Models to Reason like Programs
    Yushi Hu
    Krishna Viswanathan
    Enming Luo
    Ranjay Krishna
    Ariel Fuxman
    Conference on Computer Vision and Pattern Recognition (2024)
    Solving complex visual tasks (e.g., “Who invented the musical instrument on the right?”) involves back-and-forth between visual processing and reasoning. Visual programming is a recent multimodal framework that has shown promise in conducting visual reasoning in an interpretable and compositional manner. However, this framework is error-prone: it can lead to a wrong answer whenever the program itself is wrong, or when any of the steps of the program are solved incorrectly, thus leading to worse overall performance than end-to-end systems trained with labeled data. Moreover, it is inefficient to involve multiple steps (i.e., generating and then running programs) during inference. Ideally, a single large multimodal model (LMM) should directly conduct similar reasoning and yield the correct answer. In this work, we propose Visual Program Tuning (VPT), which leverages visual programs for teaching LMMs to reason via instruction tuning. VPT rewrites the execution traces of visual programs as chain-of-thought reasoning steps, and tunes an LMM to output not only the label but its reasoning as well. Extensive experiments on complex vision tasks show that models trained with VPT achieve state-of-the-art accuracy while being able to produce interpretable and faithful reasoning steps. PaLI-X + VPT outperforms all existing LMMs on a wide range of visual tasks, improving performance on counting, spatial relations, and compositional reasoning tasks. VPT is also helpful for quick adaptation to new tasks. Our experiments on content moderation show that fine-tuning LMMs with program-augmented examples is more sample efficient than traditional supervised training.
    Scaling Up LLM Reviews for Google Ads Content Moderation
    Ariel Fuxman
    Chih-Chun Chia
    Dongjin Kwon
    Enming Luo
    Mehmet Tek
    Ranjay Krishna
    Tiantian Fang
    Tushar Dogra
    Yu-Han Lyu
    Large language models (LLMs) are powerful tools for content moderation, but LLM inference costs and latency on large volumes of data, such as the Google Ads repository, are prohibitive for their casual usage. This study is focused on scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates via filtering and duplicate removal, and create clusters of ads for which we select one representative ad per cluster. Then, LLMs are used to review only the representative ads. Finally, we propagate the LLM decisions for representative ads back to their clusters. This method reduces the number of reviews by more than 3 orders of magnitude while achieving 2x recall compared to a non-LLM model as a baseline. Note that the success of this approach is a strong function of the representations used in clustering and label propagation; we observed that cross-modal similarity representations yield better results than uni-modal representations.
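The review pipeline described in this abstract (deduplicate, cluster, review one representative per cluster, propagate the decision) can be illustrated with a small sketch. This is not the authors' system: `cluster_key` is a hypothetical stand-in that groups ads by their bag of words, whereas the abstract describes clustering on learned, cross-modal representations, and `toy_reviewer` is a hypothetical placeholder for the LLM call.

```python
from collections import defaultdict


def dedupe(ads):
    """Drop exact duplicates while preserving the original order."""
    seen, unique = set(), []
    for ad in ads:
        if ad not in seen:
            seen.add(ad)
            unique.append(ad)
    return unique


def cluster_key(ad):
    # Hypothetical stand-in for embedding-based clustering:
    # ads with the same set of lowercase words share a cluster.
    return frozenset(ad.lower().split())


def review_with_propagation(ads, reviewer):
    """Review one representative per cluster and copy its label to the cluster."""
    clusters = defaultdict(list)
    for ad in dedupe(ads):
        clusters[cluster_key(ad)].append(ad)

    decisions, calls = {}, 0
    for members in clusters.values():
        representative = members[0]      # assumption: first member represents the cluster
        label = reviewer(representative)  # the only expensive call per cluster
        calls += 1
        for ad in members:                # propagate the decision to every member
            decisions[ad] = label
    return decisions, calls


def toy_reviewer(ad):
    # Hypothetical placeholder for an LLM review.
    return "violation" if "miracle" in ad.lower() else "ok"


ADS = [
    "Miracle cure pills",
    "miracle cure pills",
    "Pills cure miracle",
    "Running shoes sale",
    "Running shoes sale",
    "sale running shoes",
]
DECISIONS, REVIEW_CALLS = review_with_propagation(ADS, toy_reviewer)
```

In this toy run, six ads collapse to two reviewer calls. The quality of the propagated labels depends entirely on how well the clustering groups ads that deserve the same decision, which is the point the abstract makes about the choice of representations.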
    Benchmarking Robustness to Adversarial Image Obfuscations
    Florian Stimberg
    Yintao Liu
    Merve Kaya
    Cyrus Rashtchian
    Ariel Fuxman
    Mehmet Tek
    Advances in Neural Information Processing Systems (2023)
    Automated content filtering and moderation is an important tool that allows online platforms to build thriving user communities that facilitate cooperation and prevent abuse. Unfortunately, resourceful actors try to bypass automated filters in a bid to post content that violates platform policies and codes of conduct. To reach this goal, these malicious actors obfuscate policy-violating content to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark. This benchmark, based on ImageNet, simulates the type of obfuscations created by malicious actors. It goes beyond ImageNet-C and ImageNet-C-Bar by proposing general, drastic, adversarial modifications that preserve the original content intent. It aims to tackle a more common adversarial threat than the one considered by Lp-norm bounded adversaries. Our hope is that this benchmark will encourage researchers to test their models and methods and try to find new approaches that are more robust to these obfuscations.
    Graph Agreement Models for Semi-supervised Learning
    Krishnamurthy Viswanathan
    Anthony Platanios
    Sujith Ravi
    Proceedings of the Thirty-third Conference on Neural Information Processing Systems, NeurIPS 2019
    Graph-based algorithms are among the most successful paradigms for solving semi-supervised learning tasks. Recent work on graph convolutional networks and neural graph learning methods has successfully combined the expressiveness of neural networks with graph structures. We propose a technique that, when applied to these methods, achieves state-of-the-art results on semi-supervised learning datasets. Traditional graph-based algorithms, such as label propagation, were designed with the underlying assumption that the label of a node can be imputed from that of the neighboring nodes. However, real-world graphs are either noisy or have edges that do not correspond to label agreement. To address this, we propose Graph Agreement Models (GAM), which introduces an auxiliary model that predicts the probability of two nodes sharing the same label as a learned function of their features. The agreement model is used when training a node classification model by encouraging agreement only for the pairs of nodes it deems likely to have the same label, thus guiding its parameters to better local optima. The classification and agreement models are trained jointly in a co-training fashion. Moreover, GAM can also be applied to any semi-supervised classification problem, by inducing a graph whenever one is not provided. We demonstrate that our method achieves a relative improvement of up to 72% for various node classification models, and obtains state-of-the-art results on multiple established datasets.
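The idea of encouraging agreement only for node pairs that are likely to share a label can be sketched as an edge penalty weighted by an agreement score. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_agreement` is a hypothetical fixed function of feature distance standing in for the learned agreement model, and squared Euclidean distance is an assumed choice for measuring disagreement between predictions.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def toy_agreement(x_i, x_j):
    # Hypothetical stand-in for the learned agreement model:
    # identical features give 1.0, distant features approach 0.
    return math.exp(-squared_distance(x_i, x_j))


def agreement_regularizer(edges, features, predictions, agreement=toy_agreement):
    """Sum over edges of agreement(i, j) * disagreement between predictions."""
    total = 0.0
    for i, j in edges:
        weight = agreement(features[i], features[j])
        total += weight * squared_distance(predictions[i], predictions[j])
    return total


# Nodes 0 and 1 have identical features; node 2 is far away in feature space.
FEATURES = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [3.0, 0.0]}
# The classifier currently disagrees with node 0 on both neighbors.
PREDICTIONS = {0: [0.9, 0.1], 1: [0.1, 0.9], 2: [0.1, 0.9]}

PENALTY_SIMILAR = agreement_regularizer([(0, 1)], FEATURES, PREDICTIONS)
PENALTY_DISSIMILAR = agreement_regularizer([(0, 2)], FEATURES, PREDICTIONS)
```

Both edges carry the same prediction disagreement, but the edge between the similar nodes is penalized at full weight while the edge to the dissimilar node contributes almost nothing. In a training loop, a term like this would be added to the supervised loss of the classification model, with the agreement scores coming from a trained model rather than a fixed formula.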