Wei Qiao
A software engineer.
Authored Publications
Scaling Up LLM Reviews for Google Ads Content Moderation
Ariel Fuxman
Chih-Chun Chia
Dongjin Kwon
Enming Luo
Mehmet Tek
Ranjay Krishna
Tiantian Fang
Tushar Dogra
Yu-Han Lyu
(2024)
Large language models (LLMs) are powerful tools for content moderation, but their inference costs and latency on large volumes of data, such as the Google Ads repository, are prohibitive for casual usage. This study focuses on scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates via filtering and duplicate removal, and create clusters of ads from which we select one representative ad per cluster. Then, LLMs review only the representative ads. Finally, we propagate the LLM decisions for the representative ads back to their clusters. This method reduces the number of reviews by more than three orders of magnitude while achieving 2x the recall of a non-LLM baseline model. Note that the success of this approach depends strongly on the representations used in clustering and label propagation; we observed that cross-modal similarity representations yield better results than uni-modal representations.
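The pipeline described in this abstract (duplicate removal, clustering, reviewing one representative per cluster, and propagating its decision) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the ad dictionaries with `id`, `text`, and `embedding` keys, the greedy cosine-similarity clustering, the `threshold` value, and the `review_fn` callback that stands in for an LLM review.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_ads(ads, threshold=0.9):
    """Greedy clustering (illustrative choice, not the paper's method).

    Each ad joins the first cluster whose representative is at least
    `threshold` similar; otherwise it starts a new cluster and becomes
    that cluster's representative.
    """
    clusters = []  # list of (representative, members)
    for ad in ads:
        for rep, members in clusters:
            if cosine(ad["embedding"], rep["embedding"]) >= threshold:
                members.append(ad)
                break
        else:
            clusters.append((ad, [ad]))
    return clusters


def review_with_propagation(ads, review_fn, threshold=0.9):
    """Return ({ad id: decision}, number of reviews performed)."""
    # Duplicate removal: keep one ad per exact text (a stand-in heuristic).
    unique_by_text = {}
    for ad in ads:
        unique_by_text.setdefault(ad["text"], ad)

    clusters = cluster_ads(list(unique_by_text.values()), threshold)

    # Review only the representatives, then propagate within each cluster.
    label_by_text = {}
    for rep, members in clusters:
        label = review_fn(rep)  # hypothetical stand-in for an LLM call
        for member in members:
            label_by_text[member["text"]] = label

    # Exact duplicates inherit the decision of the copy that was kept.
    decisions = {ad["id"]: label_by_text[ad["text"]] for ad in ads}
    return decisions, len(clusters)
```

In this sketch the number of reviews equals the number of clusters, so the saving grows with how tightly the ads cluster. The abstract's point about representations corresponds to the `embedding` field: how well propagated decisions match per-ad reviews depends on how well similarity in that space tracks policy-relevant similarity.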
Benchmarking Robustness to Adversarial Image Obfuscations
Florian Stimberg
Hussein Hazimeh
Yintao Liu
Merve Kaya
Ariel Fuxman
Mehmet Tek
Advances in Neural Information Processing Systems (2023)
Automated content filtering and moderation is an important tool that allows online platforms to build thriving user communities that facilitate cooperation and prevent abuse. Unfortunately, resourceful actors try to bypass automated filters in a bid to post content that violates platform policies and codes of conduct. To reach this goal, these malicious actors obfuscate policy-violating content to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark. This benchmark, based on ImageNet, simulates the types of obfuscations created by malicious actors. It goes beyond ImageNet-C and ImageNet-C-Bar by proposing general, drastic, adversarial modifications that preserve the original content intent. It aims to tackle a more common adversarial threat than the one considered by Lp-norm bounded adversaries. Our hope is that this benchmark will encourage researchers to test their models and methods and to find new approaches that are more robust to these obfuscations.