Jinsung Yoon
I am a research scientist at Google Cloud AI. I am currently working on diverse machine learning research topics such as generative models, self- and semi-supervised learning, model interpretation, data imputation, and synthetic data generation.
Previously, I worked on machine learning for medicine with Professor Mihaela van der Schaar as a graduate student researcher in the UCLA Electrical and Computer Engineering Department. I received my Ph.D. and M.S. in Electrical and Computer Engineering from UCLA, and my B.S. in Electrical and Computer Engineering from Seoul National University (SNU).
https://scholar.google.com/citations?user=kiFd6A8AAAAJ&hl=en&oi=ao
Authored Publications
ASPEST: Bridging the Gap Between Active Learning and Selective Prediction
Somesh Jha
Transactions on Machine Learning Research (TMLR) (2024)
Preview abstract
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, with the connection between them missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, that utilizes ensembles of model snapshots with self-training with their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, which suffer from domain shifts, demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST→SVHN benchmark with the labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop.
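As a rough illustration of the selective prediction setup described in this abstract, the sketch below averages class probabilities from several model snapshots and abstains when the averaged confidence falls below a threshold. The function names, the threshold value, and the toy probabilities are all assumptions made for illustration; this is not the ASPEST implementation, and it leaves out the active querying and self-training steps.

```python
def ensemble_average(prob_lists):
    """Average class-probability vectors produced by several model snapshots."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]


def selective_predict(probs, threshold):
    """Return the most likely class, or None (abstain) if confidence is too low."""
    confidence = max(probs)
    if confidence < threshold:
        return None  # defer this example to a human
    return probs.index(confidence)


def coverage_and_accuracy(predictions, labels):
    """Coverage = fraction answered; accuracy is measured on answered examples only."""
    accepted = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(accepted) / len(labels)
    accuracy = sum(p == y for p, y in accepted) / len(accepted) if accepted else 0.0
    return coverage, accuracy


# Hypothetical outputs of two snapshots on three examples (two classes).
snapshot_outputs = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.6, 0.4], [0.4, 0.6]],
    [[0.2, 0.8], [0.1, 0.9]],
]
labels = [0, 1, 0]
predictions = [selective_predict(ensemble_average(p), threshold=0.7)
               for p in snapshot_outputs]
coverage, accuracy = coverage_and_accuracy(predictions, labels)
```

Raising the threshold trades coverage for accuracy on the answered examples; metrics such as AUACC summarize that trade-off across thresholds.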
Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs
Somesh Jha
Findings of the Association for Computational Linguistics: EMNLP (2023)
Preview abstract
Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions when they are unsure of the answer. In this work, we propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of LLMs. Our framework is based on the idea of using parameter-efficient tuning to adapt the LLM to the specific task at hand while improving its ability to perform self-evaluation. We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods. For example, on the CoQA benchmark, our method improves the AUACC from 91.23% to 92.63% and improves the AUROC from 74.61% to 80.25%.
SPADE: Semi-supervised Anomaly Detection under Distribution Mismatch
Chun-Liang Li
Kihyuk Sohn
Transactions on Machine Learning Research (TMLR) (2023)
Preview abstract
Semi-supervised anomaly detection is a common problem, as often the datasets containing anomalies are partially labeled. We propose a canonical framework, Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE), that isn't limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, the assumption is often violated in many applications: for example, the labeled data may contain only anomalies unlike unlabeled data, or unlabeled data may contain different types of anomalies, or labeled data may contain only 'easy-to-label' samples. SPADE utilizes an ensemble of one-class classifiers as the pseudo-labeler to improve the robustness of pseudo-labeling with distribution mismatch. Partial matching is proposed to automatically select the critical hyper-parameters for pseudo-labeling without validation data, which is crucial with limited labeled data. SPADE shows state-of-the-art semi-supervised anomaly detection performance across a wide range of scenarios with distribution mismatch in both tabular and image domains. In some common real-world settings, such as a model facing new types of unlabeled anomalies, SPADE outperforms the state-of-the-art alternatives by 5% AUC on average.
Preview abstract
We study anomaly clustering, grouping data into coherent clusters of anomaly types. This is different from anomaly detection, which aims to divide anomalies from normal data. Unlike object-centered image clustering, anomaly clustering is particularly challenging as anomalous patterns are subtle and local. We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods. We define a distance function between images, each of which is represented as a bag of embeddings, by the Euclidean distance between weighted averaged embeddings. The weight defines the importance of instances (i.e., patch embeddings) in the bag, which may highlight defective regions. We compute weights in an unsupervised way or in a semi-supervised way when labeled normal data is available. Extensive experimental studies show the effectiveness of the proposed clustering framework, along with a novel distance function, over existing multiple instance or deep clustering frameworks. Overall, our framework achieves 0.451 and 0.674 normalized mutual information scores on MVTec object and texture categories and further improves with a few labeled normal data (0.577, 0.669), far exceeding the baselines (0.244, 0.273) or state-of-the-art deep clustering methods (0.176, 0.277).
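The distance described in this abstract (Euclidean distance between weighted averages of patch embeddings) can be sketched as follows. The helper names and the toy embeddings are hypothetical, and normalizing by the sum of the weights is an assumption; how the weights are actually computed is not covered here.

```python
import math


def weighted_average(embeddings, weights):
    """Weighted mean of a bag of embedding vectors (weights need not sum to 1)."""
    total = sum(weights)
    dim = len(embeddings[0])
    return [sum(w * e[d] for w, e in zip(weights, embeddings)) / total
            for d in range(dim)]


def bag_distance(bag_a, weights_a, bag_b, weights_b):
    """Euclidean distance between the weighted-average embeddings of two images."""
    mean_a = weighted_average(bag_a, weights_a)
    mean_b = weighted_average(bag_b, weights_b)
    return math.dist(mean_a, mean_b)


# Two toy "images", each a bag of two 2-D patch embeddings.
image_a = [[0.0, 0.0], [2.0, 0.0]]
image_b = [[0.0, 0.0], [4.0, 4.0]]
# Uniform weights for image_a; all weight on the second patch for image_b.
distance = bag_distance(image_a, [1.0, 1.0], image_b, [0.0, 1.0])
```

A pairwise distance matrix built this way can be handed to an off-the-shelf clustering method that accepts precomputed distances.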
Self-supervise, Refine, Repeat: Improving Unsupervised Anomaly Detection
Chun-Liang Li
Kihyuk Sohn
Transactions on Machine Learning Research (TMLR) (2022)
Preview abstract
Anomaly detection (AD), separating anomalies from normal data, has many applications across domains, from security to healthcare. While most previous works were shown to be effective for cases with fully or partially labeled data, that setting is in practice less common due to labeling being particularly tedious for this task. In this paper, we focus on fully unsupervised AD, in which the entire training dataset, containing both normal and anomalous samples, is unlabeled. To tackle this problem effectively, we propose to improve the robustness of one-class classification trained on self-supervised representations using a data refinement process. Our proposed data refinement approach is based on an ensemble of one-class classifiers (OCCs), each of which is trained on a disjoint subset of training data. Representations learned by self-supervised learning on the refined data are iteratively updated as the data refinement improves. We demonstrate our method on various unsupervised AD tasks with image and tabular data. With a 10% anomaly ratio on CIFAR-10 image data / 2.5% anomaly ratio on Thyroid tabular data, the proposed method outperforms the state-of-the-art one-class classifier by 6.3 AUC and 12.5 average precision / 22.9 F1-score.
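The data refinement idea in this abstract (an ensemble of one-class classifiers, each trained on a disjoint subset of the data) can be sketched on toy 1-D data. Everything concrete here is an assumption for illustration: the stand-in "classifier" is just a median and standard deviation, the consensus rule (keep a sample only if every model calls it normal) is one possible choice, and the self-supervised representation learning step is omitted entirely.

```python
import statistics


def split_disjoint(data, k):
    """Partition data into k disjoint subsets by striding."""
    return [data[i::k] for i in range(k)]


def fit_occ(subset):
    """Stand-in one-class model: the centre and spread of its training subset."""
    return statistics.median(subset), statistics.pstdev(subset) or 1.0


def is_normal(model, x, max_z):
    """A sample is 'normal' for a model if it lies within max_z spreads of the centre."""
    center, spread = model
    return abs(x - center) / spread <= max_z


def refine(data, k=2, max_z=3.0):
    """Keep only the samples that every model in the ensemble considers normal."""
    models = [fit_occ(subset) for subset in split_disjoint(data, k)]
    return [x for x in data if all(is_normal(m, x, max_z) for m in models)]


# Unlabeled toy data with one obvious outlier (10.0).
unlabeled = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 10.0, 1.05]
refined = refine(unlabeled)
```

In this toy run the model whose subset contains the outlier is too loose to flag it, while the model trained on the clean subset does, which is the motivation for requiring agreement across the ensemble.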
Algorithmic fairness in pandemic forecasting: lessons from COVID-19
Thomas Tsai
Benjamin Jacobson
Nate Yoder
Dario Sava
Meg Mitchell
Garth Graham
npj Digital Medicine (2022)
Preview abstract
Racial and ethnic minorities have borne a particularly acute burden of the COVID-19 pandemic in the United States. There is a growing awareness from both researchers and public health leaders of the critical need to ensure fairness in forecast results. Without careful and deliberate bias mitigation, inequities embedded in data can be transferred to model predictions, perpetuating disparities, and exacerbating the disproportionate harms of the COVID-19 pandemic. These biases in data and forecasts can be viewed through both statistical and sociological lenses, and the challenges of both building hierarchical models with limited data availability and drawing on data that reflects structural inequities must be confronted. We present an outline of key modeling domains in which unfairness may be introduced and draw on our experience building and testing the Google-Harvard COVID-19 Public Forecasting model to illustrate these challenges and offer strategies to address them. While targeted toward pandemic forecasting, these domains of potentially biased modeling and concurrent approaches to pursuing fairness present important considerations for equitable machine-learning innovation.
Preview abstract
Understanding black-box machine learning models is crucial for their widespread adoption. Learning globally interpretable models is one approach, but achieving high performance with them is challenging. An alternative approach is to explain individual predictions using locally interpretable models. For locally interpretable modeling, various methods have been proposed and indeed commonly used, but they suffer from low fidelity, i.e. their explanations do not approximate the predictions well. In this paper, our goal is to push the state-of-the-art in high-fidelity locally interpretable modeling. We propose a novel framework, Locally Interpretable Modeling using Instance-wise Subsampling (LIMIS). LIMIS utilizes a policy gradient to select a small number of instances and distills the black-box model into a low-capacity locally interpretable model using those selected instances. Training is guided with a reward obtained directly by measuring the fidelity of the locally interpretable models. We show on multiple tabular datasets that LIMIS near-matches the prediction accuracy of black-box models, significantly outperforming state-of-the-art locally interpretable models in terms of fidelity and prediction accuracy.
Preview abstract
In this work, we aim to construct a high-performance model for defect detection that detects unknown anomalous patterns in an image without anomalous data. To this end, we propose a simple two-stage framework for building anomaly detectors using normal training data only, where we first learn self-supervised deep representations and then build a generative one-class classifier on the learned representations. We learn representations by classifying normal data from CutPaste, a simple data augmentation strategy that cuts an image patch and pastes it at a random location of a large image. Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm generalizes to detecting various types of real-world defects. We improve upon previous work by 3 AUC points when learning representations from scratch. By transfer learning representations from an ImageNet pretrained model, we achieve a new state-of-the-art 96.6 AUC. Lastly, we extend the framework to learn and extract representations from patches to allow localization of defective areas without the need for annotation.
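The CutPaste augmentation described in this abstract (cut a patch and paste it at a random location) can be sketched on an image stored as a nested list. The function name, the fixed patch size, and the uniform choice of source and destination positions are assumptions for illustration; the published method may sample patch sizes and aspect ratios differently.

```python
import random


def cutpaste(image, patch_h, patch_w, rng):
    """Return a copy of `image` with one rectangular patch copied to another location."""
    height, width = len(image), len(image[0])
    # Top-left corners of the source and destination patches, chosen uniformly.
    src_r = rng.randrange(height - patch_h + 1)
    src_c = rng.randrange(width - patch_w + 1)
    dst_r = rng.randrange(height - patch_h + 1)
    dst_c = rng.randrange(width - patch_w + 1)
    patch = [row[src_c:src_c + patch_w] for row in image[src_r:src_r + patch_h]]
    augmented = [row[:] for row in image]  # leave the input untouched
    for i in range(patch_h):
        augmented[dst_r + i][dst_c:dst_c + patch_w] = patch[i]
    return augmented


# A 4x4 toy "image" with distinct pixel values 0..15.
normal_image = [[r * 4 + c for c in range(4)] for r in range(4)]
augmented_image = cutpaste(normal_image, patch_h=2, patch_w=2, rng=random.Random(0))

# Self-supervised training pairs: label 0 for normal, 1 for CutPaste-augmented.
training_pairs = [(normal_image, 0), (augmented_image, 1)]
```

A classifier trained to separate the two labels has to notice local irregularities, which is the kind of signal useful for spotting defects.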
A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan
Joel Shor
Arkady Epshteyn
Ashwin Sura Ravi
Beth Luan
Chun-Liang Li
Daisuke Yoneoka
Dario Sava
Hiroaki Miyata
Hiroki Kayama
Isaac Jones
Joe Mckenna
Johan Euphrosine
Kris Popendorf
Nate Yoder
Shashank Singh
Shuhei Nomura
Thomas Tsai
npj Digital Medicine (2021)
Preview abstract
The COVID-19 pandemic has highlighted the global need for reliable models of disease spread. We evaluate an AI-improved forecasting approach that provides daily predictions of the expected number of confirmed COVID-19 deaths, cases and hospitalizations during the following 28 days. We present an international, prospective evaluation of model performance across all states and counties in the USA and prefectures in Japan. National mean absolute percentage error (MAPE) for predicting COVID-19 associated deaths before and after prospective deployment remained consistently <3% (US) and <10% (Japan). Average statewide (US) and prefecture-wide (Japan) MAPE was 6% and 20% respectively (14% when looking at prefectures with more than 10 deaths). We show our model performs well even during periods of considerable change in population behavior, and that it is robust to demographic differences across different geographic locations. We further demonstrate the model provides meaningful explanatory insights, finding that the model appropriately responds to local and national policy interventions. Our model enables counterfactual simulations, which indicate that continuing non-pharmaceutical interventions (NPIs) alongside vaccinations is essential for more rapid recovery from the pandemic and that delaying the application of interventions has a detrimental effect, and which allow exploration of the consequences of different vaccination strategies. The COVID-19 pandemic remains a global emergency. In the face of substantial challenges ahead, the approach presented here has the potential to inform critical decisions.
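For readers unfamiliar with the metric quoted in this abstract, mean absolute percentage error can be computed as in the sketch below. The function name and the toy numbers are illustrative only, and rejecting zero actual values is an assumption; the abstract does not say how such cases were handled in the evaluation.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Toy example: forecasts off by 10% and 5% give a MAPE of about 7.5%.
example_mape = mape([100.0, 200.0], [110.0, 190.0])
```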
Preview abstract
We propose a novel training method that integrates rules into deep learning in such a way that the strength of the rules is controllable at inference. Deep Neural Networks with Controllable Rule Representations (DeepCTRL) incorporates a rule encoder into the model coupled with a rule-based objective, enabling a shared representation for decision making. DeepCTRL is agnostic to data type and model architecture. It can be applied to any kind of rule defined for inputs and outputs. The key aspect of DeepCTRL is that it does not require retraining to adapt the rule strength; at inference, the user can adjust it based on the desired operating point on accuracy vs. rule verification ratio. In real-world domains where incorporating rules is critical (such as physics, retail and healthcare), we show the effectiveness of DeepCTRL in teaching rules for deep learning. DeepCTRL improves the trust and reliability of the trained models by significantly increasing their rule verification ratio, while also providing accuracy gains at downstream tasks. Additionally, DeepCTRL enables novel use cases such as hypothesis testing of the rules on data samples, and unsupervised adaptation based on shared rules between datasets.
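Two quantities mentioned in this abstract, a controllable rule strength and the rule verification ratio, can be illustrated with a small sketch. The convex blend of a task loss and a rule loss under a coefficient `alpha` is an assumption about how a rule strength could enter an objective, not a statement of the DeepCTRL formulation, and the example rule is hypothetical.

```python
def blended_loss(task_loss, rule_loss, alpha):
    """Convex combination of task and rule objectives; alpha is the rule strength."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rule_loss + (1.0 - alpha) * task_loss


def rule_verification_ratio(outputs, rule):
    """Fraction of model outputs that satisfy the rule."""
    return sum(1 for out in outputs if rule(out)) / len(outputs)


# Hypothetical rule: predictions must be non-negative.
model_outputs = [0.5, -0.1, 1.2, 3.0]
ratio = rule_verification_ratio(model_outputs, lambda y: y >= 0.0)
loss = blended_loss(task_loss=2.0, rule_loss=4.0, alpha=0.25)
```

Sweeping `alpha` and recording accuracy alongside the verification ratio gives the kind of operating curve the abstract refers to.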