Yiwen Luo
Software Engineer specializing in frontend development. Currently focusing on innovative and efficient ways of collecting ground truth for media data.
Previously worked on Inbox by Gmail and the 2018 Gmail redesign.
Authored Publications
Rich Human Feedback for Text to Image Generation
Katherine Collins
Nicholas Carolan
Youwei Liang
Peizhao Li
Dj Dvijotham
Gang Li
Sarah Young
Jiao Sun
Arseniy Klimovskiy
Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen have made significant progress in generating high-resolution images based on text descriptions. However, many generated images still suffer from issues such as artifacts/implausibility, misalignment with text descriptions, and low aesthetic quality.
Inspired by the success of Reinforcement Learning with Human Feedback (RLHF) for large language models, prior work collected human-provided scores as feedback on generated images and trained a reward model to improve the T2I generation.
In this paper, we enrich the feedback signal by (i) marking image regions that are implausible or misaligned with the text, and (ii) annotating which keywords in the text prompt are not represented in the image.
We collect such rich human feedback on 18K generated images and train a multimodal transformer to predict this rich feedback automatically.
We show that the predicted rich human feedback can be leveraged to improve image generation, for example, by selecting high-quality training data to finetune and improve the generative models, or by creating masks with predicted heatmaps to inpaint the problematic regions.
Notably, the improvements generalize to models (Muse) beyond those used to generate the images on which human feedback data were collected (Stable Diffusion variants).
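The mask-creation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the predicted heatmap is a 2D grid of scores in [0, 1] and that a fixed threshold marks problematic pixels; the function names and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from typing import List


def heatmap_to_mask(heatmap: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn a per-pixel score grid into a binary mask.

    A pixel is marked 1 (to be inpainted) when its score is at or above
    the threshold. The threshold value is an assumption for illustration.
    """
    return [[1 if score >= threshold else 0 for score in row] for row in heatmap]


def masked_fraction(mask: List[List[int]]) -> float:
    """Return the share of pixels selected for inpainting."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


if __name__ == "__main__":
    # Toy 2x2 heatmap standing in for a model prediction.
    toy_heatmap = [[0.1, 0.7], [0.5, 0.2]]
    toy_mask = heatmap_to_mask(toy_heatmap)
    print(toy_mask, masked_fraction(toy_mask))
```

The resulting mask could then be passed to any inpainting model that accepts a binary mask; which model to use is left open here.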
Efficient video annotation with visual interpolation and frame selection guidance
Aakrati Talati
Keith Simmons
Vittorio Ferrari
WACV 2021, pp. 14
We introduce a unified framework for generic video annotation with bounding boxes. Video annotation is a long-standing problem, as it is a tedious and time-consuming process. We tackle two important challenges of video annotation: (1) automatic temporal interpolation and extrapolation of bounding boxes provided by a human annotator on a subset of all frames, and (2) automatic selection of frames to annotate manually. Our contribution is two-fold: first, we propose a model that has both interpolating and extrapolating capabilities; second, we propose a guiding mechanism that sequentially generates suggestions for what frame to annotate next, based on the annotations made previously.
We extensively evaluate our approach on several challenging datasets in simulation and demonstrate a reduction in terms of the number of manual bounding boxes drawn by 60% over linear interpolation and by 35% over an off-the-shelf tracker. Moreover, we also show 10% annotation time improvement over a state-of-the-art method for video annotation with bounding boxes. Finally, we run human annotation experiments and provide extensive analysis of the results, showing that our approach reduces actual measured annotation time by 50% compared to commonly used linear interpolation.
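For context, the linear interpolation baseline that the abstract compares against can be sketched as follows. This is only the baseline, not the proposed model, and it assumes boxes are given as (x_min, y_min, x_max, y_max) tuples keyed by frame index; the function name and data layout are hypothetical.

```python
from typing import Dict, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def interpolate_boxes(keyframes: Dict[int, Box]) -> Dict[int, Box]:
    """Linearly interpolate boxes for every frame between annotated keyframes.

    Frames before the first or after the last keyframe are not filled in,
    since plain linear interpolation does not extrapolate.
    """
    if not keyframes:
        return {}
    frames = sorted(keyframes)
    result: Dict[int, Box] = {frames[0]: keyframes[frames[0]]}
    for start, end in zip(frames, frames[1:]):
        box_a, box_b = keyframes[start], keyframes[end]
        for frame in range(start + 1, end + 1):
            # Fraction of the way from the start keyframe to the end keyframe.
            t = (frame - start) / (end - start)
            result[frame] = tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
    return result


if __name__ == "__main__":
    # Two manually drawn boxes, four frames apart.
    manual = {0: (0, 0, 10, 10), 4: (8, 4, 18, 14)}
    for frame, box in interpolate_boxes(manual).items():
        print(frame, box)
```

With this baseline, every frame where the object deviates from a straight-line path needs another manual box, which is the cost the reported reductions are measured against.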