Chun-Ta Lu

Chun-Ta Lu is a Software Engineer at Google Research. Prior to joining Google, he received his PhD from the University of Illinois at Chicago. His main research interests span Computer Vision, Data Mining, and Machine Learning.
Authored Publications
Google Publications
    Visual Program Tuning: Training Large Multimodal Models to Reason like Programs
    Yushi Hu
    Krishna Viswanathan
    Kenji Hata
    Enming Luo
    Ranjay Krishna
    Ariel Fuxman
    Conference on Computer Vision and Pattern Recognition (2024)
    Solving complex visual tasks (e.g., “Who invented the musical instrument on the right?”) involves back-and-forth between visual processing and reasoning. Visual programming is a recent multimodal framework that has shown promise in conducting visual reasoning in an interpretable and compositional manner. However, this framework is error-prone: it can lead to a wrong answer whenever the program itself is wrong or any of its steps is solved incorrectly, resulting in worse overall performance than end-to-end systems trained with labeled data. Moreover, it is inefficient to involve multiple steps (i.e., generating and then running programs) during inference. Ideally, a single large multimodal model (LMM) should directly conduct similar reasoning and yield the correct answer. In this work, we propose Visual Program Tuning (VPT), which leverages visual programs to teach LMMs to reason via instruction tuning. VPT rewrites the execution traces of visual programs as chain-of-thought reasoning steps and tunes an LMM to output not only the label but also its reasoning. Extensive experiments on complex vision tasks show that models trained with VPT achieve state-of-the-art accuracy while being able to produce interpretable and faithful reasoning steps. PaLI-X + VPT outperforms all existing LMMs on a wide range of visual tasks, improving performance on counting, spatial relations, and compositional reasoning tasks. VPT is also helpful for quick adaptation to new tasks. Our experiments on content moderation show that fine-tuning LMMs with program-augmented examples is more sample efficient than traditional supervised training.
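As a rough illustration of the idea of rewriting execution traces as chain-of-thought training data, the sketch below builds an instruction-tuning example from a toy trace. The trace format and helper names are assumptions for illustration, not the paper's actual pipeline.

```python
# A minimal sketch (not the authors' pipeline) of turning a visual-program
# execution trace into a chain-of-thought instruction-tuning example.
# The trace format and tool names below are hypothetical.

def trace_to_example(question, trace, answer):
    """Rewrite executed program steps as a reasoning string paired with the label."""
    steps = []
    for i, step in enumerate(trace, start=1):
        # Each step records the tool call and the value it returned.
        steps.append(f"Step {i}: {step['call']} -> {step['result']}")
    reasoning = "\n".join(steps)
    prompt = f"Question: {question}\nAnswer with your reasoning."
    target = f"{reasoning}\nAnswer: {answer}"
    return {"prompt": prompt, "target": target}

example = trace_to_example(
    "Who invented the musical instrument on the right?",
    [
        {"call": "detect(image, 'musical instrument')", "result": "saxophone (right side)"},
        {"call": "knowledge('inventor of the saxophone')", "result": "Adolphe Sax"},
    ],
    "Adolphe Sax",
)
```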
    Scaling Up LLM Reviews for Google Ads Content Moderation
    Ariel Fuxman
    Chih-Chun Chia
    Dongjin Kwon
    Enming Luo
    Mehmet Tek
    Ranjay Krishna
    Tiantian Fang
    Tushar Dogra
    Yu-Han Lyu
    (2024)
    Large language models (LLMs) are powerful tools for content moderation, but LLM inference costs and latency on large volumes of data, such as the Google Ads repository, are prohibitive for their casual usage. This study focuses on scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates via filtering and duplicate removal, and create clusters of ads for which we select one representative ad per cluster. Then, LLMs are used to review only the representative ads. Finally, we propagate the LLM decisions for the representative ads back to their clusters. This method reduces the number of reviews by more than three orders of magnitude while achieving 2x the recall of a non-LLM baseline model. Note that the success of this approach is a strong function of the representations used in clustering and label propagation; we observed that cross-modal similarity representations yield better results than uni-modal representations.
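The funnel described above (cluster, review one representative per cluster, propagate) can be sketched as follows. This is a simplified illustration assuming precomputed ad embeddings; `llm_review` is a hypothetical stand-in for the actual LLM policy reviewer, and k-means stands in for whatever clustering the production system uses.

```python
# A minimal sketch of the cluster-review-propagate funnel, not the production system.
import numpy as np
from sklearn.cluster import KMeans

def review_at_scale(ad_ids, embeddings, llm_review, n_clusters=1000):
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    decisions = {}
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        if len(members) == 0:
            continue
        # Pick the member closest to the cluster centroid as the representative.
        centroid = embeddings[members].mean(axis=0)
        rep = members[np.argmin(np.linalg.norm(embeddings[members] - centroid, axis=1))]
        verdict = llm_review(ad_ids[rep])   # one LLM call per cluster
        for m in members:                   # propagate the decision to the whole cluster
            decisions[ad_ids[m]] = verdict
    return decisions
```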
    Benchmarking Robustness to Adversarial Image Obfuscations
    Florian Stimberg
    Yintao Liu
    Merve Kaya
    Cyrus Rashtchian
    Ariel Fuxman
    Mehmet Tek
    Advances in Neural Information Processing Systems (2023)
    Automated content filtering and moderation is an important tool that allows online platforms to build thriving user communities that facilitate cooperation and prevent abuse. Unfortunately, resourceful actors try to bypass automated filters in a bid to post content that violates platform policies and codes of conduct. To reach this goal, these malicious actors obfuscate policy-violating content to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark. This benchmark, based on ImageNet, simulates the type of obfuscations created by malicious actors. It goes beyond ImageNet-C and ImageNet-C-Bar by proposing general, drastic, adversarial modifications that preserve the original content intent. It aims to tackle a more common adversarial threat than the one considered by Lp-norm bounded adversaries. Our hope is that this benchmark will encourage researchers to test their models and methods and try to find new approaches that are more robust to these obfuscations.
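For intuition only, the snippet below applies a drastic yet content-preserving transform of the general kind discussed above. These are not the benchmark's actual obfuscations, and the input path is a placeholder.

```python
# Illustrative only: a drastic but content-preserving image transform in the
# spirit of the obfuscations described above; NOT one of the benchmark's transforms.
from PIL import Image, ImageFilter, ImageOps

img = Image.open("example.jpg").convert("RGB")   # hypothetical input path

obfuscated = ImageOps.posterize(img, bits=2)                  # crush color depth
obfuscated = obfuscated.filter(ImageFilter.GaussianBlur(3))   # heavy blur
obfuscated = Image.blend(obfuscated, ImageOps.solarize(img, threshold=96), 0.4)

obfuscated.save("example_obfuscated.jpg")
```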
    Graph-RISE: Graph-Regularized Image Semantic Embedding
    Aleksei Timofeev
    Futang Peng
    Krishnamurthy Viswanathan
    Lucy Gao
    Sujith Ravi
    Yi-ting Chen
    Zhen Li
    The 12th International Conference on Web Search and Data Mining (2020) (to appear)
    Learning image representations that capture instance-based semantics has been a challenging and important task for enabling many applications such as image search and clustering. In this paper, we explore the limits of image embedding learning at unprecedented scale and granularity. We present Graph-RISE, an image embedding that captures very fine-grained, instance-level semantics. Graph-RISE is learned via a large-scale neural graph learning framework that leverages graph structure to regularize the training of deep neural networks. To the best of our knowledge, this is the first work that captures instance-level image semantics at a scale of tens of millions of images (O(40M)). Experimental results show that Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We also provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE captures semantics well and differentiates nuances at the instance level.
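The core idea of using graph structure to regularize training can be sketched as a supervised loss plus a penalty that pulls embeddings of graph neighbors together. The toy loss below is an assumption-level illustration, not the actual Graph-RISE training objective or architecture.

```python
# A minimal sketch of graph-regularized training: cross-entropy plus a
# neighbor-smoothness penalty over graph edges. Illustrative only.
import numpy as np

def graph_regularized_loss(embeddings, logits, labels, edges, alpha=0.1):
    """Cross-entropy on labeled nodes + alpha * mean over edges ||z_i - z_j||^2."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()

    # Graph regularizer: neighbors in the similarity graph get similar embeddings.
    i, j = edges[:, 0], edges[:, 1]
    reg = np.sum((embeddings[i] - embeddings[j]) ** 2, axis=1).mean()
    return ce + alpha * reg

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 8))                 # toy embeddings
logits = rng.normal(size=(6, 3))            # toy class scores
labels = np.array([0, 1, 2, 0, 1, 2])
edges = np.array([[0, 3], [1, 4], [2, 5]])  # toy neighbor pairs
print(graph_regularized_loss(z, logits, labels, edges))
```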
    We present Neural Structured Learning (NSL) in TensorFlow, a new learning paradigm to train neural networks by leveraging structured signals in addition to feature inputs. Structure can be explicit, as represented by a graph, or implicit, either induced by adversarial perturbation or inferred using techniques like embedding learning. NSL is open-sourced as part of the TensorFlow ecosystem and is widely used across many Google products and services. In this tutorial, we provide an overview of the NSL framework, including various libraries, tools, and APIs, and demonstrate the practical use of NSL in different applications. The NSL website is hosted at www.tensorflow.org/neural_structured_learning, which includes details about the theoretical foundations of the technology, extensive API documentation, and hands-on tutorials.
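A small usage sketch of the open-source NSL library's adversarial regularization wrapper (one of the "implicit structure" modes mentioned above), following the pattern in the public tutorials; the layer sizes, input name, and hyperparameters here are illustrative choices, not prescribed values.

```python
import tensorflow as tf
import neural_structured_learning as nsl

# Plain Keras base model; the input layer is named so NSL can find it in the feature dict.
inputs = tf.keras.Input(shape=(28, 28), name='feature')
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(128, activation='relu')(x)
outputs = tf.keras.layers.Dense(10)(x)
base_model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Wrap it so each batch is also trained on adversarially perturbed inputs.
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(
    base_model, label_keys=['label'], adv_config=adv_config)

adv_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
adv_model.fit({'feature': x_train / 255.0, 'label': y_train},
              batch_size=32, epochs=1)
```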
    Inferring Context from Pixels for Multimodal Image Classification
    Manan Shah
    Krishnamurthy Viswanathan
    Ariel Fuxman
    Zhen Li
    Aleksei Timofeev
    Chen Sun
    Proceedings of the 28th ACM International Conference on Information and Knowledge Management, ACM (2019) (to appear)
    Image classification models take image pixels as input and predict labels in a predefined taxonomy. While contextual information (e.g., text surrounding an image) can provide valuable orthogonal signals to improve classification, the typical setting in the literature assumes the unavailability of text and thus focuses on models that rely purely on pixels. In this work, we also focus on the setting where only pixels are available in the input. However, we demonstrate that if we predict textual information from pixels, we can subsequently use the predicted text to train models that improve overall performance. We propose a framework that consists of two main components: (1) a phrase generator that maps image pixels to a contextual phrase, and (2) a multimodal model that uses textual features from the phrase generator and visual features from the image pixels to produce labels in the output taxonomy. The phrase generator is trained using web-based query-image pairs to incorporate contextual information associated with each image and has a large output space. We evaluate our framework on diverse benchmark datasets (specifically, the WebVision dataset for evaluating multi-class classification and the OpenImages dataset for evaluating multi-label classification), demonstrating performance improvements over approaches based exclusively on pixels and showcasing benefits in prediction interpretability. We additionally present results to demonstrate that our framework provides improvements in few-shot learning of minimally labeled concepts. We further demonstrate the unique benefits of the multimodal nature of our framework by utilizing intermediate image/text co-embeddings to perform baseline zero-shot learning on the ImageNet dataset.
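The two-component framework above can be summarized as the short sketch below. The `phrase_generator`, encoder, and classifier callables are hypothetical stand-ins for trained models, used only to show how the pieces fit together.

```python
# A minimal sketch of the two-stage flow: pixels -> predicted phrase -> fused features -> labels.
import numpy as np

def classify_from_pixels(image, phrase_generator, encode_text, encode_image, classifier):
    # 1) Predict a contextual phrase from pixels alone.
    phrase = phrase_generator(image)              # e.g. "vintage acoustic guitar"
    # 2) Fuse predicted-text features with visual features.
    fused = np.concatenate([encode_text(phrase), encode_image(image)])
    # 3) Predict labels in the output taxonomy from the fused representation.
    return classifier(fused)
```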
    Learning from Multi-View Multi-Way Data via Structural Factorization Machines
    Lifang He
    Hao Ding
    Bokai Cao
    Philip S. Yu
    Proceedings of the 2018 World Wide Web Conference, International World Wide Web Conferences Steering Committee, Lyon, France, pp. 1593-1602
    Real-world relations among entities can often be observed and determined from different perspectives/views. For example, the decision made by a user on whether to adopt an item relies on multiple aspects, such as the contextual information of the decision, the item's attributes, the user's profile, and the reviews given by other users. Different views may exhibit multi-way interactions among entities and provide complementary information. In this paper, we introduce a multi-tensor-based approach that can preserve the underlying structure of multi-view data in a generic predictive model. Specifically, we propose structural factorization machines (SFMs) that learn the common latent spaces shared by multi-view tensors and automatically adjust the importance of each view in the predictive model. Furthermore, the complexity of SFMs is linear in the number of parameters, which makes SFMs suitable for large-scale problems. Extensive experiments on real-world datasets demonstrate that the proposed SFMs outperform several state-of-the-art methods in terms of prediction accuracy and computational cost.
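As background for readers unfamiliar with factorization machines, the sketch below implements the standard second-order FM scoring function (Rendle) that SFMs generalize to multi-view tensors; it is not the paper's SFM model itself.

```python
# Background sketch: classic second-order factorization machine scoring in O(kn).
import numpy as np

def fm_score(x, w0, w, V):
    """y(x) = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j."""
    linear = w0 + w @ x
    # Pairwise term via the identity 0.5 * sum_f ((V^T x)_f^2 - ((V^2)^T x^2)_f).
    interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
    return linear + interactions

rng = np.random.default_rng(0)
n, k = 10, 4
print(fm_score(rng.normal(size=n), 0.1, rng.normal(size=n), rng.normal(size=(n, k))))
```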
    Spectral Collaborative Filtering
    Lei Zheng
    Fei Jiang
    Jiawei Zhang
    Philip S. Yu
    Proceedings of the 12th ACM Conference on Recommender Systems, ACM, Vancouver, British Columbia (2018), pp. 311-319
    Despite the popularity of Collaborative Filtering (CF), CF-based methods are haunted by the cold-start problem, which has a significantly negative impact on users' experiences with Recommender Systems (RS). In this paper, to overcome this drawback, we first formulate the relationships between users and items as a bipartite graph. Then, we propose a new spectral convolution operation that works directly in the spectral domain, where not only the proximity information of a graph but also the connectivity information hidden in the graph is revealed. With the proposed spectral convolution operation, we build a deep recommendation model called Spectral Collaborative Filtering (SpectralCF). Benefiting from the rich connectivity information in the spectral domain, SpectralCF is capable of discovering deep connections between users and items and therefore alleviates the cold-start problem for CF. To the best of our knowledge, SpectralCF is the first CF-based method that learns directly from the spectral domains of user-item bipartite graphs. We apply our method on several standard datasets and show that SpectralCF significantly outperforms state-of-the-art models. Code and data are available at https://github.com/lzheng21/SpectralCF.
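To make the spectral-domain operation concrete, the sketch below builds the normalized Laplacian of a toy user-item bipartite graph and applies a fixed polynomial filter to node features. This illustrates the kind of spectral convolution described above; the filter coefficients here are fixed rather than learned, so it is not the trained SpectralCF model.

```python
# A minimal spectral filter on a user-item bipartite graph (illustrative only).
import numpy as np

def spectral_filter(B, X, theta=(1.0, 0.5, 0.25)):
    """B: user-item adjacency (n_users x n_items); X: node features; assumes no isolated nodes."""
    n_u, n_i = B.shape
    A = np.block([[np.zeros((n_u, n_u)), B],
                  [B.T, np.zeros((n_i, n_i))]])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(n_u + n_i) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]
    lam, U = np.linalg.eigh(L)                          # graph Fourier basis
    g = sum(t * lam ** k for k, t in enumerate(theta))  # polynomial filter g(lambda)
    return U @ (g[:, None] * (U.T @ X))                 # filter features in the spectral domain

B = np.array([[1, 0, 1], [0, 1, 1]], dtype=float)       # 2 users, 3 items
X = np.eye(5)                                           # one-hot node features
print(spectral_filter(B, X).shape)
```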
    Multilinear Factorization Machines for Multi-Task Multi-View Learning
    Lifang He
    Weixiang Shao
    Bokai Cao
    Philip S. Yu
    Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, ACM, Cambridge, United Kingdom (2017), pp. 701-709
    Many real-world problems, such as web image analysis, document categorization, and product recommendation, often exhibit dual-heterogeneity: heterogeneous features obtained in multiple views, and multiple tasks that might be related to each other through one or more shared views. To address these Multi-Task Multi-View (MTMV) problems, we propose a tensor-based framework for learning the predictive multilinear structure from the full-order feature interactions within the heterogeneous data. The tensor structure is used to strengthen and capture the complex relationships between multiple tasks with multiple views. We further develop efficient multilinear factorization machines (MFMs) that can learn the task-specific feature map and the task-view shared multilinear structures without physically building the tensor. In the proposed method, a joint factorization is applied to the full-order interactions such that a consensus representation can be learned. In this manner, the method can handle partially incomplete data without difficulty, as the learning procedure does not rely on any particular view. Furthermore, the complexity of MFMs is linear in the number of parameters, which makes MFMs suitable for large-scale real-world problems. Extensive experiments on four real-world datasets demonstrate that the proposed method significantly outperforms several state-of-the-art methods in a wide variety of MTMV problems.
    Multi-way Multi-level Kernel Modeling for Neuroimaging Classification
    Lifang He
    Hao Ding
    Shen Wang
    Linlin Shen
    Philip S. Yu
    Ann B. Ragin
    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Honolulu, HI, USA (2017), pp. 6846-6854
    Owing to its prominence as a diagnostic tool for probing the neural correlates of cognition, neuroimaging tensor data has been the focus of intense investigation. Although many supervised tensor learning approaches have been proposed, they either cannot capture the nonlinear relationships of tensor data or cannot preserve the complex multi-way structural information. In this paper, we propose a Multi-way Multi-level Kernel (MMK) model that can extract discriminative, nonlinear, and structure-preserving representations of tensor data. Specifically, we introduce a kernelized CP tensor factorization technique, which is equivalent to performing the low-rank tensor factorization in a possibly much higher dimensional space that is implicitly defined by the kernel function. We further employ a multi-way nonlinear feature mapping to derive the dual structural preserving kernels, which are used in conjunction with kernel machines (e.g., SVM). Extensive experiments on real-world neuroimages demonstrate that the proposed MMK method can effectively boost the classification performance on diverse brain disorders (i.e., Alzheimer's disease, ADHD, and HIV).
    Kernelized Support Tensor Machines
    Lifang He
    Guixiang Ma
    Shen Wang
    Linlin Shen
    Philip S. Yu
    Ann B. Ragin
    Proceedings of the 34th International Conference on Machine Learning, PMLR, Sydney, Australia (2017), pp. 1442-1451
    In the context of supervised tensor learning, preserving the structural information and exploiting the discriminative nonlinear relationships of tensor data are crucial for improving the performance of learning tasks. Based on tensor factorization theory and kernel methods, we propose a novel Kernelized Support Tensor Machine (KSTM) which integrates kernelized tensor factorization with a maximum-margin criterion. Specifically, the kernelized factorization technique is introduced to approximate the tensor data in kernel space such that the complex nonlinear relationships within the tensor data can be explored. Further, dual structural preserving kernels are devised to learn the nonlinear boundary between tensor data. As a result of the joint optimization, the kernels obtained in KSTM exhibit better generalization power for discriminative analysis. Experimental results on real-world neuroimaging datasets show the superiority of KSTM over state-of-the-art techniques.
    Online Unsupervised Multi-view Feature Selection
    Weixiang Shao
    Lifang He
    Xiaokai Wei
    Philip S. Yu
    2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, Barcelona, Spain, pp. 1203-1208
    In this paper, we propose an Online unsupervised Multi-View Feature Selection method, OMVFS, which deals with large-scale/streaming multi-view data in an online fashion. OMVFS embeds unsupervised feature selection into a clustering algorithm via nonnegative matrix factorization with sparse learning. It further incorporates graph regularization to preserve the local structure information and help select discriminative features. Instead of storing all the historical data, OMVFS processes the multi-view data chunk by chunk and aggregates all the necessary information into several small matrices. By using this buffering technique, OMVFS can reduce the computational and storage cost while taking advantage of the structure information. Furthermore, OMVFS can capture concept drifts in the data streams. Extensive experiments on four real-world datasets show the effectiveness and efficiency of the proposed OMVFS method. More importantly, OMVFS is about 100 times faster than the offline methods.
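The chunk-by-chunk processing pattern can be sketched as below: only small summary matrices are kept between chunks, so storage does not grow with the stream. Here a simple variance-based score stands in for OMVFS's NMF-based feature selection, which is considerably more involved; this is an assumption-level skeleton, not the algorithm itself.

```python
# A minimal streaming skeleton in the spirit of chunk-by-chunk feature scoring.
import numpy as np

class StreamingFeatureScorer:
    def __init__(self, n_features):
        self.n = 0
        self.sum_x = np.zeros(n_features)
        self.sum_x2 = np.zeros(n_features)

    def update(self, chunk):
        """Aggregate a data chunk (n_samples x n_features) into small summaries."""
        self.n += chunk.shape[0]
        self.sum_x += chunk.sum(axis=0)
        self.sum_x2 += (chunk ** 2).sum(axis=0)

    def top_features(self, k):
        mean = self.sum_x / self.n
        var = self.sum_x2 / self.n - mean ** 2   # per-feature variance so far
        return np.argsort(var)[::-1][:k]

scorer = StreamingFeatureScorer(n_features=100)
rng = np.random.default_rng(0)
for _ in range(10):                               # data arrives chunk by chunk
    scorer.update(rng.normal(size=(256, 100)) * np.linspace(0.1, 2.0, 100))
print(scorer.top_features(5))
```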
    Joint Community and Structural Hole Spanner Detection via Harmonic Modularity
    Lifang He
    Jiaqi Ma
    Jianping Cao
    Linlin Shen
    Philip S. Yu
    Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, California, USA (2016), pp. 875-884
    Detecting communities (or modular structures) and structural hole spanners, the nodes bridging different communities in a network, are two essential tasks in the realm of network analytics. Due to the topological nature of communities and structural hole spanners, these two tasks are naturally tangled with each other, yet there has been little synergy between them. In this paper, we propose a novel harmonic modularity method to tackle both tasks simultaneously. Specifically, we apply a harmonic function to measure the smoothness of the community structure and to obtain the community indicator. We then investigate the sparsity level of the interactions between communities, with particular emphasis on the nodes connecting to multiple communities, to discriminate the indicator of structural hole (SH) spanners and assist the community guidance. Extensive experiments on real-world networks demonstrate that our proposed method outperforms several state-of-the-art methods in both the community detection task and the SH spanner identification task (even methods that require supervised community information). Furthermore, by removing the SH spanners spotted by our method, we show that the quality of other community detection methods can be further improved.
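For readers unfamiliar with the smoothness device mentioned above, the sketch below computes a harmonic function on a toy graph: boundary node values are fixed and interior values are obtained by solving a Laplacian system, so each interior value equals the average of its neighbors. It illustrates the harmonic-function ingredient only, not the full harmonic modularity method.

```python
# A minimal harmonic function on a graph (illustrative only).
import numpy as np

def harmonic_function(A, boundary):
    """A: adjacency matrix; boundary: {node: fixed value}. Returns values for all nodes."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A                  # combinatorial Laplacian
    b_idx = np.array(sorted(boundary))
    u_idx = np.array([i for i in range(n) if i not in boundary])
    f = np.zeros(n)
    f[b_idx] = [boundary[i] for i in b_idx]
    # Solve L_uu f_u = -L_ub f_b for the unlabeled (interior) nodes.
    f[u_idx] = np.linalg.solve(L[np.ix_(u_idx, u_idx)],
                               -L[np.ix_(u_idx, b_idx)] @ f[b_idx])
    return f

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(harmonic_function(A, {0: 1.0, 3: 0.0}))       # interior values interpolate smoothly
```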
    Item Recommendation for Emerging Online Businesses
    Sihong Xie
    Weixiang Shao
    Lifang He
    Philip S. Yu
    Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, AAAI Press, New York, New York, USA (2016), pp. 3797-3803
    Nowadays, a large number of new online businesses emerge rapidly. For these emerging businesses, existing recommendation models usually suffer from data sparsity. In this paper, we introduce a novel similarity measure, AmpSim (Augmented Meta Path-based Similarity), that takes both the linkage structures and the augmented link attributes into account. By traversing between heterogeneous networks through overlapping entities, AmpSim can easily gather side information from other networks and capture the rich similarity semantics between entities. We further incorporate the similarity information captured by AmpSim into a collective matrix factorization model such that the transferred knowledge can be iteratively propagated across networks to fit the emerging business. Extensive experiments conducted on real-world datasets demonstrate that our method significantly outperforms other state-of-the-art recommendation models in addressing item recommendation for emerging businesses.
    Identifying your customers in social networks
    Hong-Han Shuai
    Philip S. Yu
    Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, ACM, Shanghai, China (2014), pp. 391-400
    Personal social networks are considered one of the most influential sources in shaping a customer's attitudes and behaviors. However, the interactions with friends or colleagues in the social networks of individual customers are barely observable in most e-commerce companies. In this paper, we study the problem of customer identification in social networks, i.e., connecting customer accounts at e-commerce sites to the corresponding user accounts in online social networks such as Twitter. Identifying customers in social networks is a crucial prerequisite for many potential marketing applications, for example, personalized product recommendation based on social correlations, discovering communities of customers, and maximizing product adoption and profits over social networks. We introduce CSI (Customer-Social Identification), a methodology for identifying customers in online social networks effectively by using basic customer information, such as username and purchase history. It consists of two key phases. The first phase constructs features across networks that can be used to compare the similarity between pairs of accounts across networks with different schemas (e.g., an e-commerce company and an online social network). The second phase identifies the top-K most similar and stable matched pairs of accounts across partially aligned networks. Extensive experiments on real-world datasets show that our CSI model consistently outperforms other commonly used baselines on customer identification.
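As a toy illustration of the two-phase idea (score cross-network account pairs, then keep the top-K non-conflicting matches), the sketch below uses a simple character-trigram Jaccard similarity on usernames and a greedy selection. The real method uses much richer cross-network features and a stability criterion; everything here is illustrative.

```python
# A toy sketch of cross-network account matching (not the CSI model itself).
def trigrams(s):
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def jaccard(a, b):
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def match_accounts(customers, social_users, k):
    scored = sorted(((jaccard(c, s), c, s) for c in customers for s in social_users),
                    reverse=True)
    used_c, used_s, matches = set(), set(), []
    for score, c, s in scored:
        if c not in used_c and s not in used_s:
            matches.append((c, s, score))
            used_c.add(c)
            used_s.add(s)
        if len(matches) == k:
            break
    return matches

print(match_accounts(["alice_w", "bob1990"], ["alicewonder", "bobby_90", "carol"], 2))
```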
    Inferring the impacts of social media on crowdfunding
    Sihong Xie
    Xiangnan Kong
    Philip S. Yu
    Proceedings of the 7th ACM International Conference on Web Search and Data Mining, ACM, New York, New York, USA (2014), pp. 573-582
    Crowdfunding -- in which people can raise funds through collaborative contributions of the general public (i.e., the crowd) -- has emerged as a billion-dollar business supporting more than one million ventures. However, very little research has examined the process of crowdfunding. In particular, none has studied how social networks help crowdfunding projects to succeed. To gain insights into the effects of social networks in crowdfunding, we analyze the hidden connections between the fundraising results of projects on crowdfunding websites and the corresponding promotion campaigns in social media. Our analysis considers the dynamics of crowdfunding from two aspects: how fundraising activities and promotional activities on social media simultaneously evolve over time, and how the promotion campaigns influence the final outcomes. From our investigation, we identify a number of important principles that provide a useful guide for devising effective campaigns. For example, we observe the temporal distribution of customer interest, strong correlations between a crowdfunding project's early promotional activities and the final outcomes, and the importance of concurrent promotion from multiple sources. We then show that these discoveries can help predict several important quantities, including the overall popularity and the success rate of the project. Finally, we show how to use these discoveries to help design crowdfunding sites.