Da-Cheng Juan
Da-Cheng Juan is a software engineer at Google Research. Da-Cheng has worked on large-scale, semi-supervised learning with Expander, as well as personalized recommendation for computational advertising. Prior to joining Google, Da-Cheng received his Ph.D. from Carnegie Mellon University in 2014. His research interests include machine learning, convex optimization, and data mining.
Authored Publications
Achieving state-of-the-art performance on natural language understanding tasks typically relies on fine-tuning a fresh model for every task. Consequently, this approach leads to a higher overall parameter cost, along with higher technical maintenance for serving multiple models. Learning a single multi-task model that is able to do well for all the tasks has been a challenging and yet attractive proposition. In this paper, we propose HyperGrid, a new approach for highly effective multi-task learning. The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections, which helps to specialize regions in weight matrices for different tasks. In order to construct the proposed hyper projection, our method learns the interactions and composition between a global state and a local task-specific state. We apply our proposed HyperGrid on the current state-of-the-art T5 model, yielding optimistic and strong gains across GLUE and SuperGLUE benchmarks when trained in a single model multi-tasking setup. Our method helps to bridge the gap between the single-task finetune methods and the single model multi-tasking approaches.
OmniNet: Omnidirectional Representations from Transformers
Yi Tay
Vamsi Aribandi
ICML 2021
This paper proposes Omnidirectional Representations from Transformers (OmniNet). In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network. This process can also be interpreted as a form of extreme or intensive attention mechanism that has the receptive field of the entire width and depth of the network. To this end, the omnidirectional attention is learned via a meta-learner, which is essentially another self-attention based model. In order to mitigate the computationally expensive costs of full receptive field attention, we leverage efficient self-attention models such as kernel-based attention (Choromanski et al., 2020), low-rank attention (Wang et al., 2020) and/or Big Bird (Zaheer et al., 2020) as the meta-learner. We conduct extensive experiments on autoregressive language modeling (LM1B, C4), Machine Translation, Long Range Arena (LRA) and Image Recognition, showing that OmniNet achieves considerable improvements not only when equipped with sequence-based (1D) Transformers but also on image recognition (finetuning and few-shot learning) tasks. OmniNet also achieves state-of-the-art performance on LM1B, WMT'14 En-De/En-Fr and Long Range Arena.
Neural Structured Learning in TensorFlow: Hands-On Tutorial at KDD
Chun-Sung Ferng
George Yu
(2020), pp. 3501-3502
We present Neural Structured Learning (NSL) in TensorFlow, a new learning paradigm to train neural networks by leveraging structured signals in addition to feature inputs. Structure can be explicit as represented by a graph, or implicit, either induced by adversarial perturbation or inferred using techniques like embedding learning. NSL is open-sourced as part of the TensorFlow ecosystem and is widely used in Google across many products and services. In this tutorial, we provide an overview of the NSL framework including various libraries, tools, and APIs as well as demonstrate the practical use of NSL in different applications. The NSL website is hosted at www.tensorflow.org/neural_structured_learning, which includes details about the theoretical foundations of the technology, extensive API documentation, and hands-on tutorials.
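As a rough illustration of the explicit-graph case described above, the sketch below adds a neighbor-distance penalty to a supervised loss. It does not use the NSL API: the function name `graph_regularized_loss`, the `multiplier` weight, the squared-L2 distance, and the toy embeddings are all assumptions chosen for illustration.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def graph_regularized_loss(supervised_loss, embeddings, edges, multiplier=0.1):
    """Supervised loss plus a penalty for embeddings that differ across graph edges.

    embeddings: dict mapping node id -> list of floats (hypothetical model outputs)
    edges: list of (node_a, node_b, weight) tuples describing the explicit graph
    multiplier: assumed weight on the regularization term
    """
    penalty = sum(w * squared_l2(embeddings[a], embeddings[b]) for a, b, w in edges)
    return supervised_loss + multiplier * penalty


# Toy values, for illustration only.
embeddings = {"x": [0.0, 1.0], "y": [0.0, 0.0], "z": [3.0, 4.0]}
edges = [("x", "y", 1.0), ("y", "z", 0.5)]
total = graph_regularized_loss(2.0, embeddings, edges, multiplier=0.1)
```

In this sketch, neighbors with distant embeddings raise the total loss, so training would be nudged toward similar representations for connected examples.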
Graph-RISE: Graph-Regularized Image Semantic Embedding
Aleksei Timofeev
Futang Peng
Krishnamurthy Viswanathan
Lucy Gao
Sujith Ravi
Yi-ting Chen
Zhen Li
The 12th International Conference on Web Search and Data Mining (2020) (to appear)
Learning image representation to capture instance-based semantics has been a challenging and important task for enabling many applications such as image search and clustering. In this paper, we explore the limits of image embedding learning at unprecedented scale and granularity. We present Graph-RISE, an image embedding that captures very fine-grained, instance-level semantics. Graph-RISE is learned via a large-scale, neural graph learning framework that leverages graph structure to regularize the training of deep neural networks. To the best of our knowledge, this is the first work that can capture instance-level image semantics at million—O(40M)—scale. Experimental results show that Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We also provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE well captures the semantics and differentiates nuances at instance level.
We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose new algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a dynamic sequence truncation method for tailoring Sinkhorn Attention for encoding and/or decoding purposes. Via extensive experiments on algorithmic seq2seq sorting, language modeling, pixel-wise image generation, document classification and natural language inference, we demonstrate that our Sinkhorn Attention remains competitive with vanilla attention, consistently outperforming recently proposed efficient Transformer models such as Sparse Transformers, while retaining memory efficiency.
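The abstract does not spell out the balancing step, so the following is only a minimal sketch of plain (non-causal) Sinkhorn normalization: rows and columns of a positive matrix are rescaled in turn until the matrix is approximately doubly stochastic, which can be read as a relaxed permutation. The function name `sinkhorn`, the iteration count, and the example scores are assumptions, not the paper's implementation.

```python
import math


def sinkhorn(scores, n_iters=50):
    """Alternately normalize rows and columns of exp(scores).

    scores: square list-of-lists of floats (hypothetical sorting-network logits)
    Returns a matrix whose rows and columns each sum to roughly 1.
    """
    n = len(scores)
    m = [[math.exp(v) for v in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalization: each column sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


# Toy scores, for illustration only.
soft_perm = sinkhorn([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.4, 0.3, 3.0]])
```

Multiplying a sequence of block representations by such a matrix gives a soft reordering that stays differentiable with respect to the scores.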
Improving Adversarial Robustness via Guided Complement Entropy
Hao-Yun Chen
Jhao-Hong Liang
Shih-Chieh Chang
Yu-Ting Chen
Wei Wei
International Conference on Computer Vision (ICCV) (2019)
Adversarial robustness has emerged as an important topic in deep learning as carefully crafted attack samples can significantly disturb the performance of a model. Many recent methods have proposed to improve adversarial robustness by utilizing adversarial training or model distillation, which adds additional procedures to model training. In this paper, we propose a new training paradigm called Guided Complement Entropy (GCE) that is capable of achieving “adversarial defense for free,” which involves no additional procedures in the process of improving adversarial robustness. In addition to maximizing model probabilities on the ground-truth class like cross entropy, we neutralize its probabilities on the incorrect classes along with a “guided” term to balance between these two terms. We show in the experiments that our method achieves better model robustness with even better performance compared to the commonly used cross entropy training objective. We also show that our method can be used orthogonal to adversarial training across well known methods with noticeable robustness gain. To the best of our knowledge, our approach is the first one that improves model robustness without compromising performance.
Complement Objective Training
Hao-Yun Chen
Pei-Hsin Wang
Chun-Hao Liu
Shih-Chieh Chang
Yu-Ting Chen
Wei Wei
International Conference on Learning Representations (ICLR) (2019)
Learning with a primary objective, such as softmax cross entropy for classification and sequence generation, has been the norm for training deep neural networks for years. Although being a widely-adopted approach, using cross entropy as the primary objective exploits mostly the information from the ground-truth class for maximizing data likelihood, and largely ignores information from the complement (incorrect) classes. We argue that, in addition to the primary objective, training also using a complement objective that leverages information from the complement classes can be effective in improving model performance. This motivates us to study a new training paradigm that maximizes the likelihood of the ground-truth class while neutralizing the probabilities of the complement classes. We conduct extensive experiments on multiple tasks ranging from computer vision to natural language processing. The experimental results confirm that, compared to the conventional training with just one primary objective, training also with the complement objective further improves the performance of the state-of-the-art models across all tasks. In addition to the accuracy improvement, we also show that models trained with both primary and complement objectives are more robust to adversarial attacks.
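One way to read "neutralizing" the complement classes is flattening the predicted distribution over the incorrect classes. The sketch below measures that flatness as the entropy of the probabilities renormalized over the incorrect classes; the abstract does not give a formula, so this formulation, the function name `complement_entropy`, and the example probabilities are assumptions.

```python
import math


def complement_entropy(probs, true_class):
    """Entropy of predicted probabilities renormalized over the incorrect classes.

    probs: list of class probabilities summing to 1 (hypothetical softmax output)
    true_class: index of the ground-truth class
    """
    mass = 1.0 - probs[true_class]  # total probability on incorrect classes
    entropy = 0.0
    for k, p in enumerate(probs):
        if k == true_class or p <= 0.0:
            continue
        q = p / mass  # probability of class k among the incorrect classes
        entropy -= q * math.log(q)
    return entropy


# Same confidence in the true class, different spread over the wrong ones.
flat = complement_entropy([0.7, 0.1, 0.1, 0.1], true_class=0)
peaked = complement_entropy([0.7, 0.28, 0.01, 0.01], true_class=0)
```

Under this reading, a complement objective would push predictions toward the `flat` case, where no single incorrect class stands out as a strong runner-up.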
On the Robustness of Self-Attentive Models
Yu-Lun Hsieh
Minhao Cheng
Wei Wei
Wen-Lian Hsu
Cho-Jui Hsieh
Annual Meeting of the Association for Computational Linguistics (ACL) (2019)
This work examines the robustness of self-attentive neural networks against adversarial input perturbations. Specifically, we investigate the attention and feature extraction mechanisms of state-of-the-art recurrent neural networks and self-attentive architectures for sentiment analysis, entailment and machine translation under adversarial attacks. We also propose a novel attack algorithm for generating more natural adversarial examples that could mislead neural models but not humans. Experimental results show that, compared to recurrent neural models, self-attentive models are more robust against adversarial perturbation. In addition, we provide theoretical explanations for their superior robustness to support our claims.
COCO-GAN: Generation by Parts via Conditional Coordinating
Chieh Hubert Lin
Chia-Che Chang
Yu-Sheng Chen
Wei Wei
Hwann-Tzong Chen
International Conference on Computer Vision (ICCV) (2019)
We present a new architecture of generative adversarial nets (GANs): COnditional COordinate GAN (COCO-GAN). Given a latent vector and spatial positions, the generator learns to produce position-aware image patches; each patch is generated independently (referred to as "spatial disentanglement"), and without any post-processing, the produced patches can further be composed into a full image that is locally smooth and globally coherent. Without additional hyper-parameter tuning, the images composed by COCO-GAN are qualitatively competitive with those generated by state-of-the-art GANs. In addition to the spatial disentanglement property, COCO-GAN learns via coordinates, and can generalize to different predefined coordinate systems. We take panorama as a case study to demonstrate that, in addition to Cartesian coordinates, COCO-GAN can also learn in a cylindrical coordinate system that is cyclic in the horizontal direction. We further investigate and demonstrate three new applications of COCO-GAN. "Patch-Inspired Image Generation" takes an image patch and generates a full image containing a local patch similar to the given one. We show that the generated image can loosely retain some local structure or global characteristic of the original image. "Partial-Scene Generation" uses the controllable spatial disentanglement to render patches within the designated region without spending resources on generating pixels outside the region. "Computational-Friendly Generation" demonstrates multiple advantages of COCO-GAN, including higher parallelism and lower memory requirement.
DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures
Jin-Dong Dong
An-Chieh Cheng
Wei Wei
Min Sun
European Conference on Computer Vision (ECCV) (2018)
Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performances in applications such as image classification and language modeling. However, these techniques typically ignore device-related objectives such as inference time, memory usage, and power consumption. Optimizing neural architecture for device-related objectives is crucial for deploying deep networks on portable devices with limited computing resources. We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related (e.g., inference time and memory usage) and device-agnostic (e.g., accuracy and model size) objectives. DPP-Net employs a compact search space inspired by current state-of-the-art mobile CNNs, and further improves search efficiency by adopting progressive search (Liu et al. 2017). Experimental results on CIFAR-10 demonstrate the effectiveness of Pareto-optimal networks found by DPP-Net, for three different devices: (1) a workstation with Titan X GPU, (2) NVIDIA Jetson TX1 embedded system, and (3) mobile phone with ARM Cortex-A53. Compared to CondenseNet and NASNet (Mobile), DPP-Net achieves better performances: higher accuracy and shorter inference time on various devices. Additional experimental results show that models found by DPP-Net also achieve considerably good performance on ImageNet.
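To make the notion of Pareto-optimality over device-related and device-agnostic objectives concrete, here is a minimal sketch that filters candidate architectures on two objectives, error and latency. It is not the DPP-Net search procedure; the function name `pareto_front`, the choice of objectives, and the candidate numbers are hypothetical.

```python
def pareto_front(candidates):
    """Return the names of candidates not dominated by any other candidate.

    candidates: list of (name, error, latency_ms) tuples; lower is better for both.
    A candidate is dominated if another one is no worse on both objectives
    and strictly better on at least one.
    """
    front = []
    for name, err, lat in candidates:
        dominated = any(
            (e <= err and l <= lat) and (e < err or l < lat)
            for _, e, l in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical architectures with made-up numbers, for illustration only.
archs = [("A", 0.05, 30.0), ("B", 0.06, 12.0), ("C", 0.07, 20.0), ("D", 0.05, 35.0)]
front = pareto_front(archs)  # ["A", "B"]
```

Here "C" is dropped because "B" is both more accurate and faster, and "D" is dropped because "A" matches its error with lower latency; "A" and "B" remain as different accuracy/latency trade-offs.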