Da-Cheng Juan


Da-Cheng Juan is a software engineer at Google Research. Da-Cheng has worked on large-scale, semi-supervised learning with Expander, as well as personalized recommendation for computational advertising. Prior to joining Google, Da-Cheng received his Ph.D. from Carnegie Mellon University in 2014. His research interests include machine learning, convex optimization, and data mining.
Authored Publications
    This paper proposes Omnidirectional Representations from Transformers (OmniNet). In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network. This process can also be interpreted as a form of extreme or intensive attention mechanism whose receptive field spans the entire width and depth of the network. To this end, the omnidirectional attention is learned via a meta-learner, which is essentially another self-attention based model. To mitigate the computational cost of full receptive field attention, we leverage efficient self-attention models such as kernel-based attention (Choromanski et al., 2020), low-rank attention (Wang et al., 2020) and/or Big Bird (Zaheer et al., 2020) as the meta-learner. We conduct extensive experiments on autoregressive language modeling (LM1B, C4), machine translation, Long Range Arena (LRA) and image recognition, showing that OmniNet achieves considerable improvements not only when equipped with sequence-based (1D) Transformers but also on image recognition (finetuning and few-shot learning) tasks. OmniNet also achieves state-of-the-art performance on LM1B, WMT'14 En-De/En-Fr and Long Range Arena.
    Achieving state-of-the-art performance on natural language understanding tasks typically relies on fine-tuning a fresh model for every task. Consequently, this approach leads to a higher overall parameter cost, along with higher technical maintenance for serving multiple models. Learning a single multi-task model that is able to do well on all the tasks has been a challenging and yet attractive proposition. In this paper, we propose HyperGrid, a new approach for highly effective multi-task learning. The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections, which help to specialize regions in weight matrices for different tasks. To construct the proposed hyper projection, our method learns the interactions and composition between a global state and a local task-specific state. We apply HyperGrid to the current state-of-the-art T5 model, yielding strong gains across the GLUE and SuperGLUE benchmarks when trained in a single-model multi-tasking setup. Our method helps to bridge the gap between single-task fine-tuning methods and single-model multi-tasking approaches.
    We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to generate latent permutations over sequences. Given sorted sequences, we are then able to compute quasi-global attention with only local windows, improving the memory efficiency of the attention module. To this end, we propose new algorithmic innovations such as Causal Sinkhorn Balancing and SortCut, a dynamic sequence truncation method for tailoring Sinkhorn Attention for encoding and/or decoding purposes. Via extensive experiments on algorithmic seq2seq sorting, language modeling, pixel-wise image generation, document classification and natural language inference, we demonstrate that our Sinkhorn Attention remains competitive with vanilla attention and consistently outperforms recently proposed efficient Transformer models such as Sparse Transformers, while retaining memory efficiency.
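    The sorting network described above produces latent permutations, and Sinkhorn balancing is the standard way to relax a score matrix toward a permutation. The following is a minimal sketch of generic Sinkhorn normalization on a small square matrix, not the paper's implementation: the function name `sinkhorn`, the iteration count and the example scores are illustrative assumptions, and neither the causal variant nor SortCut is shown.

```python
import math


def sinkhorn(scores, n_iters=50):
    """Alternately normalise rows and columns of exp(scores).

    For a square matrix of finite scores the result approaches a doubly
    stochastic matrix, i.e. a relaxed (soft) permutation.
    """
    n = len(scores)
    m = [[math.exp(v) for v in row] for row in scores]
    for _ in range(n_iters):
        # Make every row sum to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Make every column sum to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


# Hypothetical scores between three blocks of a sequence.
soft_perm = sinkhorn([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]])
```

    Each row of `soft_perm` can be read as a soft assignment of one block to a sorted position; larger scores pull the matrix closer to a hard permutation.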
    Graph-RISE: Graph-Regularized Image Semantic Embedding
    Aleksei Timofeev
    Futang Peng
    Krishnamurthy Viswanathan
    Lucy Gao
    Sujith Ravi
    Yi-ting Chen
    Zhen Li
    The 12th International Conference on Web Search and Data Mining (2020) (to appear)
    Learning image representations that capture instance-based semantics has been a challenging and important task for enabling many applications such as image search and clustering. In this paper, we explore the limits of image embedding learning at unprecedented scale and granularity. We present Graph-RISE, an image embedding that captures very fine-grained, instance-level semantics. Graph-RISE is learned via a large-scale, neural graph learning framework that leverages graph structure to regularize the training of deep neural networks. To the best of our knowledge, this is the first work that can capture instance-level image semantics at million (O(40M)) scale. Experimental results show that Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We also provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE captures the semantics well and differentiates nuances at the instance level.
    We present Neural Structured Learning (NSL) in TensorFlow, a new learning paradigm to train neural networks by leveraging structured signals in addition to feature inputs. Structure can be explicit as represented by a graph, or implicit, either induced by adversarial perturbation or inferred using techniques like embedding learning. NSL is open-sourced as part of the TensorFlow ecosystem and is widely used in Google across many products and services. In this tutorial, we provide an overview of the NSL framework including various libraries, tools, and APIs as well as demonstrate the practical use of NSL in different applications. The NSL website is hosted at www.tensorflow.org/neural_structured_learning, which includes details about the theoretical foundations of the technology, extensive API documentation, and hands-on tutorials.
    Improving Adversarial Robustness via Guided Complement Entropy
    Hao-Yun Chen
    Jhao-Hong Liang
    Shih-Chieh Chang
    Yu-Ting Chen
    Wei Wei
    International Conference on Computer Vision (ICCV) (2019)
    Adversarial robustness has emerged as an important topic in deep learning, as carefully crafted attack samples can significantly disturb the performance of a model. Many recent methods have proposed to improve adversarial robustness by utilizing adversarial training or model distillation, which add extra procedures to model training. In this paper, we propose a new training paradigm called Guided Complement Entropy (GCE) that is capable of achieving “adversarial defense for free,” which involves no additional procedures in the process of improving adversarial robustness. In addition to maximizing model probabilities on the ground-truth class like cross entropy, we neutralize its probabilities on the incorrect classes along with a “guided” term to balance between these two terms. We show in the experiments that our method achieves better model robustness with even better performance compared to the commonly used cross entropy training objective. We also show that our method can be used orthogonally to adversarial training across well-known methods with noticeable robustness gain. To the best of our knowledge, our approach is the first one that improves model robustness without compromising performance.
    Complement Objective Training
    Hao-Yun Chen
    Pei-Hsin Wang
    Chun-Hao Liu
    Shih-Chieh Chang
    Yu-Ting Chen
    Wei Wei
    International Conference on Learning Representations (ICLR) (2019)
    Learning with a primary objective, such as softmax cross entropy for classification and sequence generation, has been the norm for training deep neural networks for years. Although widely adopted, using cross entropy as the primary objective exploits mostly the information from the ground-truth class for maximizing data likelihood, and largely ignores information from the complement (incorrect) classes. We argue that, in addition to the primary objective, training with a complement objective that leverages information from the complement classes can be effective in improving model performance. This motivates us to study a new training paradigm that maximizes the likelihood of the ground-truth class while neutralizing the probabilities of the complement classes. We conduct extensive experiments on multiple tasks ranging from computer vision to natural language processing. The experimental results confirm that, compared to conventional training with just one primary objective, training also with the complement objective further improves the performance of the state-of-the-art models across all tasks. In addition to the accuracy improvement, we also show that models trained with both primary and complement objectives are more robust to adversarial attacks.
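    To make the two objectives above concrete, here is a minimal sketch for a single example. It assumes the complement objective is the entropy of the predicted distribution renormalised over the incorrect classes, which is one plausible reading of "neutralizing" and not necessarily the authors' exact formulation. The function names and probabilities are illustrative.

```python
import math


def cross_entropy(probs, label):
    """Primary objective: negative log-probability of the ground-truth class."""
    return -math.log(probs[label])


def complement_entropy(probs, label):
    """Entropy of the distribution renormalised over the incorrect classes.

    It is largest (log of the number of incorrect classes) when the
    incorrect classes share the remaining probability mass evenly.
    """
    rest = 1.0 - probs[label]
    entropy = 0.0
    for j, p in enumerate(probs):
        if j == label or p == 0.0:
            continue  # skip the true class; 0 * log(0) is taken as 0
        q = p / rest
        entropy -= q * math.log(q)
    return entropy


flat = [0.7, 0.1, 0.1, 0.1]    # incorrect classes evenly spread
peaked = [0.7, 0.3, 0.0, 0.0]  # one incorrect class dominates
```

    Both predictions have the same cross entropy, but only `flat` has near-maximal complement entropy, which is the case an objective of this kind would favour.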
    COCO-GAN: Generation by Parts via Conditional Coordinating
    Chieh Hubert Lin
    Chia-Che Chang
    Yu-Sheng Chen
    Wei Wei
    Hwann-Tzong Chen
    International Conference on Computer Vision (ICCV) (2019)
    We present a new architecture of generative adversarial nets (GANs): COnditional COordinate GAN (COCO-GAN). Given a latent vector and spatial positions, the generator learns to produce position-aware image patches; each patch is generated independently (referred to as "spatial disentanglement"), and without any post-processing, the produced patches can further be composed into a full image that is locally smooth and globally coherent. Without additional hyper-parameter tuning, the images composed by COCO-GAN are qualitatively competitive with those generated by state-of-the-art GANs. In addition to the spatial disentanglement property, COCO-GAN learns via coordinates, and can generalize to different predefined coordinate systems. We take panorama as a case study to demonstrate that, in addition to Cartesian coordinates, COCO-GAN can also learn in a cylindrical coordinate system that is cyclic in the horizontal direction. We further investigate and demonstrate three new applications of COCO-GAN. "Patch-Inspired Image Generation" takes an image patch and generates a full image containing a local patch similar to the given one. We show that the generated image can loosely retain some local structure or global characteristic of the original image. "Partial-Scene Generation" uses the controllable spatial disentanglement to render patches within the designated region without spending resources on generating pixels outside the region. "Computational-Friendly Generation" demonstrates multiple advantages of COCO-GAN, including higher parallelism and lower memory requirement.
    On the Robustness of Self-Attentive Models
    Yu-Lun Hsieh
    Minhao Cheng
    Wei Wei
    Wen-Lian Hsu
    Cho-Jui Hsieh
    Annual Meeting of the Association for Computational Linguistics (ACL) (2019)
    This work examines the robustness of self-attentive neural networks against adversarial input perturbations. Specifically, we investigate the attention and feature extraction mechanisms of state-of-the-art recurrent neural networks and self-attentive architectures for sentiment analysis, entailment and machine translation under adversarial attacks. We also propose a novel attack algorithm for generating more natural adversarial examples that could mislead neural models but not humans. Experimental results show that, compared to recurrent neural models, self-attentive models are more robust against adversarial perturbation. In addition, we provide theoretical explanations for their superior robustness to support our claims.
    MONAS: Multi-Objective Neural Architecture Search using Reinforcement Learning
    Chi-Hung Hsu
    Shih-Chieh Chang
    Jhao-Hong Liang
    Hsin-Ping Chou
    Chun-Hao Liu
    Shu-Huan Chang
    Yu-Ting Chen
    Wei Wei
    (2018)
    Recent studies on neural architecture search have shown that automatically designed neural networks perform as well as expert-crafted architectures. While most existing works aim at finding architectures that optimize prediction accuracy, these architectures may be too complex and therefore unsuitable for deployment in certain computing environments (e.g., with limited power budgets). We propose MONAS, a framework for Multi-Objective Neural Architectural Search that employs reward functions considering both prediction accuracy and other important objectives (e.g., power consumption) when searching for neural network architectures. Experimental results show that, compared to the state of the art, models found by MONAS achieve comparable or better classification accuracy on computer vision applications, while satisfying additional objectives such as peak power.
    PPP-Net: Platform-aware Progressive Search for Pareto-optimal Neural Architectures
    Jin-Dong Dong
    An-Chieh Cheng
    Wei Wei
    Min Sun
    International Conference on Learning Representations (ICLR) Workshop (2018)
    Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performance in many applications such as image recognition. However, these techniques typically ignore platform-related constraints (e.g., inference time and power consumption) that can be critical for portable devices with limited computing resources. We propose PPP-Net: a multi-objective architectural search framework to automatically generate networks that achieve Pareto optimality. PPP-Net employs a compact search space inspired by operations used in state-of-the-art mobile CNNs. PPP-Net also adopts the progressive search strategy used in recent literature (Liu et al., 2017a). Experimental results demonstrate that PPP-Net achieves both (a) higher accuracy and (b) shorter inference time compared to the state-of-the-art CondenseNet.
    DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures
    Jin-Dong Dong
    An-Chieh Cheng
    Wei Wei
    Min Sun
    European Conference on Computer Vision (ECCV) (2018)
    Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performance in applications such as image classification and language modeling. However, these techniques typically ignore device-related objectives such as inference time, memory usage, and power consumption. Optimizing neural architectures for device-related objectives is crucial for deploying deep networks on portable devices with limited computing resources. We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related (e.g., inference time and memory usage) and device-agnostic (e.g., accuracy and model size) objectives. DPP-Net employs a compact search space inspired by current state-of-the-art mobile CNNs, and further improves search efficiency by adopting progressive search (Liu et al. 2017). Experimental results on CIFAR-10 demonstrate the effectiveness of Pareto-optimal networks found by DPP-Net for three different devices: (1) a workstation with a Titan X GPU, (2) an NVIDIA Jetson TX1 embedded system, and (3) a mobile phone with an ARM Cortex-A53. Compared to CondenseNet and NASNet (Mobile), DPP-Net achieves better performance: higher accuracy and shorter inference time on various devices. Additional experimental results show that models found by DPP-Net also achieve good performance on ImageNet.
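    Pareto optimality, which the search above targets, can be illustrated with a small filter over candidate networks. This is a sketch of the general definition only, not of the DPP-Net search procedure: the `(error, latency_ms)` tuples are made-up values, and both objectives are assumed to be minimised.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (classification error, inference latency in ms) pairs.
candidates = [(0.05, 30.0), (0.06, 20.0), (0.07, 25.0), (0.05, 35.0)]
front = pareto_front(candidates)
```

    Here `(0.07, 25.0)` is dropped because `(0.06, 20.0)` is both more accurate and faster, while the two survivors trade accuracy against latency and neither beats the other.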
    Escaping from Collapsing Modes in a Constrained Space
    Chia-Che Chang
    Chieh Hubert Lin
    Che-Rung Lee
    Wei Wei
    Hwann-Tzong Chen
    European Conference on Computer Vision (ECCV) (2018)
    Generative adversarial networks (GANs) often suffer from unpredictable mode-collapsing during training. We study the issue of mode collapse of Boundary Equilibrium Generative Adversarial Network (BEGAN), which is one of the state-of-the-art generative models. Despite its potential of generating high-quality images, we find that BEGAN tends to collapse at some modes after a period of training. We propose a new model, called BEGAN with a Constrained Space (BEGAN-CS), which includes a latent-space constraint in the loss function. We show that BEGAN-CS can significantly improve training stability and suppress mode collapse without either increasing the model complexity or degrading the image quality. Further, we visualize the distribution of latent vectors to elucidate the effect of latent-space constraint. The experimental results show that our method has additional advantages of being able to train on small datasets and to generate images similar to a given real image but with variations of designated attributes on-the-fly.
    M3A: Model, MetaModel, and Anomaly Detection in Web Searches
    Neil Shah
    Mingyu Tang
    Zhiliang Qian
    Diana Marculescu
    Christos Faloutsos
    arXiv preprint arXiv:1606.05978 (2016)
    ‘Alice’ is submitting one web search per five minutes, for three hours in a row: is it normal? How can we detect abnormal search behaviors among Alice and other users? Is there any distinct pattern in Alice’s (or other users’) search behavior? We studied what is probably the largest publicly available query log, containing more than 30 million queries from 0.6 million users. In this paper, we present a novel user- and group-level framework, M3A: Model, MetaModel and Anomaly detection. For each user, we discover and explain a surprising, bi-modal pattern of the inter-arrival time (IAT) of landed queries (queries with user click-through). Specifically, the model Camel-Log is proposed to describe such an IAT distribution; we then notice the correlations among its parameters at the group level. Thus, we further propose the metamodel Meta-Click to capture and explain the two-dimensional, heavy-tail distribution of the parameters. Combining Camel-Log and Meta-Click, the proposed M3A has the following strong points: (1) accurate modeling of the marginal IAT distribution, (2) quantitative interpretations, and (3) anomaly detection.
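    The quantity being modeled, the inter-arrival time (IAT), is the gap between consecutive queries from one user. A minimal sketch using the 'Alice' scenario above, with timestamps in seconds; the helper name is illustrative, and fitting Camel-Log or Meta-Click is not shown.

```python
def inter_arrival_times(timestamps):
    """Gaps between consecutive events, after sorting the timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# One search every five minutes (300 s) for three hours (10,800 s).
alice = list(range(0, 3 * 60 * 60 + 1, 300))
iats = inter_arrival_times(alice)
```

    A perfectly regular stream like this yields 36 identical gaps of 300 seconds, unlike the bi-modal IAT pattern reported above.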
    Beyond Poisson: Modeling inter-arrival time of requests in a datacenter
    Lei Li
    Huan-Kai Peng
    Diana Marculescu
    Christos Faloutsos
    Springer (2014), pp. 198-209
    A learning-based autoregressive model for fast transient thermal analysis of chip-multiprocessors
    Huapeng Zhou
    Diana Marculescu
    Xin Li
    IEEE (2012), pp. 597-602
    Power-aware performance increase via core/uncore reinforcement control for chip-multiprocessors
    Diana Marculescu
    ACM, pp. 97-102