
Ekin Dogus Cubuk

Authored Publications
    Human-like perceptual similarity is an emergent property in the intermediate feature space of ImageNet-pretrained classifiers. Perceptual distances between images, as measured in the space of pretrained image embeddings, significantly outperform prior low-level metrics at assessing image similarity. This has led to the wide adoption of perceptual distances both as an evaluation metric and as an auxiliary training objective for image synthesis tasks. While image classification has improved by leaps and bounds, the de facto standard for computing perceptual distances still uses older, less accurate models such as VGG and AlexNet. Motivated by this, we evaluate the perceptual scores of modern networks: ResNets, EfficientNets, and Vision Transformers. Surprisingly, we observe an inverse correlation between ImageNet accuracy and perceptual scores: better classifiers achieve worse perceptual scores. We examine this further by studying the relationship between ImageNet accuracy and perceptual score under different hyperparameter configurations. Improving accuracy improves perceptual scores up to a certain point, but beyond this point we uncover a Pareto frontier between accuracy and perceptual score. We explore this relationship further using distortion invariance, spatial frequency sensitivity, and alternative perceptual functions. Based on our study, we find an ImageNet-trained ResNet-6 network whose emergent perceptual score matches the best prior score obtained with networks trained explicitly on a perceptual similarity task.
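The perceptual distances discussed above compare two images in a network's intermediate feature space. The following is a minimal sketch in the style of the widely used, unweighted LPIPS recipe, not the paper's code. It assumes features have already been extracted, one channel vector per spatial position for each layer. Each channel vector is unit-normalized, squared Euclidean differences are averaged over positions, and the per-layer results are summed. The helper name `perceptual_distance` and this data layout are hypothetical, and no learned per-channel weights are used.

```python
import math
from typing import List

Vector = List[float]
LayerFeatures = List[Vector]  # one channel vector per spatial position


def _unit_normalize(v: Vector, eps: float = 1e-10) -> Vector:
    """Scale a channel vector to (almost) unit L2 norm; eps avoids 0/0."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def perceptual_distance(feats_a: List[LayerFeatures],
                        feats_b: List[LayerFeatures]) -> float:
    """Unweighted LPIPS-style distance between two images' feature stacks.

    feats_a[l][p] is image A's channel vector at layer l, spatial position p
    (hypothetical layout; a real pipeline would take these from a CNN).
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both images must have the same number of layers")
    total = 0.0
    for layer_a, layer_b in zip(feats_a, feats_b):
        if len(layer_a) != len(layer_b) or not layer_a:
            raise ValueError("layers must be non-empty and equally sized")
        layer_sum = 0.0
        for va, vb in zip(layer_a, layer_b):
            na, nb = _unit_normalize(va), _unit_normalize(vb)
            # Squared Euclidean distance between normalized channel vectors.
            layer_sum += sum((x - y) ** 2 for x, y in zip(na, nb))
        total += layer_sum / len(layer_a)  # average over spatial positions
    return total
```

Identical feature stacks give a distance of 0. Because each channel vector is normalized, the measure depends on the direction of the activations rather than their scale.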
    Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics
    Li Li
    Ryan Pederson
    Patrick Francis Riley
    Kieron Burke
    Phys. Rev. Lett., vol. 126 (2021), pp. 036401
    Including prior knowledge is important for effective machine learning models in physics and is usually achieved by explicitly adding loss terms or constraints on model architectures. Prior knowledge embedded in the physics computation itself rarely draws attention. We show that solving the Kohn-Sham equations when training neural networks for the exchange-correlation functional provides an implicit regularization that greatly improves generalization. Two separations suffice for learning the entire one-dimensional H$_2$ dissociation curve within chemical accuracy, including the strongly correlated region. Our models also generalize to unseen types of molecules and overcome self-interaction error.
    Revisiting ResNets: Improved Training Methodologies and Scaling Principles
    Irwan Bello
    Liam B. Fedus
    Xianzhi Du
    Aravind Srinivas
    Tsung-Yi Lin
    Jon Shlens
    Barret Richard Zoph
    ICML 2021 (2021) (to appear)
    Novel ImageNet architectures monopolize the limelight when advancing the state of the art, but progress is often muddled by simultaneous changes to training methodology and scaling strategies. Our work disentangles these factors by revisiting the ResNet architecture with modern training and scaling techniques and, in doing so, we show that ResNets match recent state-of-the-art models. Improved training methodology alone raises the top-1 ImageNet accuracy of a ResNet from 79.0% to 82.2%; two small, popular architecture changes further improve this to 83.4%. We next offer new perspectives on the scaling strategy, which we summarize in two key principles: (1) increase model depth and image size, but not model width; (2) increase image size far more slowly than previously recommended. Using the improved training methodology and our scaling principles, we design a family of ResNet architectures, ResNet-RS, which are 1.9x to 2.3x faster than EfficientNets in supervised learning on ImageNet. Although EfficientNets have significantly fewer FLOPs and parameters, training ResNet-RS is both faster and less memory-intensive, making it a strong baseline for researchers and practitioners.
    Though data augmentation has become a standard component of deep neural network training, the underlying mechanism behind the effectiveness of these techniques remains poorly understood. In practice, augmentation policies are often chosen using heuristics based on distribution shift or augmentation diversity. Inspired by these heuristics, we conduct an empirical study to quantify how data augmentation improves model generalization. We introduce two interpretable and easy-to-compute measures: Affinity and Diversity. We find that augmentation performance is predicted not by either measure alone but by jointly optimizing the two.
    Improving 3D Object Detection through Progressive Population Based Augmentation
    Shuyang Cheng
    Zhaoqi Leng
    Barret Richard Zoph
    Chunyan Bai
    Jiquan Ngiam
    Vijay Vasudevan
    Jon Shlens
    Drago Anguelov
    ECCV 2020
    Data augmentation has been widely adopted for object detection in 3-D point clouds. However, all efforts have focused on manually designing specific data augmentation methods for individual architectures; no work has attempted to automate the design of data augmentation for 3-D detection problems, as is common in 2-D camera-based computer vision. In this work, we present a first attempt to automate the design of data augmentation policies for 3-D object detection. We describe an algorithm termed Progressive Population Based Augmentation (PPBA). PPBA learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations. On the KITTI test set, PPBA improves StarNet by substantial margins on the moderate difficulty category of cars, pedestrians, and cyclists, outperforming all current state-of-the-art single-stage detection models. Additional experiments on the Waymo Open Dataset, which is 20x larger than KITTI, indicate that PPBA continues to effectively improve 3-D object detection. The magnitude of the improvements may be comparable to advances in 3-D perception architectures, yet data augmentation incurs no cost at inference time. In subsequent experiments, we find that PPBA may be up to 10x more data efficient than baseline 3-D detection models trained without augmentation, highlighting that 3-D detection models may achieve competitive accuracy with far fewer labeled examples.
    We improve the recently proposed MixMatch semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment that learns the augmentation policy while the model is being trained. Our new algorithm, dubbed ReMixMatch, is significantly more data-efficient than prior work, requiring between 5x and 16x less data to reach the same accuracy. For example, on CIFAR-10 with 250 labeled examples we reach 93.73% accuracy (compared to MixMatch’s accuracy of 93.58% with 4,000 examples) and a median accuracy of 84.92% with just four labels per class. We make our code and data open-source at https://github.com/google-research/remixmatch.
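Distribution alignment can be sketched in a few lines. As we understand the ReMixMatch method, the model's prediction q on an unlabeled example is multiplied by the ratio p(y)/p̃(y) and renormalized. Here p(y) is the class marginal of the labeled data and p̃(y) is a running average of the model's predictions on unlabeled data. In this sketch an exponential moving average with a `momentum` parameter stands in for that running average, which is our assumption. The names `RunningMarginal`, `align`, and `normalize` are hypothetical.

```python
from typing import List


def normalize(p: List[float]) -> List[float]:
    """Rescale non-negative weights so that they sum to one."""
    total = sum(p)
    return [x / total for x in p]


class RunningMarginal:
    """Hypothetical EMA tracker for the model's average prediction p~(y)."""

    def __init__(self, num_classes: int, momentum: float = 0.999):
        self.momentum = momentum
        self.value = [1.0 / num_classes] * num_classes  # start from uniform

    def update(self, batch_preds: List[List[float]]) -> None:
        """Blend in the mean predicted distribution of one unlabeled batch."""
        n = len(batch_preds)
        batch_mean = [sum(col) / n for col in zip(*batch_preds)]
        self.value = [self.momentum * v + (1.0 - self.momentum) * m
                      for v, m in zip(self.value, batch_mean)]


def align(q: List[float], label_marginal: List[float],
          model_marginal: List[float]) -> List[float]:
    """Distribution alignment: normalize(q * p(y) / p~(y))."""
    return normalize([qi * py / pm for qi, py, pm
                      in zip(q, label_marginal, model_marginal)])
```

For example, suppose the labels are balanced (p(y) = [0.5, 0.5]) but the model has been over-predicting the first class (p̃(y) = [0.8, 0.2]). Then `align([0.8, 0.2], [0.5, 0.5], [0.8, 0.2])` pulls the prediction back to [0.5, 0.5].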
    Materials design enables technologies critical to humanity, including combating climate change with solar cells and batteries. Many properties of a material are determined by its atomic crystal structure. However, predicting the atomic crystal structure for a given chemical formula is a long-standing grand challenge that remains a barrier in materials design. We investigate a data-driven approach to accelerating ab initio random structure search (AIRSS), a state-of-the-art method for crystal structure search. We build a novel dataset of random structure relaxations of Li-Si battery anode materials using high-throughput density functional theory calculations. We train graph neural networks to simulate relaxations of random structures. Our model is able to find an experimentally verified structure of Li$_{15}$Si$_4$ that it was not trained on, and has the potential for orders-of-magnitude speedups over AIRSS when searching large unit cells and multiple chemical stoichiometries. Surprisingly, we find that data augmentation by adding Gaussian noise improves both the accuracy and the out-of-domain generalization of our models.
    Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model’s performance. This domain has seen fast progress recently, at the cost of requiring more complex methods. In this paper we propose FixMatch, an algorithm that is a significant simplification of existing SSL methods. FixMatch first generates pseudo-labels using the model’s predictions on weakly augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 labels (just 4 labels per class). We carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch’s success.
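The pseudo-labeling step described above reduces to a confidence-masked cross-entropy. This sketch assumes class probabilities have already been computed for a weakly and a strongly augmented view of each unlabeled image. The pseudo-label is the argmax of the weak-view prediction and is kept only if its probability reaches a threshold; the strong-view prediction is then scored against it. The function name is hypothetical, and the 0.95 default is the value we believe the FixMatch paper uses. In real training the weak-view prediction would be a constant target (no gradient), and this term would be added to the supervised loss.

```python
import math
from typing import List


def fixmatch_unlabeled_loss(weak_probs: List[List[float]],
                            strong_probs: List[List[float]],
                            threshold: float = 0.95) -> float:
    """Confidence-masked pseudo-label cross-entropy over an unlabeled batch.

    weak_probs[i] / strong_probs[i]: predicted class probabilities for the
    weakly / strongly augmented views of unlabeled image i.
    """
    if len(weak_probs) != len(strong_probs) or not weak_probs:
        raise ValueError("need matching, non-empty batches")
    total = 0.0
    for pw, ps in zip(weak_probs, strong_probs):
        pseudo_label = max(range(len(pw)), key=lambda k: pw[k])  # hard label
        if pw[pseudo_label] < threshold:
            continue  # low-confidence prediction: pseudo-label discarded
        total += -math.log(ps[pseudo_label])  # cross-entropy vs. one-hot target
    # Average over the whole unlabeled batch, including masked-out images.
    return total / len(weak_probs)
```

With `weak_probs = [[0.97, 0.03], [0.6, 0.4]]`, only the first image clears the threshold. Its strong-view probability of 0.5 for class 0 contributes log 2, which is then averaged over both images.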
    Naive-Student: Leveraging semi-supervised learning in video sequences for urban scene segmentation
    Liang-Chieh Chen
    Rapha Gontijo Lopes
    Bowen Cheng
    Maxwell D. Collins
    Barret Richard Zoph
    Jon Shlens
    European Conference on Computer Vision (ECCV) (2020)
    Supervised learning in large discriminative models is a mainstay of modern computer vision. Such an approach necessitates investing in large-scale, human-annotated datasets to achieve state-of-the-art results. In turn, the efficacy of supervised learning may be limited by the size of the human-annotated dataset. This limitation is particularly notable for image segmentation tasks, where human annotation may be especially expensive, yet large amounts of unlabeled data may exist. In this work, we ask whether we can leverage unlabeled video sequences to improve performance on urban scene segmentation using semi-supervised learning. The goal of this work is to avoid constructing sophisticated, learned architectures specific to label propagation (e.g., patch matching and optical flow). Instead, we simply predict pseudo-labels for the unlabeled data and train subsequent models on a mix of human-annotated and pseudo-labeled data. The procedure is iterated several times. As a result, our model, trained with this simple yet effective iterative semi-supervised learning, attains state-of-the-art results on all three Cityscapes benchmarks, reaching 67.6% PQ, 42.4% AP, and 85.1% mIoU on the test set. We view this work as a notable step toward a simple procedure that harnesses unlabeled video sequences to surpass state-of-the-art performance on core computer vision tasks.
    We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter banks). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment to Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without a language model, and 5.8% WER with language model rescoring, compared with 7.5% WER for the previous state-of-the-art hybrid system. For Switchboard, we achieve 7.2%/15.4% WER on the Switchboard/CallHome portions of the Hub5'00 test set without a language model, compared with 8.3%/17.3% WER for the previous state-of-the-art hybrid system.
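The frequency- and time-masking steps described above can be sketched as follows. The sketch assumes a spectrogram stored as a list of time frames (`spec[t][f]`). As we understand SpecAugment, each mask width is drawn uniformly from 0 up to a maximum (F for frequency, T for time), and the masked block starts at a uniformly random position. Masked values are set to zero here, which matches a mean fill if the features are mean-normalized. Time warping is omitted. All function names and defaults are hypothetical.

```python
import random
from typing import List

Spectrogram = List[List[float]]  # spec[t][f]: time steps x frequency channels


def frequency_mask(spec: Spectrogram, max_width: int, rng: random.Random,
                   fill: float = 0.0) -> Spectrogram:
    """Mask one block of consecutive frequency channels at every time step."""
    num_freq = len(spec[0])
    width = rng.randint(0, min(max_width, num_freq))  # f ~ U{0, ..., F}
    start = rng.randint(0, num_freq - width)          # f0 in [0, num_freq - f]
    out = [row[:] for row in spec]                    # copy; input untouched
    for row in out:
        for f in range(start, start + width):
            row[f] = fill
    return out


def time_mask(spec: Spectrogram, max_width: int, rng: random.Random,
              fill: float = 0.0) -> Spectrogram:
    """Mask one block of consecutive time steps across all channels."""
    num_time = len(spec)
    width = rng.randint(0, min(max_width, num_time))  # t ~ U{0, ..., T}
    start = rng.randint(0, num_time - width)
    out = [row[:] for row in spec]
    for t in range(start, start + width):
        out[t] = [fill] * len(out[t])
    return out


def spec_augment(spec: Spectrogram, max_freq_width: int, max_time_width: int,
                 num_freq_masks: int = 1, num_time_masks: int = 1,
                 seed: int = 0) -> Spectrogram:
    """Apply frequency masks, then time masks (time warping not included)."""
    rng = random.Random(seed)
    for _ in range(num_freq_masks):
        spec = frequency_mask(spec, max_freq_width, rng)
    for _ in range(num_time_masks):
        spec = time_mask(spec, max_time_width, rng)
    return spec
```

Passing a seeded `random.Random` makes each augmentation reproducible. In training, a fresh draw would normally be used for every utterance.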
    In this paper, we take a closer look at data augmentation for images and describe a simple procedure called AutoAugment to search for improved data augmentation policies. Our key insight is to create a search space of data augmentation policies and evaluate the quality of a particular policy directly on the dataset of interest. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each of which is an image processing function such as translation, rotation, or shearing, together with the probability and magnitude with which the function is applied. We use a search algorithm to find the best policy, i.e., the one for which the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data). On ImageNet, we attain a top-1 accuracy of 83.54%. On CIFAR-10, we achieve an error rate of 1.48%, which is 0.65% better than the previous state of the art. Finally, policies learned on one dataset can be transferred to work well on other, similar datasets. For example, the policy learned on ImageNet allows us to achieve state-of-the-art accuracy on the fine-grained visual classification dataset Stanford Cars, without fine-tuning weights pre-trained on additional data. Code to train Wide-ResNet, Shake-Shake, and ShakeDrop models with AutoAugment policies can be found at https://github.com/tensorflow/models/tree/master/research/autoaugment
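The following is a minimal sketch of the policy structure described above. A policy is a list of sub-policies, and each sub-policy pairs two (operation, probability, magnitude) triples. One sub-policy is drawn at random per image, and each of its operations is applied with its own probability. The two translation operations, the integer-pixel magnitudes, and the small grayscale `Image` type are illustrative assumptions, not the actual AutoAugment operation set. The search algorithm that selects the policy is not shown.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # hypothetical tiny grayscale image
Op = Tuple[str, float, int]      # (operation name, probability, magnitude)
SubPolicy = Tuple[Op, Op]        # exactly two operations per sub-policy


def translate_x(img: Image, m: int) -> Image:
    """Shift columns right by m pixels (left if m < 0), filling with 0."""
    w = len(img[0])
    return [[row[c - m] if 0 <= c - m < w else 0 for c in range(w)]
            for row in img]


def translate_y(img: Image, m: int) -> Image:
    """Shift rows down by m pixels (up if m < 0), filling with 0."""
    h, w = len(img), len(img[0])
    return [img[r - m][:] if 0 <= r - m < h else [0] * w for r in range(h)]


# Illustrative operation table; magnitudes here are pixel offsets.
OPS: Dict[str, Callable[[Image, int], Image]] = {
    "TranslateX": translate_x,
    "TranslateY": translate_y,
}


def apply_policy(img: Image, policy: List[SubPolicy],
                 rng: random.Random) -> Image:
    """Pick one sub-policy at random, then apply each op with its probability."""
    sub_policy = rng.choice(policy)
    for name, prob, magnitude in sub_policy:
        if rng.random() < prob:
            img = OPS[name](img, magnitude)
    return img
```

Because operations fire stochastically, the same image can receive a different augmentation each time it appears in a mini-batch. That variety is the point of sampling sub-policies per image.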
    A Fourier Perspective on Model Robustness in Computer Vision
    Dong Yin
    Rapha Gontijo Lopes
    Jon Shlens
    Justin Gilmer
    NeurIPS (2019)
    Achieving robustness to distributional shift is a longstanding and challenging goal of computer vision. Data augmentation is a commonly used approach for improving robustness; however, robustness gains are typically not uniform across corruption types. Indeed, increasing performance in the presence of random noise is often met with reduced performance on other corruptions such as contrast change. Understanding when and why these sorts of trade-offs occur is a crucial step towards mitigating them. Towards this end, we investigate recently observed trade-offs caused by Gaussian data augmentation and adversarial training. We find that both methods improve robustness to corruptions concentrated in the high-frequency domain while reducing robustness to corruptions concentrated in the low-frequency domain. This suggests that one way to mitigate these trade-offs via data augmentation is to use a more diverse set of augmentations. In line with this, we observe that AutoAugment [5], a recently proposed data augmentation policy optimized for clean accuracy, achieves state-of-the-art robustness on the CIFAR-10-C and ImageNet-C benchmarks.
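To make the high- versus low-frequency distinction concrete, the sketch below measures what fraction of a corruption's energy lies above a frequency cutoff. It is a simplification under our own assumptions: a 1-D signal, a naive O(n²) discrete Fourier transform, and a hypothetical helper name. The work above analyzes 2-D images.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def high_frequency_energy_fraction(perturbation: List[float],
                                   cutoff: int) -> float:
    """Fraction of spectral energy at |frequency| > cutoff (cycles per signal)."""
    n = len(perturbation)
    spectrum = dft(perturbation)
    total = sum(abs(c) ** 2 for c in spectrum)
    if total == 0:
        return 0.0  # an all-zero perturbation has no energy anywhere
    high = 0.0
    for k, c in enumerate(spectrum):
        freq = min(k, n - k)  # bins above n/2 are negative frequencies
        if freq > cutoff:
            high += abs(c) ** 2
    return high / total
```

A constant offset, similar to a global brightness change, puts all of its energy at frequency 0. A pixel-by-pixel alternating pattern puts all of its energy at the highest frequency. Broadband noise falls in between.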
    Accelerated search and design for stretchable graphene kirigami using machine learning
    Paul Z Hanakata
    David K. Campbell
    Harold S. Park
    Physical Review Letters, vol. 121 (2018), pp. 255304
    Making kirigami-inspired cuts into a sheet has been shown to be an effective way of designing stretchable materials with metamorphic properties, where the 2D shape can transform into complex 3D shapes. However, finding optimal designs is not straightforward, as the number of possible cutting patterns grows exponentially with system size. Here, we report on how machine learning (ML) can be used to approximate target properties, such as yield stress and yield strain, as a function of the cutting pattern. Our approach enables the rapid discovery of kirigami designs that yield extreme stretchability, as verified by classical molecular dynamics (MD) simulations. We find that convolutional neural networks (CNNs), commonly used for classification in vision tasks, can be applied to regression to achieve an accuracy close to the precision of the MD simulations. This approach can then be used to search for optimal designs that maximize elastic stretchability with only 1,000 training samples in a large design space of $\sim 4\times10^6$ candidate designs. This example demonstrates the power and potential of ML in finding optimal kirigami designs in a fraction of the iterations that would be required by a purely MD- or experiment-based approach, where no prior knowledge of the governing physics is known or available.
    Realistic Evaluation of Semi-Supervised Learning Algorithms
    Avital Oliver
    Augustus Odena
    Colin Raffel
    Ian Goodfellow
    NeurIPS (Spotlight) (2018)
    Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. Approaches based on deep neural networks have recently proven successful on standard benchmark tasks. However, we argue that these benchmarks do not reflect real-world requirements and that methods are compared against weak baselines. We propose a set of new benchmarks and find that simple baselines that were previously underappreciated outperform more complicated research ideas that were previously regarded as state of the art. Using our new benchmarking procedures, we additionally find that SSL methods are highly sensitive to the amount of unlabeled data and the class distribution of the data. We encourage researchers studying SSL to adopt our improved methodology, and suggest that readers and reviewers of SSL papers familiarize themselves with the experimental design concerns we identify.
    It is becoming increasingly clear that many machine learning classifiers are vulnerable to adversarial examples. In attempting to explain the origin of adversarial examples, previous studies have typically focused on the facts that neural networks operate on high-dimensional data, that they overfit, or that they are too linear. Here we argue that adversarial examples arise primarily from an inherent uncertainty that neural networks have about their predictions. We show that the functional form of this uncertainty is independent of architecture, dataset, and training protocol, and depends only on the statistics of the logit differences of the network, which do not change significantly during training. This leads to adversarial error having a universal power-law scaling with respect to the size of the adversarial perturbation. We show that this universality holds for a broad range of datasets (MNIST, CIFAR-10, ImageNet, and random data), models (including state-of-the-art deep networks, linear models, adversarially trained networks, and networks trained on randomly shuffled labels), and attacks (FGSM, step l.l., PGD). Motivated by these results, we study the effects of reducing prediction entropy on adversarial robustness. Finally, we study the effect of network architecture on adversarial sensitivity. To do this, we use neural architecture search with reinforcement learning to find adversarially robust architectures on CIFAR-10. Our resulting architecture is more robust to white-box and black-box attacks than previous attempts.
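A power-law relationship between adversarial error and perturbation size, err(ε) ≈ c·ε^α, can be checked with a linear least-squares fit in log-log space. The sketch below is a generic fitting routine, not the analysis code behind the work above. The function name is hypothetical. We assume the measurements come from the small-perturbation regime, where the error is still well below saturation.

```python
import math
from typing import List, Tuple


def fit_power_law(eps: List[float], err: List[float]) -> Tuple[float, float]:
    """Fit err ~= c * eps**alpha by least squares on (log eps, log err).

    Returns (c, alpha). Needs at least two distinct, positive eps values
    and positive error values.
    """
    if len(eps) != len(err) or len(eps) < 2:
        raise ValueError("need at least two matching measurements")
    if min(eps) <= 0 or min(err) <= 0:
        raise ValueError("log-log fit requires positive values")
    xs = [math.log(e) for e in eps]
    ys = [math.log(r) for r in err]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("eps values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx                  # slope in log-log space = exponent
    c = math.exp(my - alpha * mx)      # intercept back in linear space
    return c, alpha
```

On a log-log plot a power law is a straight line, so the slope estimates the exponent α and the intercept estimates log c.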
    Structure-property relationships from universal signatures of plasticity in disordered solids
    Robert Ivancic
    Samuel S. Schoenholz
    Danny Strickland
    Anindita Basu
    Zoey Davidson
    Julien Fontaine
    Jyo Lyn Hor
    Yun-Ru Huang
    Y. Jiang
    Nathan Keim
    K. D. Koshigan
    J. A. Lefever
    T. Liu
    X. -G. Ma
    D. J. Magagnosc
    E. Morrow
    C. P. Ortiz
    J. M. Rieser
    A. Shavit
    T. Still
    Y. Xu
    Y. Zhang
    Kerstin N. Nordstrom
    Paulo E. Arratia
    Robert W. Carpick
    Douglas J. Durian
    Zahra Fakhraai
    Douglas J. Jerolmack
    Daeyoon Lee
    Ju Li
    Robert Riggleman
    Kevin T. Turner
    Arjun G. Yodh
    Daniel S. Gianola
    Andrea J. Liu
    Science (2017)
    When deformed beyond their elastic limits, crystalline solids flow plastically via particle rearrangements localized around structural defects. Disordered solids also flow, but without obvious structural defects. We link structure to plasticity in disordered solids via a microscopic structural quantity, “softness,” designed by machine learning to be maximally predictive of rearrangements. Experimental results and computations enabled us to measure the spatial correlations and strain response of softness, as well as two measures of plasticity: the size of rearrangements and the yield strain. All four quantities maintained remarkable commonality in their values for disordered packings of objects ranging from atoms to grains, spanning 7 to 13 orders of magnitude in diameter and elastic modulus. These commonalities suggest that the spatial correlations and strain response of softness correspond to rearrangement size and yield strain, respectively.