Tolga Bolukbasi

Authored Publications
    Abstract: We explore a fundamental question in language model pre-training on huge amounts of unlabeled, randomly sampled text data: should every data sample contribute equally to model learning? To this end, we use self-influence (SI) scores as an indicator of sample importance, analyzing the relationship between SI scores and sample quality and probing the efficacy of SI scores for offline pre-training dataset filtering. Building upon this, we propose PRESENCE: Pre-training data REweighting with Self-influENCE, an online and adaptive pre-training data re-weighting strategy using self-influence scores. PRESENCE is a two-phase learning method: in the first phase, data samples with higher SI scores are emphasized, while in the subsequent phase they are de-emphasized to limit the impact of noisy and unreliable samples. We validate PRESENCE over two model sizes of multilingual-t5 with five datasets across three tasks, obtaining significant performance improvements over the baseline methods considered. Through extensive ablations and qualitative analyses, we put forward a new research direction for language model pre-training.
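    For intuition, here is a minimal sketch of one common way to score samples by self-influence and turn the scores into weights. It assumes a TracIn-style approximation (self-influence as the squared gradient norm of a sample's own loss) and a softmax reweighting with a temperature; the paper's exact scoring and two-phase schedule may differ, and model, loss_fn, and batch are placeholders.

    import torch

    def self_influence_scores(model, loss_fn, batch):
        """Per-sample self-influence, approximated as the squared gradient norm
        of each sample's own loss at the current checkpoint (an assumption)."""
        params = [p for p in model.parameters() if p.requires_grad]
        scores = []
        for x, y in batch:  # loop to obtain per-sample gradients
            loss = loss_fn(model(x), y)
            grads = torch.autograd.grad(loss, params, allow_unused=True)
            scores.append(sum(g.pow(2).sum() for g in grads if g is not None).item())
        return scores

    def two_phase_weights(scores, phase, temperature=1.0):
        """Phase 1: emphasize high-SI samples; phase 2: de-emphasize them."""
        s = torch.tensor(scores) / temperature
        logits = s if phase == 1 else -s
        return torch.softmax(logits, dim=0)  # weights applied to per-sample losses
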
    Guided Integrated Gradients: An Adaptive Path Method for Removing Noise
    Besim Namik Avci
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 5050-5058
    Abstract: Integrated Gradients (IG) is a commonly used feature attribution method for deep neural networks. While IG has many desirable properties, when applied to visual models the method often produces spurious or noisy pixel attributions in regions that are not related to the predicted class. While this has been previously noted, most existing solutions aim at addressing the symptoms by explicitly reducing the noise in the resulting attributions. In this work, we show that one of the causes of the problem is the presence of "adversarial examples" along the IG path. To minimize the effect of adversarial examples on attributions, we propose adapting the attribution path itself. We introduce Adaptive Path Methods (APMs) as a generalization of path methods, and Guided IG as a specific instance of an APM. Empirically, Guided IG creates saliency maps better aligned with the model's prediction and the input image being explained. We show through qualitative and quantitative experiments that Guided IG outperforms IG on ImageNet, Open Images, and diabetic retinopathy medical images.
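    For reference, here is a minimal sketch of the standard straight-line Integrated Gradients baseline that Guided IG generalizes; Guided IG replaces the fixed straight-line path with an adaptively chosen one, which is not shown here. grad_fn (the gradient of the model output with respect to the input) is an assumed helper, not code from the paper.

    import numpy as np

    def integrated_gradients(x, baseline, grad_fn, steps=50):
        """Riemann-sum approximation of IG along the straight line baseline -> x."""
        alphas = np.linspace(0.0, 1.0, steps + 1)[1:]         # skip alpha = 0
        grad_sum = np.zeros_like(x)
        for a in alphas:
            grad_sum += grad_fn(baseline + a * (x - baseline))  # gradient at each path point
        avg_grad = grad_sum / steps
        return (x - baseline) * avg_grad  # attributions approximately sum to F(x) - F(baseline)
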
    The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models
    Andy Coenen
    Sebastian Gehrmann
    Ellen Jiang
    Carey Radebaugh
    Ann Yuan
    Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics
    Abstract: We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior: Why did my model make this prediction? When does it perform poorly? What happens under a controlled change in the input? LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamlined, browser-based interface to enable rapid exploration and error analysis. We include case studies for a diverse set of workflows, including exploring counterfactuals for sentiment analysis, measuring gender bias in coreference systems, and exploring local behavior in text generation. LIT supports a wide range of models, including classification, seq2seq, and structured prediction, and is highly extensible through a declarative, framework-agnostic API. LIT is under active development, with code and full documentation available at https://github.com/pair-code/lit.
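    As a flavor of the declarative API, here is a rough sketch of a model wrapper for LIT; the class and type names follow the public LIT documentation as of the paper's release and may have changed, so treat them as assumptions, and the wrapped classifier is a placeholder.

    from lit_nlp.api import model as lit_model
    from lit_nlp.api import types as lit_types

    class SentimentModel(lit_model.Model):
        """Wraps an arbitrary text classifier for use in LIT."""

        def __init__(self, clf):
            self._clf = clf  # placeholder: list[str] -> list of class-probability lists

        def input_spec(self):
            return {"sentence": lit_types.TextSegment()}

        def output_spec(self):
            return {"probas": lit_types.MulticlassPreds(vocab=["neg", "pos"], parent="label")}

        def predict_minibatch(self, inputs):
            probas = self._clf([ex["sentence"] for ex in inputs])
            return [{"probas": p} for p in probas]
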
    Debiasing Embeddings for Fairer Text Classification
    1st ACL Workshop on Gender Bias in Natural Language Processing (2019)
    Abstract: Bolukbasi et al. (2016) demonstrated that pre-trained word embeddings can inherit gender bias from the data they were trained on. We investigate how this bias affects downstream classification tasks, using the case study of occupation classification (De-Arteaga et al., 2019). We show that traditional techniques for debiasing embeddings can actually worsen the bias of the downstream classifier by providing a less noisy channel for communicating gender information. With a relatively minor adjustment, however, we show how these same techniques can be used to simultaneously reduce bias and obtain high classification accuracy.
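    For context, here is a minimal sketch of the "neutralize" step from hard debiasing (Bolukbasi et al., 2016), which removes an embedding's component along an estimated gender direction; the paper's adjustment for the downstream classifier is not shown, and gender_direction is a placeholder (for example, the difference of the "he" and "she" vectors, or a principal direction over definitional pairs).

    import numpy as np

    def neutralize(w, gender_direction):
        """Project embedding w onto the subspace orthogonal to the bias direction."""
        g = gender_direction / np.linalg.norm(gender_direction)
        return w - np.dot(w, g) * g
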
    Abstract: A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs. To address this challenge, we created the What-If Tool, an open-source application that allows practitioners to probe, visualize, and analyze ML systems with minimal coding. The What-If Tool lets practitioners test performance in hypothetical situations, analyze the importance of different data features, and visualize model behavior across multiple models and subsets of input data. It also lets practitioners measure systems according to multiple ML fairness metrics. We describe the design of the tool and report on real-life usage at different organizations.
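    For a sense of the minimal-coding workflow, here is a rough sketch of loading the What-If Tool in a notebook via the witwidget package; the builder and widget names follow its public documentation as I recall it and may differ by version, and examples and predict_fn are placeholders supplied by the user.

    from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

    def show_wit(examples, predict_fn, height=720):
        """Render the What-If Tool inline for a list of tf.Example protos and a
        custom predict function (examples -> per-class probability lists)."""
        config = (WitConfigBuilder(examples)
                  .set_custom_predict_fn(predict_fn)
                  .set_label_vocab(["negative", "positive"]))  # assumed binary labels
        return WitWidget(config, height=height)
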
    Abstract: Saliency methods can aid understanding of deep neural networks. Recent years have witnessed many improvements to saliency methods, as well as new ways of evaluating them. In this paper, we 1) present a novel region-based attribution method, XRAI, that builds upon Integrated Gradients (Sundararajan et al., 2017), 2) introduce evaluation methods for empirically assessing the quality of image-based saliency maps (Performance Information Curves, or PICs), and 3) contribute an axiom-based sanity check for attribution methods. Through empirical experiments and example results, we show that XRAI produces better results than other saliency methods for common models and the ImageNet dataset.
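    To illustrate the region-based idea, here is a simplified sketch that aggregates per-pixel Integrated Gradients attributions over image segments; the actual XRAI algorithm (available in the open-source saliency library) uses overlapping multi-scale segmentations and greedy region selection, so this is only a rough approximation, and ig_attribution is assumed to be a channel-summed 2D attribution map.

    import numpy as np
    from skimage.segmentation import felzenszwalb

    def region_attribution(image, ig_attribution):
        """Assign each segment the mean of its per-pixel IG attributions."""
        segments = felzenszwalb(image, scale=100)            # one coarse segmentation
        region_map = np.zeros(segments.shape, dtype=float)
        for seg_id in np.unique(segments):
            mask = segments == seg_id
            region_map[mask] = ig_attribution[mask].mean()   # attribution density per region
        return region_map
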