Sercan O. Arik

Sercan O. Arik

Sercan Arik is a Research Scientist at Google Cloud AI. Motivated by the mission of democratizing AI and bringing it to the most impactful use cases (from Healthcare, Finance, Retail, Media, Education, Communications and many other industries), he works on making AI high-performance for the most-demanded data types, interpretable, fair, data-efficient, robust and reliable. Before joining Google, he was a Research Scientist at Baidu Silicon Valley AI Lab. At Baidu, he focused on deep learning research, particularly for applications in human-technology interfaces. He co-developed state-of-the-art speech synthesis, keyword spotting, voice cloning, and neural architecture search systems. Prior to Baidu, he completed a PhD degree in Electrical Engineering at Stanford University in 2016. He has co-authored more than 50 journal and conference publications.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependence on humans, which can be difficult and expensive. Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples. Selective prediction and active learning have been approached from different angles, with the connection between them missing. In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new paradigm, we propose a simple yet effective approach, ASPEST, that utilizes ensembles of model snapshots with self-training with their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, which suffer from domain shifts, demonstrate that ASPEST can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST→SVHN benchmark with the labeling budget of 100, ASPEST improves the AUACC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop. View details
    Preview abstract Text-to-SQL aims to automate the process of generating SQL queries on a database from natural language text. In this work, we propose "SQLPrompt", tailored to improve the few-shot prompting capabilities of Text-to-SQL for Large Language Models (LLMs). Our methods include innovative prompt design, execution based consistency decoding strategy which selects the SQL with the most consistent execution outcome among other SQL proposals, and a method that aims to improve performance by diversifying the SQL proposals during consistency selection with different prompt designs ("MixPrompt") and foundation models ("MixLLMs"). We show that SQLPrompt outperforms previous approaches for in-context learning with few labeled data by a large margin, closing the gap with finetuning state-of the-art with thousands of labeled data. View details
    SPADE: Semi-supervised Anomaly Detection under Distribution Mismatch
    Chun-Liang Li
    Kihyuk Sohn
    Transactions on Machine Learning Research (TMLR)(2023)
    Preview abstract Semi-supervised anomaly detection is a common problem, as often the datasets containing anomalies are partially labeled. We propose a canonical framework: Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE) that isn't limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, the assumption is often violated in many applications -- for example, the labeled data may contain only anomalies unlike unlabeled data, or unlabeled data may contain different types of anomalies, or labeled data may contain only `easy-to-label' samples. SPADE utilizes an ensemble of one class classifiers as the pseudo-labeler to improve the robustness of pseudo-labeling with distribution mismatch. Partial matching is proposed to automatically select the critical hyper-parameters for pseudo-labeling without validation data, which is crucial with limited labeled data. SPADE shows state-of-the-art semi-supervised anomaly detection performance across a wide range of scenarios with distribution mismatch in both tabular and image domains. In some common real-world settings such as model facing new types of unlabeled anomalies, SPADE outperforms the state-of-the-art alternatives by 5% AUC in average. View details
    Preview abstract In this paper, we propose a novel deep sequence model based on the Koopman theory for time series forecasting with distribution shifts. Our model, Koopman Neural Forecaster (KNF), leverages DNNs to learn the linear Koopman space and the measurement functions, and imposes inductive biases for improved robustness against distributional shifts. KNF employs both a global operator to learn shared characteristics, and a local operator to capture changing dynamics. KNF also includes a judiciously-designed feedback loop to continuously update the learnt operators over time for rapidly varying behaviors. To the best of our knowledge, this is the first time that Koopman theory is applied to real-world time series without known governing laws. We demonstrate that KNF achieves the state-of-the-art performance on wide range of time series datasets that are particularly known to suffer from distribution shifts. View details
    Preview abstract Real-world time-series datasets are often multivariate with complex dynamics. To capture this complexity, high capacity architectures like recurrent- or attention-based sequential deep learning models have become popular. However, recent work demonstrates that simple univariate linear models can outperform such deep learning models on several commonly used academic benchmarks. Extending them, in this paper, we investigate the capabilities of linear models for time-series forecasting and present Time-Series Mixer (TSMixer), a novel architecture designed by stacking multi-layer perceptrons (MLPs). TSMixer is based on mixing operations along both the time and feature dimensions to extract information efficiently. On popular academic benchmarks, the simple-to-implement TSMixer is comparable to specialized state-of-the-art models that leverage the inductive biases of specific benchmarks. On the challenging and large scale M5 benchmark, a real-world retail dataset, TSMixer demonstrates superior performance compared to the state-of-the-art alternatives. Our results underline the importance of efficiently utilizing cross-variate and auxiliary information for improving the performance of time series forecasting. We present various analyses to shed light into the capabilities of TSMixer. The design paradigms utilized in TSMixer are expected to open new horizons for deep learning-based time series forecasting. The implementation is available at: https://github.com/google-research/google-research/tree/master/ tsmixer . View details
    Preview abstract Multimodal large-scale pretraining has shown impressive performance gains for unstructured data including language, image, audio, and video. Yet, the scenario prominent in real-world applications is the existence of combination of structured (including tabular and time-series) and unstructured data in conjunction, and it has been understudied. Towards this end, we propose LANISTR, a novel attention-based framework to learn from LANguage, Image, and STRuctured data. We introduce a new multimodal fusion module with a similarity-based multimodal masking loss that enables LANISTR to learn cross-modal relations from large-scale multimodal data with missing modalities during training and test time. On two publicly available MIMIC-IV and Amazon Product Review datasets, LANISTR achieves absolute improvements of 6.47% (AUROC) and 8.35% (accuracy), respectively, compared to the state-of-the-art multimodal models, while showing superior generalization capabilities. View details
    Preview abstract Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions when they are unsure of the answer. In this work, we propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of LLMs. Our framework is based on the idea of using parameter-efficient tuning to adapt the LLM to the specific task at hand while improving its ability to perform self-evaluation. We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods. For example, on the CoQA benchmark, our method improves the AUACC from 91.23% to 92.63% and improves the AUROC from 74.61% to 80.25%. View details
    Preview abstract Modern large language models (LLMs) have demonstrated impressive capabilities at sophisticated tasks, often through step-by-step reasoning similar to humans. This is made possible by their strong few-shot and zero shot abilities: they either learn from a handful of handcrafted, completed responses (“in context examples”), or are prompted to reason spontaneously through specially designed triggers. Nonetheless, few-shot performance is sensitive to the choice of the examples, for which artisanal hand-crafted selection would require extensive effort, and in some cases, it might not even be possible to obtain relevant examples a-priori without expertise about the downstream tasks. On the other hand, most general and handcrafting-free, zero-shot performance is limited by the lack of guidance to the LLM. To address this, we propose Consistency-based Self-adaptive Prompting (COSP), a novel prompt design method for LLMs. Requiring neither handcrafted responses nor ground-truth labels, COSP selects & builds the set of examples from the LLM’s own zero-shot outputs via carefully designed criteria combining consistency, diversity and repetition. In zero-shot setting, with only LLM predictions, COSP significantly improves performance (up to 2× compared to zero-shot baselines and matching or exceeding few-shot baselines) in a range of reasoning tasks in 3 LLMs. Moreover, COSP can be generalized to few-shot setting and can take advantage of few labeled examples in an efficient way View details
    Preview abstract Accurate estimation of output quantiles is crucial in many use cases, where it is desired to model the range of possibility. Modeling target distribution at arbitrary quantile levels and at arbitrary input attribute levels are important to offer a comprehensive picture of the data, and requires the quantile function to be expressive enough. The quantile function describing the target distribution using quantile levels is critical for quantile regression. Althought various parametric forms for the distributions (that the quantile function specifies) can be adopted, an everlasting problem is selecting the most appropriate one that can properly approximate the data distributions. In this paper, we propose a non-parametric and data-driven approach, Neural Spline Search (NSS), to represent the observed data distribution without parametric assumptions. NSS is flexible and expressive for modeling data distributions by transforming the inputs with a series of monotonic spline regressions guided by symbolic operators. We demonstrate that NSS outperforms previous methods on synthetic, real-world regression and time-series forecasting tasks. View details
    Preview abstract We propose a canonical approach for feature selection, sparse learnable masks (SLM). SLM integrates learnable sparse masks into end-to-end training. For the fundamental non-differentiability challenge of selecting a desired number of features, we propose duo mechanisms for automatic mask scaling to achieve the desired feature sparsity, and gradually tempering this sparsity for effective learning. In addition, SLM employs a novel objective that maximizes the mutual information (MI) between the selected features and the labels, in an efficient and scalable way. Empirically, SLM achieves state-of-the-art results on several benchmark datasets, often by a significant margin, especially on real-world challenging datasets. View details