Arjun Gopalan

I am a software engineer at Google Research. My areas of interest include graph-based machine learning, label propagation, and data mining. I currently work on Google's large-scale semi-supervised machine learning platform and on Neural Structured Learning in TensorFlow.

Prior to joining Google, I spent close to four years developing enterprise storage technologies at Tintri. While at Tintri, I was one of the principal contributors to the design and implementation of logical synchronous replication with automatic, transparent failover. A paper on logical synchronous replication appeared at FAST '18.

I completed my Master's in Computer Science with a distinction in research at Stanford University in 2014. At Stanford, I was part of the Platform Lab, working with Dr. John Ousterhout on RAMCloud, a low-latency, DRAM-based distributed data center storage system. A paper on RAMCloud appeared in TOCS '15. My Master's thesis was on managing objects and secondary indexes in RAMCloud, which was part of a larger effort to design and implement scalable low-latency secondary indexes (SLIK) in RAMCloud. A paper on SLIK appeared at ATC '16.

Authored Publications
    Recognizing Multimodal Entailment (tutorial at ACL 2021)
    Afsaneh Hajiamin Shirazi
    Blaž Bratanič
    Christina Liu
    Gabriel Fedrigo Barcik
    Georg Fritz Osang
    Jared Frank
    Lucas Smaira
    Ricardo Abasolo Marino
    Roma Patel
    Vaiva Imbrasaite
    (2021) (to appear)
    Abstract: How information is created, shared, and consumed has changed rapidly in recent decades, in part thanks to new social platforms and technologies on the web. With ever-larger amounts of unstructured data and limited labels, organizing and reconciling information from different sources and modalities is a central challenge in machine learning. This cutting-edge tutorial aims to introduce the multimodal entailment task, which can be useful for detecting semantic alignments when a single modality alone does not suffice for full content understanding. Starting with a brief overview of natural language processing, computer vision, structured data, and neural graph learning, we lay the foundations for the multimodal sections to follow. We then discuss recent multimodal learning literature covering visual, audio, and language streams, and explore case studies focusing on tasks that require fine-grained understanding of visual and linguistic semantics: question answering, veracity, and hatred classification. Finally, we introduce a new dataset for recognizing multimodal entailment, exploring it in a hands-on collaborative section. Overall, this tutorial gives an overview of multimodal learning, introduces a multimodal entailment dataset, and encourages future research on the topic.
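    The sketch below is not from the tutorial; it is a minimal, hypothetical baseline for the multimodal entailment task described above, assuming precomputed text and image embeddings and an illustrative three-way label set. It only shows the late-fusion classification idea.

```python
# Hypothetical late-fusion baseline for multimodal entailment (not from the
# tutorial). Embedding sizes, layer widths, and the 3-way label set are
# illustrative placeholders.
import tensorflow as tf

def build_entailment_model(text_dim=512, image_dim=1280):
    # Precomputed embeddings stand in for full text/image encoders.
    text_in = tf.keras.Input(shape=(text_dim,), name="text_embedding")
    image_in = tf.keras.Input(shape=(image_dim,), name="image_embedding")

    # Late fusion: concatenate the two modalities, then classify.
    fused = tf.keras.layers.Concatenate()([text_in, image_in])
    hidden = tf.keras.layers.Dense(256, activation="relu")(fused)
    output = tf.keras.layers.Dense(3, activation="softmax", name="entailment")(hidden)

    model = tf.keras.Model(inputs=[text_in, image_in], outputs=output)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_entailment_model()
model.summary()
```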
    Neural Structured Learning in TensorFlow (tutorial)
    Abstract: We present Neural Structured Learning (NSL) in TensorFlow, a new learning paradigm to train neural networks by leveraging structured signals in addition to feature inputs. Structure can be explicit, as represented by a graph, or implicit, either induced by adversarial perturbation or inferred using techniques like embedding learning. NSL is open-sourced as part of the TensorFlow ecosystem and is widely used in Google across many products and services. In this tutorial, we provide an overview of the NSL framework, including various libraries, tools, and APIs, and demonstrate the practical use of NSL in different applications. The NSL website is hosted at www.tensorflow.org/neural_structured_learning and includes details about the theoretical foundations of the technology, extensive API documentation, and hands-on tutorials.
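    As a concrete illustration, the following minimal sketch uses NSL's public Keras API (per the documentation at www.tensorflow.org/neural_structured_learning) to wrap a placeholder base model with adversarial regularization, one of the implicit-structure options mentioned above. The base model architecture and the data are illustrative placeholders.

```python
# Minimal sketch of NSL's adversarial regularization wrapper, following the
# public API documented at www.tensorflow.org/neural_structured_learning.
# The base model and data below are illustrative placeholders.
import neural_structured_learning as nsl
import numpy as np
import tensorflow as tf

# A plain Keras base model over a generic feature vector.
base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28,), name="feature"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Wrap the base model so each batch is also trained on adversarially
# perturbed inputs (implicit structure induced by perturbation).
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(base_model, adv_config=adv_config)

adv_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# NSL expects features and labels packed into a single dictionary.
x_train = np.random.rand(256, 28).astype("float32")
y_train = np.random.randint(0, 10, size=(256,))
adv_model.fit({"feature": x_train, "label": y_train}, batch_size=32, epochs=2)
```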
    Logical Synchronous Replication in the Tintri VMstore File System
    Gideon Glass
    Dattatraya Koujalagi
    Abhinand Palicherla
    Sumedh Sakdeo
    16th USENIX Conference on File and Storage Technologies (FAST '18), USENIX Association, Oakland, CA (2018), pp. 295-308
    Abstract: A standard feature of enterprise data storage systems is synchronous replication: updates received from clients by one storage system are replicated to a remote storage system and are only acknowledged to clients after having been stored persistently on both storage systems. Traditionally, these replication schemes require configuration at a coarse granularity, e.g., on a LUN, filesystem volume, or whole-system basis. In contrast, we present a new architecture that operates at a fine granularity: individual files and directories. To implement this, we use a combination of novel per-file capabilities and existing techniques to solve the following problems: tracking parallel writes in flight on independent storage systems; replicating arbitrary filesystem operations; efficiently resynchronizing after a disconnect; and verifying the integrity of replicated data between two storage systems.
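    The following is a conceptual sketch, not Tintri's implementation: it illustrates only the synchronous replication contract described above, in which a client write is acknowledged solely after it is persistent on both the local and the remote system. All class and method names are hypothetical.

```python
# Conceptual sketch of the synchronous replication contract: acknowledge a
# client write only after it is durable on both storage systems. Not the
# VMstore implementation; all names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

class SyncReplicatedFile:
    def __init__(self, local_store, remote_store, path):
        self.local = local_store      # persists writes on the local system
        self.remote = remote_store    # persists writes on the remote replica
        self.path = path
        self._pool = ThreadPoolExecutor(max_workers=2)

    def write(self, offset, data):
        # Issue the write to both systems in parallel; each persist() call
        # returns only once the data is durable on that system.
        local_f = self._pool.submit(self.local.persist, self.path, offset, data)
        remote_f = self._pool.submit(self.remote.persist, self.path, offset, data)

        # Block until both systems acknowledge, then acknowledge the client.
        local_f.result()
        remote_f.result()
        return "ack"
```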
    SLIK: Scalable Low-Latency Indexes for a Key-Value Store
    Ankita Kejriwal
    Ashish Gupta
    Zhihao Jia
    Stephen Yang
    John Ousterhout
    2016 USENIX Annual Technical Conference (USENIX ATC '16), USENIX Association, Denver, CO (2016), pp. 57-70
    Abstract: Many large-scale key-value storage systems sacrifice features like secondary indexing and/or consistency in favor of scalability or performance. This limits the ease and efficiency of application development on such systems. Implementing secondary indexing in a large-scale, memory-based system is challenging because the goals of low latency, high scalability, consistency, and high availability often conflict with each other. This paper shows how a large-scale key-value storage system can be extended to provide secondary indexes while meeting those goals. The architecture, called SLIK, enables multiple secondary indexes for each table. SLIK represents index B+ trees using objects in the underlying key-value store. It allows indexes to be partitioned and distributed independently of the data in tables while providing reasonable consistency guarantees using a lightweight ordered-write approach. Our implementation of this design on RAMCloud (a main-memory key-value store) performs indexed reads in 11 μs and writes in 30 μs. The architecture supports indexes spanning thousands of nodes and provides linear scalability for throughput.
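    The sketch below is an illustrative reading of the lightweight ordered-write idea mentioned in the abstract, not the SLIK code: index entries for the new value are written before the object, stale entries are removed afterwards, and lookups filter stale entries by re-checking the object. Names and structure are hypothetical.

```python
# Illustrative ordered-write sketch for secondary indexes (not SLIK itself).
# Invariant: an index entry for a live object always exists, because the
# index entry is written before the object; stale entries are tolerated and
# filtered out on reads.
class IndexedTable:
    def __init__(self, kv_store, index):
        self.kv = kv_store    # primary key -> object
        self.index = index    # secondary key -> set of primary keys

    def write(self, primary_key, obj, secondary_key):
        old = self.kv.get(primary_key)

        # 1. Insert the new index entry first.
        self.index.insert(secondary_key, primary_key)
        # 2. Then write (or overwrite) the object durably.
        self.kv.put(primary_key, obj)
        # 3. Finally remove the now-stale entry if the secondary key changed.
        if old is not None and old.secondary_key != secondary_key:
            self.index.remove(old.secondary_key, primary_key)

    def lookup(self, secondary_key):
        # Filter out stale index entries by verifying against the object.
        for pk in self.index.get(secondary_key):
            obj = self.kv.get(pk)
            if obj is not None and obj.secondary_key == secondary_key:
                yield obj
```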
    The RAMCloud Storage System
    John Ousterhout
    Ashish Gupta
    Ankita Kejriwal
    Collin Lee
    Behnam Montazeri
    Diego Ongaro
    Seo Jin Park
    Henry Qin
    Mendel Rosenblum
    Stephen Rumble
    Ryan Stutsman
    Stephen Yang
    ACM Trans. Comput. Syst., 33 (2015)
    Abstract: RAMCloud is a storage system that provides low-latency access to large-scale datasets. To achieve low latency, RAMCloud stores all data in DRAM at all times. To support large capacities (1 PB or more), it aggregates the memories of thousands of servers into a single coherent key-value store. RAMCloud ensures the durability of DRAM-based data by keeping backup copies on secondary storage. It uses a uniform log-structured mechanism to manage both DRAM and secondary storage, which results in high performance and efficient memory usage. RAMCloud uses a polling-based approach to communication, bypassing the kernel to communicate directly with NICs; with this approach, client applications can read small objects from any RAMCloud storage server in less than 5 μs, and durable writes of small objects take about 13.5 μs. RAMCloud does not keep multiple copies of data online; instead, it provides high availability by recovering from crashes very quickly (1 to 2 seconds). RAMCloud's crash recovery mechanism harnesses the resources of the entire cluster working concurrently so that recovery performance scales with cluster size.
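    As a rough illustration (not the RAMCloud code), the sketch below shows the durable write path described in the abstract: objects are appended to a DRAM log on the master, replicated to backup servers, and acknowledged only after the backups confirm. The replication factor and all names are illustrative.

```python
# Simplified sketch of a DRAM-log-based durable write path in the spirit of
# the abstract: data lives in an in-memory log and is acknowledged only
# after backup copies are made. Not the RAMCloud implementation.
class Master:
    def __init__(self, backups, replication_factor=3):
        self.log = []                 # in-memory (DRAM) log of objects
        self.hash_table = {}          # key -> position in the log
        self.backups = backups[:replication_factor]

    def write(self, key, value):
        # Append the new object to the in-memory log.
        entry = (key, value)
        self.log.append(entry)
        self.hash_table[key] = len(self.log) - 1

        # Replicate the log entry to each backup; a real master pipelines
        # this and waits for all acknowledgements before replying.
        for backup in self.backups:
            backup.append(entry)      # backup buffers, then flushes to disk
        return "ack"

    def read(self, key):
        # Reads are served directly from DRAM via the hash table.
        return self.log[self.hash_table[key]][1]
```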
    Improved bounds for bipartite matching on surfaces
    Samir Datta
    Raghav Kulkarni
    Raghunath Tewari
    STACS '12 (29th Symposium on Theoretical Aspects of Computer Science), LIPIcs, Paris, France (2012), pp. 254-265
    Abstract: We exhibit the following new upper bounds on the space complexity and the parallel complexity of the Bipartite Perfect Matching (BPM) problem for graphs of small genus: (1) BPM in planar graphs is in UL (improves upon the SPL bound of Datta, Kulkarni, and Roy); (2) BPM in constant-genus graphs is in NL (orthogonal to the SPL bound of Datta, Kulkarni, Tewari, and Vinodchandran); (3) BPM in poly-logarithmic-genus graphs is in NC (extends the NC bound for O(log n)-genus graphs of Mahajan and Varadarajan, and of Kulkarni, Mahajan, and Varadarajan). For Part (1), we combine the flow technique of Miller and Naor with the double counting technique of Reinhardt and Allender. For Parts (2) and (3), we extend Miller and Naor's result to higher-genus surfaces in the spirit of Chambers, Erickson, and Nayyeri.