Doing Data Science with coLaboratory

August 8, 2014

Posted by Kayur Patel, Kester Tong, Mark Sandler, and Corinna Cortes, Google Research



Building products and making decisions based on data is at the core of what we do at Google. Increasingly common among fields such as journalism and government, this data-driven mindset is changing the way traditionally non-technical organizations do work. In order to bring this approach to even more fields, Google Research is excited to be a partner in the coLaboratory project, a new tool for data science and analysis, designed to make collaborating on data easier.

Created by Google Research, Matthew Turk (creator of the yt visualization package), and the IPython/Jupyter development team, coLaboratory merges successful open source products with Google technologies, enabling multiple people to collaborate directly through simultaneous access and analysis of data. This provides a big improvement over ad-hoc workflows involving emailing documents back and forth.

Setting up an environment for collaborative data analysis can be a hurdle, as requirements vary among different machines and operating systems, and installation errors can be cryptic. The coLaboratory Chrome App addresses this hurdle. One-click installs coLaboratory, IPython, and a large set of popular scientific python libraries (with more on the way). Furthermore, because we use Portable Native Client (PNaCl), coLaboratory runs at native speeds and is secure, allowing new users to start working with IPython faster than ever.

In addition to ease of installation, coLaboratory enables collaboration between people with different skill sets. One example of this would be interactions between programmers who write complex logic in code and non-programmers who are more familiar with GUIs. As shown below, a programmer writes code (step 1) and then annotates that code with simple markup to create an interactive form (step 2). The programmer can then hide the complexity of code to show only the form (step 3), which allows a non-programmer to re-run the code by changing the slider and dropdowns in the form (step 4). This interaction allows programmers to write complex logic in code and allows non-programmers to manipulate that logic through simple GUI hooks.
For more information about this project please see our talks on collaborative data science and zero dependency python. In addition to our external partners in the coLaboratory project, we would like to thank everyone at Google who contributed: the Chromium Native Client team, the Google Drive team, the Open Source team, and the Security team.