Data Analysts and Their Software Practices: A Profile of the Sabermetrics Community and Beyond

Justin Middleton
Emerson Murphy-Hill
Kathryn T. Stolee
Computer Supported Cooperative Work (2020)

Abstract

In modern data analytics, practices from software development are increasingly necessary to manage data, but they must be incorporated alongside other statistical and scientific skills. Therefore, we ask: how does a community recontextualize and reinterpret software development through the unique pressures of their work? To answer this, we explore the data-centric community around baseball analytics, or sabermetrics. To discover development’s place in the search for robust statistical insight in sports, we interview 10 participants in the sabermetric community and survey over 120 more data analysts, both in baseball and not. We explore how their work lives at the intersection of science and entertainment, and as a consequence, baseball data serves as an accessible yet deep subject to practice analytic skills. Software development exists within an iterative research process that cycles between defining rigorous statistical methods and preserving the flexibility to chase interesting problems. In this question-driven process, members of the community inhabit several overlapping roles of intentional work, in which software development can become the priority to support research and statistical infrastructure, and we discuss the way that the community can foster the balance of these skills.