Amanda Casari
Amanda Casari is a researcher and engineer in the Google Open Source Programs Office, co-leading research and open data projects to better understand risk and resilience in open source ecosystems. For over 18 years, she has worked in a breadth of cross-functional roles and engineering disciplines, including developer relations, data science, complexity science, and robotics. Amanda co-authored the O'Reilly book Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. She was named an External Faculty member of the Vermont Complex Systems Center in 2021. She is persistently fascinated by the difference between the systems we aim to create and the ones that emerge, and pie.
Authored Publications
Invisible Labor in Open Source Software Ecosystems
John Meluso
Milo Trujillo
Proceedings of the ACM on Human-Computer Interaction (2025), pp. 1-32
Invisible labor is work that is not fully visible, not appropriately compensated, or both. In open source software (OSS) ecosystems, essential tasks that do not involve code (like content moderation) often become invisible to the detriment of individuals and organizations. However, invisible labor is so difficult to measure that we do not know how much OSS activity is invisible. Our study addresses this challenge, demonstrating that roughly half of OSS work is invisible. We do this by developing a survey technique with cognitive anchoring that measures OSS developer self-assessments of labor visibility and attribution. Survey respondents (n = 142) reported that their work is more likely to be non-visible or partially visible (i.e., visible to at most one other person) than fully visible (i.e., visible to two or more people). Furthermore, cognitively anchoring participants to the idea of high work visibility increased perceptions of labor visibility and decreased visibility importance compared to anchoring to low work visibility. This suggests that advertising OSS activities as "open" may not make labor visible to most people, but rather lead contributors to overestimate labor visibility. We therefore add to a growing body of evidence that designing systems that recognize all kinds of labor as legitimate contributions is likely to improve fairness in software development while providing greater transparency into work designs that help organizations and communities achieve their goals.
Invisible labor is work that is either not fully visible or not appropriately compensated. In open source software (OSS) ecosystems, essential tasks that do not involve code (like content moderation) often become invisible to the detriment of individuals and organizations. However, invisible labor is sufficiently difficult to measure that we do not know how much OSS activity is invisible. Our study addresses this challenge, demonstrating that roughly half of OSS work is invisible. We do this by developing a cognitive anchoring survey technique that measures OSS developer self-assessments of labor visibility and attribution. Survey respondents (n = 142) reported that their work is more likely to be invisible (2 in 3 tasks) than visible, and that half (50.1%) is uncompensated. Priming participants with the idea of visibility caused them to think their work was more visible, and that visibility was less important, than those primed with invisibility. We also found evidence that tensions between attribution motivations probably increase how common invisible labor is. This suggests that advertising OSS activities as "open" may lead contributors to overestimate how visible their labor actually is. Our findings suggest benefits to working with varied stakeholders to make select, collectively valued activities visible, and to increasing compensation in valued forms (like attribution, opportunities, or pay) when possible. This could improve fairness in software development while providing greater transparency into work designs that help organizations and communities achieve their goals.
Industry and scientific researchers who use data collected from open source ecosystems need better guidelines and best practices for working responsibly with communities.
When researchers fail to consider the human element of open source, they can harm open source ecosystems by:
- increasing work and emotional stress on volunteer groups
- impacting costly infrastructure systems not designed to support research use cases
- treating critical open source systems as test beds for scholarly research into known problems without consent of the community or contributing back to correct these problems
This article presents best practices and guidelines for researchers working in open source to have a greater impact while respecting the production and sociotechnical environment they are observing.
The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories
James P. Bagrow
Jean-Gabriel Young
Laurent Hébert-Dufresne
Melanie Warrick
Samuel F. Rosenblatt
MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, United States (2022)
Communication surrounding the development of an open source project largely occurs outside the software repository itself. Historically, large communities often used a collection of mailing lists to discuss the different aspects of their projects. Multimodal tool use, with software development and communication happening on different channels, complicates the study of open source projects as a sociotechnical system. Here, we combine and standardize mailing lists of the Python community, resulting in 954,287 messages from 1995 to the present. We share all scraping and cleaning code to facilitate reproduction of this work, as well as smaller datasets for the Golang (122,721 messages), Angular (20,041 messages), and Node.js (12,514 messages) communities. To showcase the usefulness of these data, we focus on the CPython repository and merge the technical layer (which GitHub account works on what file and with whom) with the social layer (messages from unique email addresses) by identifying 50% of GitHub contributors in the mailing list data. We then explore correlations between the valence of social messaging and the structure of the collaboration network. We discuss how these data provide a laboratory to test theories from standard organizational science in large open source projects.
Which contributions count? Analysis of attribution in open source
James P. Bagrow
Jean-Gabriel Young
Laurent Hébert-Dufresne
Milo Z. Trujillo
2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), IEEE (2021), pp. 242-253
Open source software projects usually acknowledge contributions with text files, websites, and other idiosyncratic methods. These data sources are hard to mine, which is why contributorship is most frequently measured through changes to repositories, such as commits, pushes, or patches. Recently, some open source projects have taken to recording contributor actions with standardized systems; this opens up a unique opportunity to understand how community-generated notions of contributorship map onto codebases as the measure of contribution. Here, we characterize contributor acknowledgment models in open source by analyzing thousands of projects that use a model called All Contributors to acknowledge diverse contributions like outreach, finance, infrastructure, and community management. We analyze the life cycle of projects through this model's lens and contrast its representation of contributorship with the picture given by other methods of acknowledgment, including GitHub's top committers indicator and contributions derived from actions taken on the platform. We find that community-generated systems of contribution acknowledgment make work like idea generation or bug finding more visible, which generates a more extensive picture of collaboration. Further, we find that models requiring explicit attribution lead to more clearly defined boundaries around what is and what is not a contribution.
Open Source Ecosystems Need Equitable Credit Across Contributions
James P. Bagrow
Jean-Gabriel Young
Laurent Hébert-Dufresne
Milo Z. Trujillo
Nature Computational Science, 1 (2021)
Collaborative and creative communities are more equitable when all contributions to a project are acknowledged. Equitable communities are, in turn, more transparent, more accessible to newcomers, and more encouraging of innovation—hence we should foster these communities, starting with proper attribution of credit. However, to date, no standard and comprehensive contribution acknowledgement system exists in open source, not just for software development but for the broader ecosystems of conferences, organization and outreach, and technical knowledge. As a result, billions of dollars of corporate sponsorship and employee labor are invested in open source software projects without knowing whom the investments support and where they have impact. Further, both closed and open source projects are built on a complex web of open source dependencies, and we lack a nuanced understanding of who creates and maintains these projects.