Mayfly: Private Aggregate Insights from Ephemeral Streams of On-Device User Data

Ren Yi
Marco Gruteser
Ollie Guinan
Albert Cheu
Christopher Bian
Yannis Guzman
Edo Roth
Zoe Gong
Maya Spivak
Artem Lagzdin
Stanislav Chiknavaryan
Ryan McKenna
Grace Ni
Timon Van Overveldt
2024

Abstract

This paper introduces Mayfly, a federated analytics approach that enables aggregate queries over ephemeral on-device data streams without central persistence of sensitive user data. Mayfly minimizes data via SQL-programmable on-device windowing and contribution bounding, anonymizes user data via streaming differential privacy (DP), and mandates immediate in-memory cross-device aggregation on the server, ensuring that only privatized aggregates are revealed to data analysts. Deployed for a sustainability use case that estimates transportation carbon emissions from private location data, Mayfly computed over 4 million statistics across more than 500 million devices under a per-device, per-week DP guarantee of ε = 2 while meeting strict data utility requirements. To achieve this, we designed a new DP mechanism for Group-By-Sum workloads that leverages statistical properties of location data and may be applicable to other domains.
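As a rough illustration of the pipeline summarized above (on-device contribution bounding, cross-device Group-By-Sum, then noise before release), the following minimal sketch may help. It is not Mayfly's mechanism: it substitutes the textbook Laplace mechanism for the new mechanism the paper designs, treats the release as a single one-shot query rather than a stream, and assumes a fixed, public set of group keys. All function names, parameters, and sample values are hypothetical.

```python
import random
from collections import defaultdict


def bound_contribution(rows, max_keys, clip):
    """On-device step (hypothetical): pre-sum values per key, keep at most
    max_keys keys, and clip each per-key sum to the range [0, clip]."""
    per_key = defaultdict(float)
    for key, value in rows:
        per_key[key] += value
    # Deterministic choice of which keys to keep; a real system might sample.
    kept = sorted(per_key)[:max_keys]
    return {k: min(max(per_key[k], 0.0), clip) for k in kept}


def aggregate(reports, public_keys):
    """Server step (hypothetical): in-memory Group-By-Sum over bounded
    reports, restricted to a fixed, public key set."""
    totals = {k: 0.0 for k in public_keys}
    for report in reports:
        for key, value in report.items():
            if key in totals:
                totals[key] += value
    return totals


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(totals, max_keys, clip, epsilon, rng):
    """Add Laplace noise calibrated to the L1 sensitivity max_keys * clip:
    one device changes at most max_keys sums by at most clip each."""
    scale = max_keys * clip / epsilon
    return {k: v + laplace(scale, rng) for k, v in totals.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-device rows of (group key, value).
    devices = [
        [("car", 3.0), ("car", 4.0), ("bus", 1.0), ("walk", 0.5)],
        [("car", 2.0)],
    ]
    reports = [bound_contribution(d, max_keys=2, clip=5.0) for d in devices]
    totals = aggregate(reports, public_keys=["bus", "car", "rail"])
    print(privatize(totals, max_keys=2, clip=5.0, epsilon=2.0, rng=rng))
```

Noise is added to every public key, including keys with a zero sum, so that the presence or absence of a key in the output does not itself reveal whether any device contributed to it.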