Data Mining and Modeling

The proliferation of machine learning means that learned classifiers lie at the core of many products across Google. However, practical questions are rarely clean enough to be solved by simply applying an out-of-the-box algorithm. A major challenge lies in developing metrics, designing experimental methodologies, and modeling the space to create parsimonious representations that capture the fundamentals of the problem. These problems cut across Google’s products and services, from designing experiments for testing new auction algorithms to developing automated metrics to measure the quality of a road map.

Data mining lies at the heart of many of these questions, and the research done at Google is at the forefront of the field. Whether it is finding more efficient algorithms for working with massive data sets, developing privacy-preserving methods for classification, or designing new machine learning approaches, our group continues to push the boundary of what is possible.

Recent Publications

Efficient Location Sampling Algorithms for Road Networks

Sara Ahmadian

Sreenivas Gollapudi

Kostas Kollias

Vivek Kumar

Ameya Velingker

Santhoshini Velusamy

WebConf (2024)

Understanding Documentation Usage Through Log Analysis: An Exploratory Case Study of Four Cloud Services

Daye Nam

Andrew Macvean

Brad A. Myers

Bogdan Vasilescu

TBD: Target is ICSE 2023 (2024)

City-wide Probe-based Study of Traffic Variability

Avinatan Hassidim

Dotan Emanuel

Ori Rottenstreich

COMSNETS 2024, https://www.comsnets.org/ (2024)

First Passage Percolation with Queried Hints

Sreenivas Gollapudi

Kritkorn Karntikoon

Kostas Kollias

Aaron Schild

Yiheng Shen

Ali Sinop

AISTATS (2024)

Shorts vs. Regular Videos on YouTube: A Comparative Analysis of User Engagement and Content Creation Trends

Caroline Violot

Tugrulcan Elmas

Igor Bilogrevic

Mathias Humbert

ACM Web Science Conference 2024 (WEBSCI24) (2024)

LinguaMeta: Unified Metadata for Thousands of Languages

Sandy Ritchie

Daan van Esch

Uche Okonkwo

Shikhar Vashishth

Emily Drummond

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
