The 23rd ACM conference on Knowledge Discovery and Data Mining (KDD’17), a main venue for academic and industry research in data science, information retrieval, data mining and machine learning, was held last week in Halifax, Canada. Google has historically been an active participant in KDD, and this year was no exception, with Googlers contributing numerous papers and participating in workshops.
In addition to our overall participation, we are happy to congratulate fellow Googler Bryan Perozzi for receiving the SIGKDD 2017 Doctoral Dissertation Award, which recognizes excellent research by doctoral candidates in the field of data mining and knowledge discovery. The award was given in recognition of his thesis on machine learning on graphs, completed at Stony Brook University under the supervision of Steven Skiena. Part of the thesis was developed during his internships at Google. The thesis used a restricted set of local graph primitives (such as ego-networks and truncated random walks) to exploit the information around each vertex for classification, clustering, and anomaly detection. Most notably, the work introduced the random-walk paradigm for graph embedding with neural networks in DeepWalk.
DeepWalk: Online Learning of Social Representations, originally presented at KDD'14, outlines a method for using local information obtained from a series of truncated random walks to learn latent representations of nodes in a graph (e.g. users in a social network). The core idea was to treat each segment of a random walk as a sentence “in the language of the graph.” These segments could then be used as input for neural network models to learn representations of the graph’s nodes, using sequence modeling methods like word2vec (which had just been developed at the time). This research continues at Google, most recently with Learning Edge Representations via Low-Rank Asymmetric Projections.
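To make the "walks as sentences" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the DeepWalk implementation: the toy graph, the function names (`truncated_random_walk`, `build_corpus`, `skipgram_pairs`) and the parameter values are all hypothetical. It builds a corpus of truncated random walks and shows the (center, context) pairs a skip-gram style model such as word2vec would see. The training step itself is left out.

```python
import random


def truncated_random_walk(graph, start, walk_length, rng):
    """Return a walk of at most `walk_length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(graph, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every node; each walk is one 'sentence'."""
    rng = random.Random(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for node in nodes:
            corpus.append(truncated_random_walk(graph, node, walk_length, rng))
    return corpus


def skipgram_pairs(walk, window):
    """List (center, context) pairs that lie within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical toy graph, stored as adjacency lists (undirected).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

corpus = build_corpus(graph, walks_per_node=2, walk_length=5)
for walk in corpus:
    print(" ".join(walk))
```

Each printed line plays the role of a sentence whose "words" are node identifiers. Passing such a corpus to any word2vec-style trainer would yield one vector per node. The walk length, window size and number of walks per node are tuning choices, and the values above are placeholders.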
The full list of Google contributions at KDD’17 is below (Googlers highlighted in blue).
Organizing Committee
Panel Chair: Andrew Tomkins
Research Track Program Chair: Ravi Kumar
Applied Data Science Track Program Chair: Roberto J. Bayardo
Research Track Program Committee: Sergei Vassilvitskii, Alex Beutel, Abhimanyu Das, Nan Du, Alessandro Epasto, Alex Fabrikant, Silvio Lattanzi, Kristen Lefevre, Bryan Perozzi, Karthik Raman, Steffen Rendle, Xiao Yu
Applied Data Science Track Program Committee: Edith Cohen, Ariel Fuxman, D. Sculley, Isabelle Stanton, Martin Zinkevich, Amr Ahmed, Azin Ashkan, Michael Bendersky, James Cook, Nan Du, Balaji Gopalan, Samuel Huston, Konstantinos Kollias, James Kunz, Liang Tang, Morteza Zadimoghaddam

Awards
Doctoral Dissertation Award: Bryan Perozzi, for Local Modeling of Attributed Graphs: Algorithms and Applications.
Doctoral Dissertation Runner-up Award: Alex Beutel, for User Behavior Modeling with Large-Scale Graph Analysis.

Papers
Ego-Splitting Framework: from Non-Overlapping to Overlapping Clusters
Alessandro Epasto, Silvio Lattanzi, Renato Paes Leme

HyperLogLog Hyperextended: Sketches for Concave Sublinear Frequency Statistics
Edith Cohen

Google Vizier: A Service for Black-Box Optimization
Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, D. Sculley

Quick Access: Building a Smart Experience for Google Drive
Sandeep Tata, Alexandrin Popescul, Marc Najork, Mike Colagrosso, Julian Gibbons, Alan Green, Alexandre Mah, Michael Smith, Divanshu Garg, Cayden Meyer, Reuben Kan

TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, Mustafa Ispir, Vihan Jain, Levent Koc, Chiu Yuen Koo, Lukasz Lew, Clemens Mewald, Akshay Modi, Neoklis Polyzotis, Sukriti Ramesh, Sudip Roy, Steven Whang, Martin Wicke, Jarek Wilkiewicz, Xin Zhang, Martin Zinkevich

Construction of Directed 2K Graphs
Balint Tillman, Athina Markopoulou, Carter T. Butts, Minas Gjoka

A Practical Algorithm for Solving the Incoherence Problem of Topic Models In Industrial Applications
Amr Ahmed, James Long, Dan Silva, Yuan Wang

Train and Distribute: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks
Heng-Tze Cheng, Lichan Hong, Mustafa Ispir, Clemens Mewald, Zakaria Haque, Illia Polosukhin, Georgios Roumpos, D. Sculley, Jamie Smith, David Soergel, Yuan Tang, Philip Tucker, Martin Wicke, Cassandra Xia, Jianwei Xie

Learning to Count Mosquitoes for the Sterile Insect Technique
Yaniv Ovadia, Yoni Halpern, Dilip Krishnan, Josh Livni, Daniel Newburger, Ryan Poplin, Tiantian Zha, D. Sculley

Workshops
13th International Workshop on Mining and Learning with Graphs
Keynote Speaker: Vahab Mirrokni - Distributed Graph Mining: Theory and Practice
Contributed talks include:
HARP: Hierarchical Representation Learning for Networks
Haochen Chen, Bryan Perozzi, Yifan Hu and Steven Skiena

Fairness, Accountability, and Transparency in Machine Learning
Contributed talks include:
Fair Clustering Through Fairlets
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii
Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations
Alex Beutel, Jilin Chen, Zhe Zhao, Ed H. Chi

Tutorial
TensorFlow
Rajat Monga, Martin Wicke, Daniel ‘Wolff’ Dobson, Joshua Gordon