Google at SIGMOD/PODS 2012

July 13, 2012

Over the years, SIGMOD has expanded beyond a traditional "database" conference to include several areas related to information management. This year’s ACM SIGMOD/PODS conference (on Management of Data, and Principles of Database Systems), held in Scottsdale, Arizona, was no different. We were impressed by the wide variety of researchers the conference attracted from industry and academia alike, and enjoyed learning how others are pushing the limits of scalability in data storage and processing. In addition to an excellent set of papers on a large number of topics, we saw a couple of recurring themes:

1) Data Visualization
  • Pat Hanrahan from Stanford gave a keynote on some of the challenges involved in building systems to enable "data enthusiasts" to manage and visualize data. 

2) Big Data

As has been the case for the last couple of years, “Big Data” has been of ever-growing interest to the entire community, particularly from industry. Google presented a talk on F1, a new distributed database system we’ve built to power the AdWords system. A complex business application like AdWords has different requirements than many systems at Google that often use storage systems like Bigtable. We have a single database shared by hundreds of developers and systems, so we need the robustness and ease of use we’re used to from traditional databases. F1 is built to scale like Bigtable, without giving up the database features we also need, like strong consistency, ACID transactions, schema enforcement, and most importantly, SQL queries.

There’s been a widespread trend over the last several years away from databases, towards highly scalable “NoSQL” systems. We don’t think that trade-off is necessary, and were happy to see several other speakers advocate a similar theme -- yes, databases are useful, and developers shouldn’t need to give up database features and ease of use in the name of scalability.

This theme was supported by an industry session on Big Data featuring talks from other companies: Facebook (TAO: How Facebook Serves the Social Graph), Twitter (Large-Scale Machine Learning at Twitter), and Microsoft (Recurring Job Optimization in Scope). Googler Kristen LeFevre was a panelist on the "Perspectives on Big Data" panel organized by Surajit Chaudhuri from Microsoft, which also featured Donald Kossmann from ETHZ, Sam Madden from MIT, and Anand Rajaraman from Walmart Labs. Last but not least, Surajit Chaudhuri also gave an excellent keynote outlining some of the research challenges that the new era of "Big Data and Cloud" poses.

As has been the practice for several years now, SIGMOD continued to encourage interest in data management research by organizing sessions such as this year's "New Researcher Symposium" (which included Anish Das Sarma from Google as a panelist).

In addition to Google sponsoring the conference, many Googlers attended, contributing to a robust presence and affording us the opportunity to interact with the broader information management community. We've been pushing the frontiers of science with cutting-edge research in many aspects of data management, and we were eager to share our innovations and see what others have been working on. We found Amin Vahdat's keynote on the intersection of Networking and Databases to be a highlight of Google’s participation, which also included presenting papers, participating on panels, and taking part in planning and program committees:

Program Committee Members

Anish Das Sarma, Venkatesh Ganti, Zoltan Gyongyi, Alon Halevy (Tutorials Chair), Kristen LeFevre, Cong Yu


Symbiosis in Scale Out Networking and Data Management
Amin Vahdat, Google (Keynote)

F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business
Jeff Shute, Mircea Oancea, Stephan Ellner, Ben Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, Phoenix Tong (Googlers)

Finding Related Tables
Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Cong Yu (Googlers)


CloudRAMSort: Fast and Efficient Large-Scale Distributed RAM Sort on Shared-Nothing Cluster
Changkyu Kim, Jongsoo Park, Nadathur Satish, Hongrae Lee (Google), Pradeep Dubey, Jatin Chhugani

Efficient Spatial Sampling of Large Geographical Tables
Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy (Googlers)


Perspectives on Big Data Plenary Session: Privacy and Big Data 
Kristen LeFevre, Google

SIGMOD New Researcher Symposium - How to be a good advisor/advisee? 
Anish Das Sarma, Google

Overall, this year’s SIGMOD was a great conference, widely attended by researchers from industry and academia, and comprising a very interesting mix of research presentations and discussions. Google had a good showing at the conference, and we look forward to continuing this trend in the coming years.