I am a data standards technologist within Google's Open Source Programs Office, where I work on our use of structured data formats and knowledge graph exchange technologies. I am co-lead of the Schema.org project, where we aim to bring these technologies into the lives of all Web users, whether they want to fact-check their politicians, plan a meal, or find a job. Before joining Google, I worked mainly on creating and using Web data standards: I joined the W3C RDF initiative in its early phase and helped create the Semantic Web and Linked Data fields (RDF/S, FOAF, SKOS, etc.). My work has always been in communities engaged with open standards, open source, and data sharing; most recently I have been investigating open data and open source approaches to AI in the 'Artificial Life' tradition. I remain active in the standards-making communities around W3C and in various collaborative projects.
There are thousands of data repositories on the Web, providing access to millions of datasets. National and regional governments, scientific publishers and consortia, commercial data providers, and others publish data for fields ranging from social science to life science to high-energy physics to climate science and more. Access to this data is critical for reproducing research results, enabling scientists to build on others' work, and giving data journalists easier access to information and its provenance. In this paper, we discuss Google Dataset Search, a dataset-discovery tool that provides search capabilities over potentially all datasets published on the Web. The approach relies on an open ecosystem, where dataset owners and providers publish semantically enhanced metadata on their own sites. We then aggregate, normalize, and reconcile this metadata, providing a search engine that lets users find datasets in the "long tail" of the Web. We also discuss the social and technical challenges of building this type of tool, and the lessons we learned from this experience.
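The abstract names the pipeline (providers publish metadata on their own sites, which is then aggregated, normalized, and reconciled) without specifying how each step works. The following is a minimal, hypothetical Python sketch of that flow using only the standard library. It assumes providers embed schema.org `Dataset` descriptions as JSON-LD inside `<script type="application/ld+json">` elements. Purely for illustration, it also assumes that two records describe the same dataset when their canonicalized `url` values match. The helper names (`find_datasets`, `normalize`, `reconcile`, `canonical_url`) and the sample pages are invented, and the sketch should not be read as Google Dataset Search's actual implementation.

```python
# Hypothetical sketch only: not Google Dataset Search's actual pipeline.
import json
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def canonical_url(url):
    """Lowercase scheme and host, drop the fragment and any trailing slashes."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def normalize(item, source_page):
    """Map one schema.org Dataset object onto a flat record with tidied fields."""
    return {
        "name": " ".join(str(item.get("name", "")).split()),  # collapse whitespace
        "url": canonical_url(str(item.get("url", ""))),
        "license": item.get("license") or "",
        "source": source_page,
    }


def find_datasets(html, source_page):
    """Aggregate step: extract and normalize every Dataset described on one page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    records = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of rejecting the whole page
        if isinstance(data, dict):
            items = data.get("@graph", [data])  # support top-level @graph arrays
        else:
            items = data if isinstance(data, list) else []
        for item in items:
            types = item.get("@type") if isinstance(item, dict) else None
            if "Dataset" in (types if isinstance(types, list) else [types]):
                records.append(normalize(item, source_page))
    return records


def reconcile(records):
    """Merge records sharing a canonical URL, else a name (hypothetical rule)."""
    merged = {}
    for rec in records:
        fields = {k: v for k, v in rec.items() if k != "source"}
        key = fields["url"] or fields["name"].lower()
        entry = merged.setdefault(key, {**fields, "sources": []})
        for field, value in fields.items():
            if value and not entry.get(field):
                entry[field] = value  # fill gaps from later copies
        entry["sources"].append(rec["source"])
    return list(merged.values())


# Usage: two invented pages describing the same dataset in slightly different ways.
PAGE_A = """<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset",
 "name": "City  air quality 2019", "url": "https://Example.org/data/air/"}
</script>"""
PAGE_B = """<script type="application/ld+json">
{"@context": "https://schema.org/", "@graph": [
  {"@type": "Organization", "name": "Example Org"},
  {"@type": ["Dataset"], "name": "City air quality 2019",
   "url": "https://example.org/data/air",
   "license": "https://creativecommons.org/licenses/by/4.0/"}]}
</script>"""

datasets = reconcile(find_datasets(PAGE_A, "https://example.org/a")
                     + find_datasets(PAGE_B, "https://mirror.example.net/b"))
```

In this example the two pages collapse into a single record that lists both source pages and takes its license from the second page. Keying on a canonicalized `url` is only one possible reconciliation signal. A sketch like this could also compare schema.org properties such as `identifier` or `sameAs`, which the markup vocabulary provides but which are not used here.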
