Addressing Label Sparsity with Class-Level Common Sense for Google Maps
Abstract
Successful knowledge graphs (KGs) solved the historical knowledge acquisition bottleneck by supplanting an expert focus with a simple, crowd-friendly one: KG nodes represent popular people, places, organizations, etc., and the graph arcs represent common sense relations like affiliations, locations, etc. Techniques for more general, categorical, KG curation do not seem to have made the same transition: the KG research community is still largely focused on logic-based methods that belie the common-sense characteristics of successful KGs.
In this paper, we propose a simple yet novel approach to acquiring \emph{class-level attributes} from the crowd that represent broad common sense associations between categories, and can be used with the classic knowledge-base default \& override technique (e.g. \cite{reiter1978}) to address the early \textit{label sparsity problem} faced by machine learning systems for problems that lack data for training. We demonstrate the effectiveness of our acquisition and reasoning approach on a pair of very real industrial-scale problems: how to augment an existing KG of places and offerings (e.g. stores and products, restaurants and dishes) with associations between them indicating the availability of the offerings at those places, which would enable the KG to provide answers to questions like, ``Where can I buy milk nearby?'' This problem has several practical challenges but for this paper we focus mostly on the label sparsity. Less than 30\% of physical places worldwide (i.e. brick \& mortar stores and restaurants) have a website, and less than half of those list their product catalog or menus, leaving a large acquisition gap to be filled by methods other than information extraction (IE). Label sparsity is a general problem, and not specific to these use cases, that prevents modern AI and machine learning techniques from applying to many applications for which labeled data is not readily available. As a result, the study of how to acquire the knowledge and data needed for AI to work is as much a problem today as it was in the 1970s and 80s during the advent of expert systems \cite{mycin1975}.
The class-level attributes approach presented here is based on a KG-inspired intuition that a lot of the knowledge people need to understand where to go to buy a product they need, or where to find the dishes they want to eat, is categorical and part of their general common sense: everyone knows grocery stores sell milk and don't sell asphalt, chinese restaurants serve fried rice and not hamburgers, etc. We acquired a mixture of instance- and class- level pairs (e.g. $\langle$\textit{Ajay Mittal Dairy}, milk$\rangle$, $\langle$GroceryStore, milk$\rangle$, resp.) from a novel 3-tier crowdsourcing method, and demonstrate the scalability advantages of the class-level approach. Our results show that crowdsourced class-level knowledge can provide rapid scaling of knowledge acquisition in shopping and dining domains. The acquired common sense knowledge also has long-term value in the KG. The approach was a critical part of enabling a worldwide \textit{local search} capability on Google Maps, with which users can find products and dishes that are available in most places on earth.
In this paper, we propose a simple yet novel approach to acquiring \emph{class-level attributes} from the crowd that represent broad common sense associations between categories, and can be used with the classic knowledge-base default \& override technique (e.g. \cite{reiter1978}) to address the early \textit{label sparsity problem} faced by machine learning systems for problems that lack data for training. We demonstrate the effectiveness of our acquisition and reasoning approach on a pair of very real industrial-scale problems: how to augment an existing KG of places and offerings (e.g. stores and products, restaurants and dishes) with associations between them indicating the availability of the offerings at those places, which would enable the KG to provide answers to questions like, ``Where can I buy milk nearby?'' This problem has several practical challenges but for this paper we focus mostly on the label sparsity. Less than 30\% of physical places worldwide (i.e. brick \& mortar stores and restaurants) have a website, and less than half of those list their product catalog or menus, leaving a large acquisition gap to be filled by methods other than information extraction (IE). Label sparsity is a general problem, and not specific to these use cases, that prevents modern AI and machine learning techniques from applying to many applications for which labeled data is not readily available. As a result, the study of how to acquire the knowledge and data needed for AI to work is as much a problem today as it was in the 1970s and 80s during the advent of expert systems \cite{mycin1975}.
The class-level attributes approach presented here is based on a KG-inspired intuition that a lot of the knowledge people need to understand where to go to buy a product they need, or where to find the dishes they want to eat, is categorical and part of their general common sense: everyone knows grocery stores sell milk and don't sell asphalt, chinese restaurants serve fried rice and not hamburgers, etc. We acquired a mixture of instance- and class- level pairs (e.g. $\langle$\textit{Ajay Mittal Dairy}, milk$\rangle$, $\langle$GroceryStore, milk$\rangle$, resp.) from a novel 3-tier crowdsourcing method, and demonstrate the scalability advantages of the class-level approach. Our results show that crowdsourced class-level knowledge can provide rapid scaling of knowledge acquisition in shopping and dining domains. The acquired common sense knowledge also has long-term value in the KG. The approach was a critical part of enabling a worldwide \textit{local search} capability on Google Maps, with which users can find products and dishes that are available in most places on earth.