Improving Cloud Storage Search with User Activity

Proceedings of the 14th International Conference on Web Search and Data Mining (WSDM '21), ACM (2021) (to appear)


Cloud-based file storage platforms such as Google Drive are widely used for storing, editing, and sharing personal and organizational documents. In this paper, we improve search ranking quality for cloud storage platforms by utilizing user activity logs. Unlike search logs, activity logs capture general document usage beyond search, such as opening, editing, and sharing documents. We propose to automatically learn, from activity logs, text embeddings that are effective for search ranking. We develop a novel co-access signal, i.e., whether two documents were accessed by the same user around the same time, and use it to train deep semantic matching models that are useful for improving search ranking quality. Extensive offline experiments using Google Drive search and activity logs confirm that these activity-trained semantic matching models can improve ranking. To the best of our knowledge, this is the first work to examine the benefits of leveraging document usage activity at large scale for cloud storage search; as such, it can shed light on using such activity in scenarios where direct collection of search-specific interactions (e.g., query and click logs) may be expensive or infeasible.
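To make the co-access signal concrete, the following is a minimal sketch of one way to derive co-accessed document pairs from an activity log. It is not the paper's pipeline. It assumes a flat log of hypothetical `AccessEvent(user, doc, timestamp)` records, treats every logged event (open, edit, share) alike, and uses a fixed, hypothetical `window_seconds` to stand in for "around the same time", which the abstract does not quantify. Pairs produced this way could serve as positive examples for training a semantic matching model.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class AccessEvent(NamedTuple):
    """One row of a hypothetical activity log: a user accessed a document at a time."""
    user: str
    doc: str
    timestamp: float  # seconds since epoch


def co_access_pairs(events: Iterable[AccessEvent], window_seconds: float) -> set:
    """Return unordered document pairs that one user accessed within
    `window_seconds` of each other (the window size is an assumption)."""
    # Group events per user: co-access is only defined within one user's activity.
    by_user = defaultdict(list)
    for event in events:
        by_user[event.user].append(event)

    pairs = set()
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e.timestamp)
        start = 0
        for end, current in enumerate(user_events):
            # Slide the left edge so every event in [start, end) lies within the window.
            while current.timestamp - user_events[start].timestamp > window_seconds:
                start += 1
            for earlier in user_events[start:end]:
                if earlier.doc != current.doc:
                    # Sort the pair so (a, b) and (b, a) collapse into one entry.
                    pairs.add(tuple(sorted((earlier.doc, current.doc))))
    return pairs


events = [
    AccessEvent("u1", "a", 0),
    AccessEvent("u1", "b", 600),
    AccessEvent("u1", "c", 10_000),
    AccessEvent("u2", "a", 0),
    AccessEvent("u2", "c", 30),
]
print(sorted(co_access_pairs(events, window_seconds=3600)))
```

With a one-hour window, user `u1` contributes `("a", "b")` and user `u2` contributes `("a", "c")`. Document `c` for `u1` is accessed too late to pair with anything. The sliding window keeps the per-user cost proportional to the number of events that fall inside each window, rather than comparing every pair of that user's events.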

