Parameter Tuning in Personal Search Systems

Suming Jeremiah Chen
13th ACM International Conference on Web Search and Data Mining (WSDM)(2020)


Retrieval effectiveness in information retrieval systems is heavily dependent on how various parameters are tuned. One option to find these parameters is to run multiple online experiments and using a parameter sweep approach in order to optimize the search system. There are multiple downsides of this approach, mainly that it may lead to a poor experience for users. Another option is to do offline evaluation, which can act as a safeguard against potential quality issues. Offline evaluation requires a validation set of data that can be benchmarked against different parameter settings. However, for search over personal corpora, e.g. email and file search, it is impractical and often impossible to get a complete representative validation set, due to the inability to save raw queries and document information. In this work, we show how to do offline parameter tuning with only a partial validation set. In addition, we demonstrate how to do parameter tuning in the cases when we have complete knowledge of the internal implementation of the search system (white-box tuning), as well as the case where we have only partial knowledge (grey-box tuning). This has allowed us to do offline parameter tuning in a privacy-sensitive manner.