Revisiting two tower models for unbiased learning to rank
Two-tower architecture (one tower to factorize out position-related bias) has now become a common technique in neural network ranking models for Unbiased Learning To Rank (ULTR). In these models, a neural network tower taking in all position related features is designed to model the biases, which are equivalent to the propensity scores used to define the unbiased ranking metrics. It works based on the assumptions that the user interaction (click) is conditioned on the user observation of a ranked item, and only the observation probability depends on the position. So if we factorize out the observation probability, we can then unbiased rank the items by their click rate conditioned on observation. The assumption appears sensible, and the additive two-tower models based on it have been widely implemented in ULTR. However, two-tower models may not always work and sometimes work even worse than the biased models, as the user may not always follow the same pattern. In this work, we stick to the plausible assumption about the user interaction, but we also consider the spectrum of different user behaviors. In this case, the assumption that the position related observation probability may not be able to get explicitly factorized out. We also study generic methods to treat this complexity and show these methods could outperform the simple additive debias models in offline experiments.