Training Machine Learning Models With Causal Logic

Ang Li
Suming Jeremiah Chen
Jingzheng Qin
IID Workshop at the 2020 World Wide Web Conference (2020)


Machine-learning (ML) models are widely used to make inferences, a common application being to predict and categorize user behavior. However, ML models are often trained only on biased data: for instance, a search ranking model that uses clicks to determine how to rank will suffer from position bias. The difficulty arises because user feedback is observed only for the treatment that was actually applied; the counterfactual feedback for other potential treatments is never observed. In this work, we discuss a real-world situation in which a binary classification model is used in production to decide how to treat users. We introduce the model and the limitations of our modeling approach, and show that by using a counterfactual selection criterion we can improve upon the current modeling process and classify users better. We then propose a causal modeling method that uses the existing data to derive bounds, which can in turn be used to modify the objective function so that counterfactual learning is incorporated into the training process.
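
As a rough illustration of what "deriving bounds from existing data" can look like, the sketch below computes the well-known Tian-Pearl bounds on the probability that a user responds if treated and does not respond if untreated, using only the two response rates from a randomized experiment. This is an assumption made for illustration: the abstract does not state which bounds the method uses, and the function name pns_bounds and the example rates are hypothetical.

```python
from typing import Tuple


def pns_bounds(p_y_treated: float, p_y_control: float) -> Tuple[float, float]:
    """Tian-Pearl bounds on P(responds if treated AND does not respond if untreated).

    Assumes binary treatment and binary outcome, with both rates estimated
    from a randomized experiment (an illustrative assumption, not
    necessarily the setting of the paper).
    """
    for p in (p_y_treated, p_y_control):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Lower bound: the average treatment effect, floored at zero.
    lower = max(0.0, p_y_treated - p_y_control)
    # Upper bound: cannot exceed the treated response rate
    # or the control non-response rate.
    upper = min(p_y_treated, 1.0 - p_y_control)
    return lower, upper


# Hypothetical rates: 70% respond when treated, 40% respond when untreated.
lower, upper = pns_bounds(0.7, 0.4)
print(f"fraction of users who respond only if treated lies in [{lower:.2f}, {upper:.2f}]")
```

Because the counterfactual quantity is only bounded, not point-identified, one option is to have a modified objective penalize predictions that fall outside the interval rather than target a single value.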