A Flexible Probabilistic Framework for Large-Margin Mixture of Experts

Archit Sharma
Siddhartha Saxena
Piyush Rai
Machine Learning (2019)

Abstract

Mixture-of-Experts (MoE) models enable learning highly nonlinear functions by combining simple “expert” models. Each expert handles a small region of the data space, as dictated by a gating network that generates the (soft) assignment of each input to its corresponding expert(s). Despite their flexibility and a recent resurgence of interest, existing MoE constructions pose severe difficulties during model training. Crucially, neither of the two popular gating networks used in MoE, namely the softmax gating network and the hierarchical gating network (the latter used in hierarchical mixtures of experts), is amenable to closed-form parameter updates. The problem is further exacerbated if the experts do not have a conjugate likelihood and/or lack a naturally probabilistic formulation (e.g., logistic regression or large-margin classifiers such as SVMs). To address these issues, we develop a novel probabilistic framework for MoE that leverages Bayesian linear support vector machines as experts and variable augmentation schemes to facilitate inference. Our constructions lead to MoE models with attractive large-margin properties (for both flat and hierarchical MoE) while enjoying a very simple inference procedure with closed-form updates. Our models outperform traditional flat/hierarchical MoE models, as well as nonlinear models such as kernel SVMs and Gaussian processes, on several benchmark datasets.
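For reference, the standard flat MoE formulation with a softmax gating network that the abstract alludes to can be written as follows (the notation is generic, not necessarily that of the paper: x is an input, y its target, theta_k the parameters of expert k, and v_k the gating weights for expert k):

\[
p(y \mid \mathbf{x}) \;=\; \sum_{k=1}^{K} g_k(\mathbf{x})\, p(y \mid \mathbf{x}, \boldsymbol{\theta}_k),
\qquad
g_k(\mathbf{x}) \;=\; \frac{\exp(\mathbf{v}_k^{\top}\mathbf{x})}{\sum_{j=1}^{K} \exp(\mathbf{v}_j^{\top}\mathbf{x})},
\]

where g_k(x) is the gating probability of assigning input x to expert k and p(y | x, theta_k) is the k-th expert’s predictive distribution. The softmax term is not conjugate to standard priors on the gating weights, which is why such models generally lack closed-form parameter updates and why the abstract appeals to variable augmentation schemes.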
