Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals
Abstract
Machine learning models often need to satisfy many real-world policy goals and capture some
kinds of side information. We show that many such goals can be mathematically expressed as
constraints on the model’s predictions on the data, which we call rate constraints. In this paper, we
study the specific problem of training non-convex models subject to these rate constraints (which
are non-convex and non-differentiable), and the general problem of constrained optimization of
possibly non-convex objectives with possibly non-convex and non-differentiable constraints. In the
non-convex setting, the Lagrangian may not have an equilibrium to converge to, and thus using the
standard approach of Lagrange multipliers may fail, since a deterministic solution may not even exist.
Furthermore, if the constraints are non-differentiable, then one cannot, by definition, optimize the
Lagrangian with gradient-based methods. To address these issues, we present the proxy-Lagrangian,
which leads to an algorithm that produces a stochastic classifier with theoretical guarantees by
playing a two-player non-zero-sum game. The first player minimizes external regret with respect to a
differentiable relaxation of the constraints, while the second player enforces the original constraints
by minimizing swap regret. This leads to a solution concept we call a semi-coarse correlated
equilibrium, which we show corresponds to an approximately optimal and feasible solution of the
constrained optimization problem. We then give a procedure that shrinks the randomized solution
down to a mixture of at most m + 1 deterministic solutions.
This culminates in end-to-end procedures that provably solve non-convex constrained optimization
problems with possibly non-differentiable and non-convex constraints. We provide extensive
experimental results on benchmark and real-world problems, enforcing a broad range of policy goals,
including different fairness metrics as well as goals on accuracy, coverage, recall, and churn.
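To make the setup concrete, the following is a minimal illustrative sketch of a rate-constrained problem; the notation (a model $h_\theta$, example groups $A$ and $B$, and a slack $\epsilon$) is assumed here for illustration rather than taken from the abstract. A rate constraint bounds a statistic of the model's predictions, and the indicator functions defining such rates are what make the constraints non-differentiable:
\[
\min_{\theta} \; f(\theta)
\quad \text{subject to} \quad
g_i(\theta) \le 0, \quad i = 1, \dots, m,
\]
\[
\text{e.g.} \quad
g_1(\theta) \;=\; \frac{1}{|A|} \sum_{x \in A} \mathbf{1}\!\left[h_\theta(x) > 0\right]
\;-\; \frac{1}{|B|} \sum_{x \in B} \mathbf{1}\!\left[h_\theta(x) > 0\right]
\;-\; \epsilon .
\]
In the two-player game described above, the first player would be scored on differentiable surrogates of the $g_i$ (for instance, replacing each indicator with a hinge-style upper bound), while the second player would be scored on the original $g_i$.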