Learning to Bid in Contextual First Price Auctions
Abstract
In this paper, we investigate the problem about how to bid in repeated contextual first price
auctions. We consider a single bidder (learner) who repeatedly bids in the first price auctions:
at each time t, the learner observes a context xt ∈ R
d and decides the bid based on historical
information and xt. We assume a structured linear model of the maximum bid of all the others
mt = α0 · xt + zt, where α0 ∈ R
d
is unknown to the learner and zt is randomly sampled from a
noise distribution F with log-concave density function f. We consider both binary feedback (the
learner can only observe whether she wins or not) and full information feedback (the learner can
observe mt) at the end of each time t. For binary feedback, when the noise distribution F is
known, we propose a bidding algorithm, by using maximum likelihood estimation (MLE) method
to achieve at most Oe(
p
log(d)T ) regret. Moreover, we generalize this algorithm to the setting
with binary feedback and the noise distribution is unknown but belongs to a parametrized family
of distributions. For the full information feedback with unknown noise distribution, we provide
an algorithm that achieves regret at most Oe(
√
dT). Our approach combines an estimator for
log-concave density functions and then MLE method to learn the noise distribution F and linear
weight α0 simultaneously. We also provide a lower bound result such that any bidding policy
in a broad class must achieve regret at least Ω(√
T), even when the learner receives the full
information feedback and F is known.
auctions. We consider a single bidder (learner) who repeatedly bids in the first price auctions:
at each time t, the learner observes a context xt ∈ R
d and decides the bid based on historical
information and xt. We assume a structured linear model of the maximum bid of all the others
mt = α0 · xt + zt, where α0 ∈ R
d
is unknown to the learner and zt is randomly sampled from a
noise distribution F with log-concave density function f. We consider both binary feedback (the
learner can only observe whether she wins or not) and full information feedback (the learner can
observe mt) at the end of each time t. For binary feedback, when the noise distribution F is
known, we propose a bidding algorithm, by using maximum likelihood estimation (MLE) method
to achieve at most Oe(
p
log(d)T ) regret. Moreover, we generalize this algorithm to the setting
with binary feedback and the noise distribution is unknown but belongs to a parametrized family
of distributions. For the full information feedback with unknown noise distribution, we provide
an algorithm that achieves regret at most Oe(
√
dT). Our approach combines an estimator for
log-concave density functions and then MLE method to learn the noise distribution F and linear
weight α0 simultaneously. We also provide a lower bound result such that any bidding policy
in a broad class must achieve regret at least Ω(√
T), even when the learner receives the full
information feedback and F is known.