Jump to Content
Zhe Feng

Zhe Feng

I am a Research Scientist in the Market Algorithms group at Google Research, Mountain View. I got my PhD in Computer Science at Harvard University in 2021, where I was fortunately advised by David C. Parkes. My research broadly lies in the intersection between economics and computer science, and mainly focus on understanding how to design better mechanisms through machine learning approaches. See my personal website for more information.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract We study the design of loss functions for click-through rates (CTR) to optimize (social) welfare in ad auctions. Existing works either only focus on CTR predictions without consideration of business objectives (e.g. welfare) in auctions or assume that the distribution over the participants' expected cost-per-impression (eCPM) is known a priori and uses various additional assumptions on the parametric form of the distribution to derive loss functions for predicting CTRs. In this work, we bring back the welfare objectives of ad auctions into CTR predictions and propose a novel weighted rankloss to train CTR model. Compared with the previous literature, our approach provides a provable guarantee on welfare but without any assumptions on the eCPMs' distribution, while also avoiding the intractability of naively applying existing learning-to-rank methods. We further propose a theoretically justifiable technique for calibrating the losses using labels generated from a teacher network, only assuming that the teacher network has bounded $\ell_2$ generalization error. Finally, we demonstrate the practical advantages of the proposed loss on both synthetic and real-world data. View details
    Preview abstract In the realm of contextual bandit algorithms, the standard framework involves observing a context, selecting an arm, and then observing the reward. This approach, while functional, often falls short when dealing with the complexity of real-world scenarios where additional valuable contexts are revealed after arm-selection. We introduce a new algorithm, pLinUCB, designed to incorporate these post-serving contexts effectively, thereby achieving sublinear regret. Our key technical contribution is a robustified and generalized version of the well-known Elliptical Potential Lemma (EPL), which allows us to handle the noise in the context vectors, a crucial aspect in a practical setting. This generalized EPL is of independent interest as it has potential applications beyond the scope of this work. Through extensive empirical tests on both synthetic and real-world datasets, we demonstrate that our proposed algorithm outperforms the state-of-the-art, thus establishing its practical potential. Our work underscores the importance of post-serving contexts in the contextual bandit setting and lays the groundwork for further research in this field. View details
    Online Bidding Algorithms for Return-on-Spend Constrained Advertisers
    Swati Padmanabhan
    The Proceedings of the ACM Web Conference 2023
    Preview abstract Online advertising has recently become a highly competitive, complex, multi-billion-dollar industry with advertisers bidding for ad slots at large scales and high frequencies. This has resulted in the growing need for efficient ``auto-bidding'' algorithms to determine the bids for incoming queries to maximize advertisers' targets subject to their specified constraints. Our work focuses on designing efficient online algorithms for a single value-maximizing advertiser under a commonly seen constraint: Return-on-Spend (RoS). We quantify the efficiency in terms of \emph{regret} in comparison with the optimal algorithm that knows all the queries a priori. Our main contribution is an algorithm that achieves near-optimal regret while respecting the specified RoS constraint. We are able to also incorporate our results with the pre-existing work of Balseiro, Lu, and Mirrokni~\cite{BLM20} to achieve near-optimal regret while respecting \emph{both} RoS and fixed budget constraints. Our main technical insight is in leveraging ideas from the literature on (offline) packing linear programs, in addition to the generic structural properties arising from our auction model. View details
    Learning to Bid in Contextual First Price Auctions
    Guru Prashanth Guruganesh
    The Proceedings of the ACM Web Conference 2023
    Preview abstract In this paper, we investigate the problem about how to bid in repeated contextual first price auctions. We consider a single bidder (learner) who repeatedly bids in the first price auctions: at each time t, the learner observes a context xt ∈ R d and decides the bid based on historical information and xt. We assume a structured linear model of the maximum bid of all the others mt = α0 · xt + zt, where α0 ∈ R d is unknown to the learner and zt is randomly sampled from a noise distribution F with log-concave density function f. We consider both binary feedback (the learner can only observe whether she wins or not) and full information feedback (the learner can observe mt) at the end of each time t. For binary feedback, when the noise distribution F is known, we propose a bidding algorithm, by using maximum likelihood estimation (MLE) method to achieve at most Oe( p log(d)T ) regret. Moreover, we generalize this algorithm to the setting with binary feedback and the noise distribution is unknown but belongs to a parametrized family of distributions. For the full information feedback with unknown noise distribution, we provide an algorithm that achieves regret at most Oe( √ dT). Our approach combines an estimator for log-concave density functions and then MLE method to learn the noise distribution F and linear weight α0 simultaneously. We also provide a lower bound result such that any bidding policy in a broad class must achieve regret at least Ω(√ T), even when the learner receives the full information feedback and F is known. View details
    Preview abstract In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charge the price of the winner through a pay-per-click manner. We focus on two models of the advertisers’ strategic behaviours. First, we assume that the advertiser is completely myopic; i.e. in each round, they aim to maximize their utility only for the current round. In this setting, we develop an online mechanism based on upper-confidence bounds that achieves a tight O(√T) regret in the worst-case and negative regret when the values are static across all the auctions. Next, we assume that the advertiser is non-myopic but cares about their γ-discounted long term utility. This setting is much more complex since an advertiser is incentived to “game” the mechanism by bidding strategically in earlier rounds. Advertisers may place strategic bids in earlier rounds to influence the algorithm in later rounds. In this setting, we develop an online mechanism that allows us to bound how often the advertiser misreports. Further, this mechanism achieves O(√T) regret against the optimal mechanism with truthful bids, when γ < 1. This complements the prior work that shows that Ω(T^{2/3}) regret is necessary when γ = 1, i.e. when the advertisers do not discount their utility. Furthermore, we provide an algorithm to achieve constant regret for the static valuation setting, even when γ = 1. View details
    A Context Integrated Transformer-based Neural Network for Auction Design
    Zhijian Duan
    Jingwu Tang
    Yutong Yin
    Xiang Yan
    Xiaotie Deng
    The Thirty-ninth International Conference on Machine Learning (ICML'22) (2022)
    Preview abstract One of the central problems in auction design is to develop an incentive compatible mechanism that maximizes the expected revenue. While theoretical approaches have encountered bottlenecks for multi-item auctions, recently there are many progresses of finding optimal auction through deep learning. However, such works either focus on a fixed set of bidders and items, or restrict the auction to be symmetric. In this work, we overcome this limitation by factoring \emph{public} contextual information of bidders and items into deep learning framework. We propose $\mathtt{CITransNet}$, a context integrated transformer-based neural network for contextual auction design, which maintains permutation-equivariance over bids while being able to handle asymmetric contextual information in auctions. We show by extensive experiments that $\mathtt{CITransNet}$ can recover the known optimal analytical solutions, obtain novel mechanisms for complex multi-item auctions, and generalize to settings different from training set. View details
    Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards
    Tianxi Li
    Haifeng Xu
    Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022) (2022)
    Preview abstract Incrementality, which is used to measure the causal effect of showing an ad to a potential customer (e.g. a user in an internet platform) versus not, is a central problem for advertisers in advertising systems. In this paper, we investigate the problem about how the advertiser can decide the bid through learning conversion incrementality, in an online manner. We formulate this problem as an episodic Markov Decision Process (MDP) and propose a novel reinforcement learning (RL) algorithm that can achieve regret at most $O(H^2\sqrt{T})$, which doesn't rely on the number of actions (bids), where $H$ is the number of rounds in each episode and $T$ is the total number of episodes in MDP. In sharp contrast with standard RL framework, conversion incrementality feedback are delayed and mixed. To handle this difficulty we propose a novel pairwise moment-matching algorithm to learn conversion incrementality, which is of independent of interest. View details
    Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning
    Jibang Wu
    Zixuan Zhang
    Zhaoran Wang
    Zhuoran Yang
    Michael I. Jordan
    Haifeng Xu
    The Twenty-Third ACM Conference on Economics and Computation (EC'22) (2022)
    Preview abstract In today's economy, it becomes important for Internet platforms to consider the sequential information design problem to align its long term interest with incentives of the gig service providers (e.g., drivers, hosts). This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs), in which a sender, with informational advantage, seeks to persuade a stream of myopic receivers to take actions that maximize the sender's cumulative utilities in a finite horizon Markovian environment with varying prior and utility functions. Planning in MPPs thus faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender. Nevertheless, in the population level where the model is known, it turns out that we can efficiently determine the optimal (resp. $\epsilon$-optimal) policy with finite (resp. infinite) states and outcomes, through a modified formulation of the Bellman equation that additionally takes persuasiveness into consideration. Our main technical contribution is to study the MPP under the online reinforcement learning (RL) setting, where the goal is to learn the optimal signaling policy by interacting with with the underlying MPP, without the knowledge of the sender's utility functions, prior distributions, and the Markov transition kernels. For such a problem, we design a provably efficient no-regret learning algorithm, the \fullmodel{} (\ORAI), which features a novel combination of both optimism and pessimism principles. In particular, we obtain optimistic estimates of the value functions to encourage exploration under the unknown environment, and additionally robustify the signaling policy with respect to the uncertainty of prior estimation to prevent receiver's detrimental equilibrium behavior. Our algorithm enjoys sample efficiency by achieving a sublinear $\sqrt{T}$-regret upper bound. Furthermore, both our algorithm and theory can be applied to MPPs with large space of outcomes and states via function approximation, and we showcase such a success under the linear setting. View details
    No Results Found