Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback

Anirban Santara

Gaurav Aggarwal

Shuai Li

Claudio Gentile

(2022) (to appear)

Download Google Scholar

Abstract

Motivated by problems of ranking with partial information, we introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses. We formulate two generative models for this problem within the generalized linear setting, and design and analyze upper confidence algorithms for it. Our analysis delivers tight regret bounds which, when specialized to standard cascading bandits, results in sharper guarantees than previously available in the literature. We evaluate our algorithms against a representative sample of cascading bandit baselines on a number of real-world datasets and show significantly improved empirical performance.

Research Areas

Machine Intelligence

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback

Abstract

Research Areas

Meet the teams driving innovation

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback

Abstract

Research Areas

Meet the teams driving innovation

AI/ML Foundations  & Capabilities