- Sreenivas Gollapudi
- Kostas Kollias
- Chinmay Maheshwari
- Manxi Wu

## Abstract

We develop an online learning algorithm for a navigation platform to route travelers in a congested network with multiple origin-destination (o-d) pairs while simultaneously learning the unknown cost function of road segments (edges) using the crowd-sourced data. The number of travel requests is randomly realized, and the travel time of each edge is stochastically distributed with the mean being a linear function that increases with the edge load (the number of travelers who take the edge). In each step of our algorithm, the platform updates the estimates of cost function parameters using the collected travel time data, and maintains a rectangular confidence interval of each parameter. The platform then routes travelers in the next step using an optimistic strategy based on the lower bound of the up-to-date confidence interval. The key aspect of our setting is that the number of collected data on each edge depends on the number of travelers who use it, which creates the dependency between the spatial distribution of data points and the routing strategy. We prove that the regret upper bound of our algorithm is (O(\sqrt{T}\log(T),|E|)), where (T) is the time horizon, and (|E|) is the number of edges in the network. Furthermore, we show that the regret bound decreases as the number of travelers increases, which implies that the platform learns faster with a larger user population. Finally, we implement our algorithm on the network of New York City, and demonstrate the efficacy using the accumulated regret.

## Research Areas

### Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work