Bernhard Friedrich Brodowsky
Bernhard Brodowsky is a Software Engineer in Google Shopping. He earned a Master's degree from ETH Zurich.
Authored Publications
Adversarial Bandits Policy for Crawling Commercial Web Content
Shuguang Han
Przemek Gajda
Sergey Novikov
Alexandrin Popescul
Proceedings of the Web Conference 2020 (WWW 2020), pp. 407-417
The rapid growth of commercial web content has driven the development of shopping search services that help users find product offers. Because commercial content is highly dynamic, an effective recrawl policy is a key component of a shopping search service: it ensures that users have access to up-to-date product details. Most existing strategies either rely on simple heuristics or overlook resource budgets. To address this, Azar et al. [5] recently proposed LambdaCrawl, an optimization strategy that aims to maximize content freshness within a given resource budget. In this paper, we demonstrate that the effectiveness of LambdaCrawl is governed in large part by how well future content change rates can be estimated. By adopting state-of-the-art deep learning models for change-rate prediction, we obtain a substantial increase in content freshness over the common LambdaCrawl implementation, which estimates change rates from past history. Moreover, we demonstrate that while LambdaCrawl is a significant advancement over existing recrawl strategies, it can be further improved by a unified multi-strategy recrawl policy. To this end, we adopt the K-armed adversarial bandits algorithm, which can provably optimize overall freshness by combining multiple strategies. Empirical results on a large-scale production dataset confirm its superiority over LambdaCrawl, especially under tight resource budgets.
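The idea of combining several recrawl strategies with a K-armed adversarial bandit can be illustrated with EXP3, a standard algorithm for the adversarial bandit setting. This is a minimal sketch under our own assumptions, not the paper's implementation: each arm stands for one recrawl strategy (for example, LambdaCrawl with historical change rates versus a variant driven by a learned change-rate model), `reward_fn` is a hypothetical callback reporting the freshness, scaled to [0, 1], achieved by the chosen strategy in round `t`, and the exploration rate `gamma` is an illustrative parameter.

```python
import math
import random
from typing import Callable, List


def _arm_probs(log_weights: List[float], gamma: float) -> List[float]:
    """Mix the softmax of the log-weights with uniform exploration."""
    k = len(log_weights)
    top = max(log_weights)  # subtract the max so exp() cannot overflow
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3(num_arms: int, rounds: int, gamma: float,
         reward_fn: Callable[[int, int], float],
         rng: random.Random) -> List[float]:
    """Run EXP3 for `rounds` steps and return the final arm probabilities.

    reward_fn(t, arm) is a hypothetical callback returning a reward in
    [0, 1] (e.g. measured freshness) for the strategy pulled at round t.
    Only the pulled arm's reward is observed, as in the adversarial setting.
    """
    log_weights = [0.0] * num_arms
    for t in range(rounds):
        probs = _arm_probs(log_weights, gamma)
        arm = rng.choices(range(num_arms), weights=probs, k=1)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted reward estimate; unpulled arms implicitly get 0.
        estimate = reward / probs[arm]
        log_weights[arm] += gamma * estimate / num_arms
    return _arm_probs(log_weights, gamma)


if __name__ == "__main__":
    # Toy run: strategy 2 always yields full freshness, the others none.
    final = exp3(3, 2000, 0.1, lambda t, arm: 1.0 if arm == 2 else 0.0,
                 random.Random(0))
    print([round(p, 3) for p in final])
```

Keeping log-weights and subtracting the maximum before exponentiating prevents overflow on long runs. The uniform mixing term `gamma / K` keeps every strategy's probability at least `gamma / K`, which bounds the importance-weighted estimate `reward / p` by `K / gamma`.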
Predictive Crawling for Commercial Web Content
Shuguang Han
Przemek Gajda
Sergey Novikov
Robin Dua
Alexandrin Popescul
Proceedings of the 2019 World Wide Web Conference, pp. 627-637
Web crawlers spend significant resources to maintain the freshness of their crawled data. This paper describes how to optimize these resources so that product prices shown in ads, in the context of a shopping sponsored search service, are synchronized with current merchant prices. We use the predictability of price changes to build a machine-learned system that leads to considerable resource savings for both the merchants and the crawler. We describe our solutions to technical challenges arising from partial observability of price history, feedback loops created by applying machine-learned models, and offers in a cold-start state. Empirical evaluation on large-scale product crawl data demonstrates the effectiveness of our model and confirms its robustness to unseen data. We argue that our approach can be applied in more general data-pull settings.
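The general pattern of spending a crawl budget on the offers whose prices are most likely to have changed, while handling cold-start offers and feedback loops, can be sketched as follows. This is a hypothetical illustration, not the system described in the paper: `change_prob` stands in for the output of some price-change model, offers without a prediction fall back to an assumed `cold_start_prior`, and a fraction `explore_frac` of the budget goes to uniformly random offers. Random exploration is one common way to reduce feedback loops (a model trained only on offers it selected itself sees a biased sample), but the paper's actual remedy may differ.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    offer_id: str
    # Hypothetical model output: probability that the merchant price has
    # changed since the last crawl. None marks a cold-start offer.
    change_prob: Optional[float]


def plan_recrawls(offers: List[Offer], budget: int, explore_frac: float,
                  cold_start_prior: float, rng: random.Random) -> List[str]:
    """Pick up to `budget` offer ids to recrawl (ids assumed unique)."""
    if budget <= 0 or not offers:
        return []
    # Reserve part of the budget for uniformly random crawls so that future
    # training data is not drawn only from offers the model chose itself.
    n_explore = max(0, min(int(budget * explore_frac), len(offers)))
    explore = rng.sample(offers, n_explore)
    explored_ids = {o.offer_id for o in explore}

    def score(o: Offer) -> float:
        # Cold-start offers fall back to an assumed prior change probability.
        return cold_start_prior if o.change_prob is None else o.change_prob

    # Spend the rest of the budget on the offers most likely to have changed.
    ranked = sorted((o for o in offers if o.offer_id not in explored_ids),
                    key=score, reverse=True)
    exploit = ranked[: budget - n_explore]
    return [o.offer_id for o in explore] + [o.offer_id for o in exploit]
```

For example, with `explore_frac=0.0`, a prior of 0.6, and offers scored 0.9, 0.1, unknown, and 0.5, a budget of 2 selects the 0.9 offer and the cold-start offer.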