 
                Bernhard Friedrich Brodowsky
            Bernhard Brodowsky is a Software Engineer in Google Shopping. He earned a Master's degree from ETH Zurich.
          
        
      Authored Publications
    
  
  
  
    
    
  
      
        
        
    
    
        
          
            
              Adversarial Bandits Policy for Crawling Commercial Web Content
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
            Shuguang Han, Michael Bendersky, Przemek Gajda, Sergey Novikov, Alexandrin Popescul
                      
                    
                  
              
            
          
          
          
          
            Proceedings of the Web Conference 2020 (WWW 2020), pp. 407-417
          
          
        
        
        
          
          
          
              The rapid growth of commercial web content has driven the development of shopping search services that help users find product offers. Because commercial content changes frequently, an effective recrawl policy is a key component of a shopping search service: it ensures that users have access to up-to-date product details. Most existing strategies either rely on simple heuristics or overlook resource budgets. To address this, Azar et al. [5] recently proposed LambdaCrawl, an optimization strategy that aims to maximize content freshness within a given resource budget. In this paper, we demonstrate that the effectiveness of LambdaCrawl is governed in large part by how well the future content change rate can be estimated. By adopting state-of-the-art deep learning models for change rate prediction, we obtain a substantial increase in content freshness over the common LambdaCrawl implementation, in which the change rate is estimated from past history. Moreover, we demonstrate that while LambdaCrawl is a significant advancement over existing recrawl strategies, it can be further improved by a unified multi-strategy recrawl policy. To this end, we adopt the $K$-armed adversarial bandits algorithm, which can provably optimize overall freshness by combining multiple strategies. Empirical results on a large-scale production dataset confirm its superiority to LambdaCrawl, especially under tight resource budgets.
              
  
          
        
      
    
        
          
            
              Predictive Crawling for Commercial Web Content
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
            Shuguang Han, Przemek Gajda, Sergey Novikov, Mike Bendersky, Robin Dua, Alexandrin Popescul
                      
                    
                  
              
            
          
          
          
          
            Proceedings of the 2019 World Wide Web Conference, pp. 627-637
          
          
        
        
        
          
          
          
              Web crawlers spend significant resources to maintain the freshness of their crawled data. This paper describes the optimization of resources to ensure that product prices shown in ads in the context of a shopping sponsored search service are synchronized with current merchant prices. We use the predictability of price changes to build a machine-learned system that leads to considerable resource savings for both the merchants and the crawler. We describe our solutions to technical challenges arising from partial observability of price history, feedback loops created by applying machine-learned models, and offers in a cold-start state. Empirical evaluation on large-scale product crawl data demonstrates the effectiveness of our model and confirms its robustness to unseen data. We argue that our approach may be applicable to more general data pull settings.
              
  
          
        
      
    