Jump to Content
Shumeet Baluja

Shumeet Baluja

Shumeet Baluja, Ph.D., is currently a Senior Staff Research Scientist at Google, where he works on a broad set of topics ranging from image processing and machine learning to wireless application development and user interaction measurement. He is the author of the forthcoming book The Silicon Jungle: A Novel of Deception, Power, and Internet Intrigue.

Shumeet was formerly the Chief Technology Officer (CTO) of JAMDAT Mobile, Inc., where he oversaw all aspects of technology initiation, development and deployment. Previously, Shumeet was Chief Scientist at Lycos Inc., where he led the Research and Applied Technology Group in the quantitative and qualitative analysis of user behavior, including data mining and trend analysis, advertisement optimization, and statistical modeling of traffic and site interactions. As Senior Vice President of R&D at eCompanies LLC, he spearheaded the creation of their wireless practice and was responsible for finding and evaluating new technologies and technology entrepreneurs.

Shumeet has filed numerous patents and has published scientific papers in fields including computer vision and facial image processing, advertisement display and optimization, automated vehicle control, statistical machine learning, and high-dimensional optimization. Shumeet graduated from the University of Virginia with a Bachelor of Science, with high distinction, in 1991. He received his Ph.D. in computer science from Carnegie Mellon University in 1996.

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, desc
  • Year
  • Year, desc
    Preview abstract Hand-drawn doodles present an interesting and difficult set of textures to model and synthesize. Unlike the typical natural images that are most often used in texture synthesis studies, the doodles examined here are characterized by the use of sharp, irregular, and imperfectly scribbled patterns, frequent imprecise strokes, haphazardly connected edges, and randomly or spatially shifting themes. The almost binary nature of the doodles examined makes it difficult to hide common mistakes such as discontinuities. Further, there is no color or shading to mask flaws and repetition; any process that relies on, even stochastic, region copying is readily discernible. To tackle the problem of synthesizing these textures, we model the underlying generation process of the doodle taking into account potential unseen, but related, expansion contexts. We demonstrate how to generate infinitely long textures, such that the texture can be extended far beyond a single image's source material. This is accomplished by creating a novel learning mechanism that is taught to condition the generation process on its own generated context -- what was generated in previous steps -- not just upon the original material. View details
    Visualizing Semantic Walks
    NeurIPS-2022 Workshop on Machine Learning for Creativity and Design, https://neuripscreativityworkshop.github.io/2022/
    Preview abstract An embedding space trained from both a large language model and vision model contains semantic aspects of both and provides connections between words, images, concepts, and styles. This paper visualizes characteristics and relationships in this semantic space. We traverse multi-step paths in a derived semantic graph to reveal hidden connections created from the immense amount of data used to create these models. We specifically examine these relationships in the domain of painters, their styles, and their subjects. Additionally, we present a novel, non-linear sampling technique to create informative visualization of semantic graph transitions. View details
    Adding Non-linear Context to Deep Networks
    Michele Covell
    IEEE International Conference on Image Processing (2022)
    Preview
    Not All Network Weights Need to Be Free
    Michele Covell
    21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022, Bahamas, December 12-14, 2022
    Preview abstract As state of the art network models routinely grow to architectures with billions and even trillions of learnable parameters, the need to efficiently store and retrieve these models into working memory becomes a more pronounced bottleneck. This is felt most severely in efforts to port models to personal devices, such as consumer cell phones, which now commonly include GPU and TPU processors designed to handle the enormous computational burdens associated with deep networks. In this paper, we present novel techniques for dramatically reducing the number of free parameters in deep network models with the explicit goals of (1) model compression with little or no model decompression overhead at inference time and (2) reducing the number of free parameters in arbitrary model without requiring any modifications to the architecture. We examine four techniques that build on each other, and provide insight into when and how each technique operates. Accuracy as a function of free parameters is measured on two very different deep networks: ResNet and Vision Transformer. On the latter, we find that we can reduce the number of parameters by 20\% with no loss in accuracy. View details
    Interpretable Actions: Controlling Experts with Understandable Commands
    Michele Covell
    The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), AAAI (2021)
    Preview abstract Despite the prevalence of deep neural networks, their single most cited drawback is that, even when successful, their operations are inscrutable. For many applications, the desired outputs are the composition of externally-defined bases. For such decomposable domains, we present a two-stage learning procedure producing combinations of the external bases which are trivially extractable from the network. In the first stage, the set of external bases that will form the solution are modeled as differentiable generator modules, controlled by the same parameters as the external bases. In the second stage, a controller network is created that selects parameters for those generators, either successively or in parallel, to compose the final solution. Through three tasks, we concretely demonstrate how our system yields readily understandable commands. In one, we introduce a new form of artistic style transfer, learning to draw and color with crayons, in which the transformation of a photograph or painting occurs not as a single monolithic computation, but by the composition of thousands of individual, visualizable strokes. The other two tasks, single-pass function approximation with arbitrary bases and shape-based synthesis, show how our approach produces understandable and extractable actions in two disparate domains. View details
    Preview abstract Numerous methods have been proposed to transform color and grayscale images to their single bit-per-pixel binary counterparts. Commonly, the goal is to enhance specific attributes of the original image to make it more amenable for analysis. However, when the resulting binarized image is intended for human viewing, aesthetics must also be considered. Binarization techniques, such as half-toning, stippling, and hatching, have been widely used for modeling the original image's intensity profile. We present an automated method to transform an image to a set of binary textures that represent not only the intensities, but also the colors of the original. The foundation of our method is information preservation: creating a set of textures that allows for the reconstruction of the original image's colors solely from the binarized representation. We present techniques to ensure that the textures created are not visually distracting, preserve the intensity profile of the images, and are natural in that they map sets of colors that are perceptually similar to patterns that are similar. The approach uses deep-neural networks and is entirely self-supervised; no examples of good vs. bad binarizations are required. The system yields aesthetically pleasing binary images when tested on a variety of image sources. View details
    Contextual Convolution Blocks
    Proceedings of the British Machine Vision Conference 2021 (2021)
    Preview abstract A fundamental processing layer of modern deep neural networks is the 2D convolution. It applies a filter uniformly across the input, effectively creating feature detectors that are translation invariant. In contrast, fully-connected layers are spatially selective, allowing unique detectors across the input. However, full connectivity comes at the expense of an enormous number of free parameters to be trained, the associated difficulty in learning without over-fitting, and the loss of spatial coherence. We introduce Contextual Convolution Blocks, a novel method to create spatially selective feature detectors that are locally translation invariant. This increases the expressive power of the network beyond standard convolutional layers and allows learning unique filters for distinct regions of the input. The filters no longer need to be discriminative in regions not likely to contain the target features. This is a generalization of the Squeeze-and-Excitation architecture that introduces minimal extra parameters. We provide experimental results on three datasets and a thorough exploration into how the increased expressiveness is instantiated. View details
    Immediate Gestalt: Shapes, Typography and (Quite Irregular) Packing
    Journal of Mathematics and the Arts, vol. to Appear (2020), pp. 1-22
    Preview abstract Instantaneously understanding the gestalt of thousands of words is achieved through the programmatic placement of the words and control of their presentation characteristics, such as size, repetition, and font. As early as the fourteenth century, words were used as building blocks for images. Hundreds of years later, this typographic experiment continues with the addition of raw computational power. The ability to place thousands of words in interesting forms gives rise to a quantitatively different form of expression. The resulting procedures are expressive enough to represent shapes, textures, and shading automatically. Though based on approaches for addressing the classic problem of algorithmic two-dimensional bin-packing, aesthetically pleasing results are achieved through the incorporation of a small set of rules to guide the layout. View details
    Seamless Audio Melding: Using Seam Carving with Music Playlists
    Michele Covell
    Proceedings of 2020 International Conference on Advances in Multimedia, IARIA, pp. 24-29
    Preview abstract In both studio and live performances, professional music DJs in an increasing number of popular musical genres mix recordings together into continuous streams that progress seamlessly from one song to the next. When done well, these create an engaging and seamless experience, as if they were part of a single performance. This work introduces a new way to provide that continuity using not only beat matching, but also frequency-dependent cross fades. The basis of our technique is derived from the well developed technique of visual-seam carving, most commonly found in computer vision and graphics systems. We adapt visual seam carving to indicate the times at which to transition specific frequencies from one song to the next. Additionally, we also describe a way to invert the melded spectrogram with minimal computation. The entire system works faster than real-time to provide the ability to use this system in live performances. View details
    Hiding Images Within Images
    IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. Early Access (2019) (to appear)
    Preview abstract We present a system to hide a full color image inside another of the same size with minimal quality loss to either image. Deep neural networks are simultaneously trained to create the hiding and revealing processes and are designed to specifically work as a pair. The system is trained on images drawn randomly from the ImageNet database, and works well on natural images from a wide variety of sources. Beyond demonstrating the successful application of deep learning to hiding images, we examine how the result is achieved and apply numerous transformations to analyze if image quality in the host and hidden image can be maintained. These transformation range from simple image manipulations to sophisticated machine learning-based adversaries. Two extensions to the basic system are presented that mitigate the possibility of discovering the content of the hidden image. With these extensions, not only can the hidden information be kept secure, but the system can be used to hide even more than a single image. Applications for this technology include image authentication, digital watermarks, finding exact regions of image manipulation, and storing meta-information about image rendering and content. View details
    Preview abstract A rapidly increasing portion of Internet traffic is dominated by requests from mobile devices with limited- and metered-bandwidth constraints. To satisfy these requests, it has become standard practice for websites to transmit small and extremely compressed image previews as part of the initial page-load process. Recent work, based on an adaptive triangulation of the target image, has shown the ability to generate thumbnails of full images at extreme compression rates: 200 bytes or less with impressive gains (in terms of PSNR and SSIM) over both JPEG and WebP standards. However, qualitative assessments and preservation of semantic content can be less favorable. We present a novel method to significantly improve the reconstruction quality of the original image with no changes to the encoded information. Our neural-based decoding not only achieves higher PSNR and SSIM scores than the original methods, but also yields a substantial increase in semantic-level content preservation. In addition, by keeping the same encoding stream, our solution is completely inter-operable with the original decoder. The end result is suitable for a range of small-device deployments, as it involves only a single forward-pass through a small, scalable network. View details
    Continuous Selection of Optimized Traffic Light Schedules: A Machine Learning Approach
    The 21st IEEE International Conference on Intelligent Transportation Systems, IEEE (2018)
    Preview abstract Machine learning-based optimization of traffic light programs has been successfully employed to reduce emissions and traffic delays. Due to the variability of traffic flows, it is common practice to optimize multiple traffic light programs tailored for specific conditions and deploy them at predetermined times of the day or days of the week. We explore an alternative to this manual set-interval methodology. We create a system to automatically select the appropriate light controller program in response to continuously changing conditions. We analyze the current traffic density and close-time traffic patterns and instantiate the correct pre-optimized light program based on current conditions. Rather than creating a small set of programs tailored for specific periods of the day, we automatically create, and select from, an over-complete set of light controllers. Based on historic observations, a combination of machine learning approaches are used to find the best representative set of traffic flows to model the system. From these, multiple traffic-light controllers are created to address each flow individually. Using the automated matching system, we achieved reductions in both emissions and travel time over previously optimized lights. We examine the robustness of the system by ensuring that the system operates under large amounts of variability in traffic. View details
    Representing Images in 200 Bytes: Compression via Triangulation
    Pascal Massimino
    Michele Covell
    Proceedings of 2018 International Conference on Image Processing, IEEE
    Preview abstract A rapidly increasing portion of internet traffic is dominated by requests from mobile devices with limited and metered bandwidth constraints. To satisfy these requests, it has become standard practice for websites to transmit small and extremely compressed image previews as part of the initial page load process to improve responsiveness. Increasing thumbnail compression beyond the capabilities of existing codecs is therefore an active research direction. In this work, we concentrate on extreme compression rates, where the size of the image is typically 200 bytes or less. First, we propose a novel approach for image compression that, unlike commonly used methods, does not rely on block-based statistics. We use an approach based on an adaptive triangulation of the target image, devoting more triangles to high entropy regions of the image. Second, we present a novel algorithm for encoding the triangles. The results show favorable statistics, in terms of PSNR and SSIM, over both the JPEG and the WebP standards. View details
    Preview abstract With the rapidly increasing popularity of deep neural networks for image recognition tasks, a parallel interest in generating adversarial examples to attack the trained models has arisen. To date, these approaches have involved either directly computing gradients with respect to the image pixels or directly solving an optimization on the image pixels. We generalize this pursuit in a novel direction: can a separate network be trained to efficiently attack another fully trained network? We demonstrate that it is possible, and that the generated attacks yield startling insights into the weaknesses of the target network. We call such a network an Adversarial Transformation Network (ATN). ATNs transform any input into an adversarial attack on the target network, while being minimally perturbing to the original inputs and the target network’s outputs. Further, we show that ATNs are capable of not only causing the target network to make an error, but can be constructed to explicitly control the type of misclassification made. We demonstrate ATNs on both simple MNIST digit classifiers and state-of-the-art ImageNet classifiers deployed by Google, Inc.: Inception ResNet-v2. View details
    Learning Deep Models of Optimization Landscapes
    IEEE Symposium Series on Computational Intelligence, IEEE (2017)
    Preview abstract In all but the most trivial optimization problems, the structure of the solutions exhibit complex interdependencies between the input parameters. Decades of research with stochastic search techniques has shown the benefit of explicitly modeling the interactions between sets of parameters and the overall quality of the solutions discovered. We demonstrate a novel method, based on learning deep networks, to model the global landscapes of optimization problems. To represent the search space concisely and accurately, the deep networks must encode information about the underlying parameter interactions and their contributions to the quality of the solution. Once the networks are trained, the networks are probed to reveal parameter combinations with high expected performance with respect to the optimization task. These estimates are used to initialize fast, randomized, local-search algorithms, which in turn expose more information about the search space that is subsequently used to refine the models. We demonstrate the technique on multiple problems that have arisen in a variety of real-world domains, including: packing, graphics, job scheduling, layout and compression. Strengths, limitations, and extensions of the approach are extensively discussed and demonstrated. View details
    Learning typographic style: from discrimination to synthesis
    Machine Vision and Applications, vol. 28, Issues 5-6 (2017), pp. 551-568
    Preview abstract Typography is a ubiquitous art form that affects our understanding, perception and trust in what we read. Thousands of different font-faces have been created with enormous variations in the characters. In this paper, we learn the style of a font by analyzing a small subset of only four letters. From these four letters, we learn two tasks. The first is a discrimination task: given the four letters and a new candidate letter, does the new letter belong to the same font? Second, given the four basis letters, can we generate all of the other letters with the same characteristics as those in the basis set? We use deep neural networks to address both tasks, quantitatively and qualitatively measure the results in a variety of novel manners, and present a thorough investigation of the weaknesses and strengths of the approach. All of the experiments are conducted with publicly available font sets. View details
    Hiding Images in Plain Sight: Deep Steganography
    Neural Information Processing Systems, NIPS (2017)
    Preview abstract Steganography is the practice of concealing a secret message within another, ordinary, message. Commonly, steganography is used to unobtrusively hide a small message within the noisy regions of a larger image. In this study, we attempt to place a full size color image within another image of the same size. Deep neural networks are simultaneously trained to create the hiding and revealing processes and are designed to specifically work as a pair. The system is trained on images drawn randomly from the ImageNet database, and works well on natural images from a wide variety of sources. Beyond demonstrating the successful application of deep learning to hiding images, we carefully examine how the result is achieved and explore extensions. Unlike many popular steganographic methods that encode the secret message within the least significant bits of the carrier image, our approach compresses and distributes the secret image’s representation across all of the available bits. View details
    Preview abstract Real-time optimization of traffic flow addresses important practical problems: reducing a driver's wasted time, improving city-wide efficiency, reducing gas emissions and improving air quality. Much of the current research in traffic-light optimization relies on extending the capabilities of traffic lights to either communicate with each other or communicate with vehicles. However, before such capabilities become ubiquitous, opportunities exist to improve traffic lights by being more responsive to current traffic situations within the current, already deployed, infrastructure. In this paper, we introduce a traffic light controller that employs bidding within micro-auctions to efficiently incorporate traffic sensor information; no other outside sources of information are assumed. We train and test traffic light controllers on large-scale data collected from opted-in Android cell-phone users over a period of several months in Mountain View, California and the River North neighborhood of Chicago, Illinois. The learned auction-based controllers surpass (in both the relevant metrics of road-capacity and mean travel time) the currently deployed lights, optimized static-program lights, and longer-term planning approaches, in both cities, measured using real user driving data. View details
    Preview abstract In all but the most trivial optimization problems, the structure of the solutions exhibit complex interdependencies between the input parameters. Decades of research with stochastic search techniques has shown the benefit of explicitly modeling the interactions between sets of parameters and the overall quality of the solutions discovered. We demonstrate a novel method, based on learning deep networks, to model the global landscapes of optimization problems. To represent the search space concisely and accurately, the deep networks must encode information about the underlying parameter interactions and their contributions to the quality of the solution. Once the networks are trained, the networks are probed to reveal parameter combinations with high expected performance with respect to the optimization task. These estimates are used to initialize fast, randomized, local search algorithms, which in turn expose more information about the search space that is subsequently used to refine the models. We demonstrate the technique on multiple optimization problems that have arisen in a variety of real-world domains, including: packing, graphics, job scheduling, layout and compression. The problems include combinatoric search spaces, discontinuous and highly non-linear spaces, and span binary, higher-cardinality discrete, as well as continuous parameters. Strengths, limitations, and extensions of the approach are extensively discussed and demonstrated. View details
    Variable Rate Image Compression with Recurrent Neural Networks
    Sung Jin Hwang
    Damien Vincent
    Michele Covell
    International Conference on Learning Representations (2016)
    Preview abstract A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-bytecount image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing thumbnail compression beyond the capabilities of existing codecs is therefore a current research focus, as any byte savings will significantly enhance the experience of mobile device users. Toward this end, we propose a general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional LSTM recurrent networks. Our models address the main issues that have prevented autoencoder neural networks from competing with existing image compression algorithms: (1) our networks only need to be trained once (not per-image), regardless of input image dimensions and the desired compression rate; (2) our networks are progressive, meaning that the more bits are sent, the more accurate the image reconstruction; and (3) the proposed architecture is at least as efficient as a standard purpose-trained autoencoder for a given number of bits. On a large-scale benchmark of 32×32 thumbnails, our LSTM-based approaches provide better visual quality than (headerless) JPEG, JPEG2000 and WebP, with a storage size that is reduced by 10% or more. View details
    Preview abstract Decades of research have been directed towards improving the timing of traffic lights. The ubiquity of cell phones among drivers has created the opportunity to design new sensors for traffic light controllers. These new sensors, which search for radio signals that are constantly emanating from cell phones, hold the hope of replacing the typical induction-loop sensors that are installed within road pavements. A replacement to induction sensors is desired as they require significant roadwork to install, frequent maintenance and checkups, are sensitive to proper repairs and installation work, and the construction techniques, materials, and even surrounding unrelated ground work can be sources of failure. However, before cell phone sensors can be widely deployed, users must become comfortable with the passive use of their cell phones by municipalities for this purpose. Despite complete anonymization, public privacy concerns may remain. This presents a chicken-and-egg problem: without showing the benefits of using cell phones for traffic monitoring, users may not be willing to allow this use. In this paper, we show that by carefully training the traffic light controllers, we can unlock the benefits of these sensors when only a small fraction of users allow their cell phones to be used. Surprisingly, even when there is only small percentage of opted-in users, the new traffic controllers provide large benefits to all drivers View details
    Preview abstract Real-time optimization of traffic flow addresses important practical problems: reducing a driver's wasted time, improving city-wide efficiency, reducing gas emissions, and improving air quality. Many current implementations and research studies that address traffic signal control construct a light controller's program (whether adaptive or static) by segmenting the day into divisions in which distinct traffic patterns are expected: rush hours, weekends, nights, etc. We consider the problem of automatically adapting a set of traffic lights to changing conditions based upon the distribution of observed traffic-density in surrounding areas. Unlike previous techniques which specify an a priori set number of unique flow patterns, we assume an over-complete set of traffic patterns. A combination of machine learning approaches are used to create a diverse set of traffic-light programs that can be instantiated when new traffic flow patterns are recognized. We have observed significant reduction in expected emissions and delays, while being agnostic to the number of underlying distinct patterns in the traffic. View details
    Preview abstract Feature selection is essential for effective visual recognition. We propose an efficient joint classifier learning and feature selection method that discovers sparse, compact representations of input features from a vast sea of candidates, with an almost unsupervised formulation. Our method requires only the following knowledge, which we call the feature sign—whether or not a particular feature has on average stronger values over positive samples than over negatives. We show how this can be estimated using as few as a single labeled training sample per class. Then, using these feature signs, we extend an initial supervised learning problem into an (almost) unsupervised clustering formulation that can incorporate new data without requiring ground truth labels. Our method works both as a feature selection mechanism and as a fully competitive classifier. It has important properties, low computational cost and excellent accuracy, especially in difficult cases of very limited training data. We experiment on large-scale recognition in video and show superior speed and performance to established feature selection approaches such as AdaBoost, Lasso, greedy forward-backward selection, and powerful classifiers such as SVM. View details
    Preview abstract Typography is a ubiquitous art form that affects our understanding, perception, and trust in what we read. Thousands of different font-faces have been created with enormous variations in the characters. In this paper, we learn the style of a font by analyzing a small subset of only four letters. From these four letters, we learn two tasks. The first is a discrimination task: given the four letters and a new candidate letter, does the new letter belong to the same font? Second, given the four basis letters, can we generate all of the other letters with the same characteristics as those in the basis set? We use deep neural networks to address both tasks, quantitatively and qualitatively measure the results in a variety of novel manners, and present a thorough investigation of the weaknesses and strengths of the approach. View details
    Preview abstract The phenomenal growth of the number of videos on YouTube provides enormous potential for users to find content of interest to them. Unfortunately, as the size of the repository grows, the task of discovering high-quality content becomes more daunting. To address this, YouTube occasionally asks users for feedback on videos. In one such event (the YouTube Comedy Slam), users were asked to rate which of two videos was funnier. This yielded sparse pairwise data indicating a participant’s relative assessment of two videos. Given this data, several questions immediately arise: how do we make inferences for uncompared pairs, overcome noisy, and usually contradictory, data, and how do we handle severely skewed, real-world, sampling? To address these questions, we introduce the concept of a domination-graph, and demonstrate a simple and scalable method, based on the Adsorption algorithm, to efficiently propagate preferences through the graph. Before tackling the publicly available YouTube data, we extensively test our approach on synthetic data by attempting to recover an underlying, known, rank-order of videos using similarly created sparse preference data. View details
    Micro-Auction-Based Traffic-Light Control: Responsive, Local Decision Making
    Michele Covell
    International Conference on Intelligent Transportation Systems (2015)
    Preview abstract Real-time, responsive optimization of traffic flow serves to address important practical problems: reducing drivers’ wasted time and improving city-wide efficiency, as well as reducing gas emissions and improving air quality. Much of the current research in traffic-light optimization relies on extending the capabilities of basic traffic lights to either communicate with each other or communicate with vehicles. However, before such capabilities become ubiquitous, opportunities exist to improve traffic lights by being more responsive to current traffic situations within the existing, deployed, infrastructure. In this paper, we use micro-auctions as the organizing principle with which to incorporate local induction loop information; no other outside sources of information are assumed. At every time step in which a phase change is permitted, each light conducts a decentralized, weighted, micro-auction to determine which phase to instantiate next. We test the lights on real-world data collected over a period of several weeks around the Mountain View, California area. In our simulations, the auction mechanisms based only on local sensor data surpass longer-term planning approaches that rely on widely placed sensors and communications. View details
    Preview abstract Decades of research have been directed towards improving the timing of existing traffic lights. In many parts of the world where this research has been conducted, detailed maps of the streets and the precise locations of the traffic lights are publicly available. Continued timing research has recently been further spurred by the increasing ubiquity of personal cell-phone based GPS systems. Through their use, an enormous amount of travel tracks have been amassed — thus providing an easy source of real traffic data. Nonetheless, one fundamental piece of information remains absent that limits the quantification of the benefits of new approaches: the existing traffic light schedules and traffic light response behaviors. Unfortunately, deployed traffic light schedules are often not known. Rarely are they kept in a central database, and even when they are, they are often not easily obtainable. The alternative, manual inspection of a system of multiple traffic lights may be prohibitively expensive and time-consuming for many experimenters. Without the existing light schedules, it is difficult to ascertain the real-improvements that new traffic light algorithms and approaches will have — especially on traffic patterns that have not yet been encountered in the collected data. To alleviate this problem, we present an approach to estimating existing traffic light schedules based on collected GPS-travel tracks. We present numerous ways to test the results and comprehensively demonstrate them on both synthetic and real data. One of the many uses, beyond studying the effects of existing lights in previously unencountered traffic flow environments, is to serve as a realistic baseline for light timing and schedule optimization studies. View details
    The Virtues of Peer Pressure: A Simple Method for Discovering High-Value Mistakes
    Michele Covell
    International Conference on Computer Analysis of Images and Patterns (2015)
    Preview abstract Much of the recent success of neural networks can be attributed to the deeper architectures that have become prevalent. However, the deeper architectures often yield unintelligible solutions, require enormous amounts of labeled data, and still remain brittle and easily broken. In this paper, we present a method to efficiently and intuitively discover input instances that are misclassified by well-trained neural networks. As in previous studies, we can identify instances that are so similar to previously seen examples such that the transformation is visually imperceptible. Additionally, unlike in previous studies, we can also generate mistakes that are significantly different from any training sample, while, importantly, still remaining in the space of samples that the network should be able to classify correctly. This is achieved by training a basket of N “peer networks” rather than a single network. These are similarly trained networks that serve to provide consistency pressure on each other. When an example is found for which a single network, S, disagrees with all of the other N − 1 networks, which are consistent in their prediction, that example is a potential mistake for S. We present a simple method to find such examples and demonstrate it on two visual tasks. The examples discovered yield realistic images that clearly illuminate the weaknesses of the trained models, as well as provide a source of numerous, diverse, labeled-training samples. View details
    Neighborhood Preserving Codes for Assigning Point Labels: Applications to Stochastic Search
    Michele Covell
    Procedia Computer Science: 2013 International Conference on Computational Science, Elsevier, pp. 956-965
    Preview abstract Selecting a good representation of a solution-space is vital to solving any search and optimization problem. In particular, once regions of high performance are found, having the property that small changes in the candidate solution correspond to searching nearby neighborhoods provides the ability to perform effective local optimization. To achieve this, it is common for stochastic search algorithms, such as stochastic hillclimbing, evolutionary algorithms (including genetic algorithms), and simulated annealing, to employ Gray Codes for encoding ordinal points or discretized real numbers. In this paper, we present a novel method to label similar and/or close points within arbitrary graphs with small Hamming distances. The resultant point labels can be seen as an approximate high-dimensional variant of Gray Codes with standard Gray Codes as a subset of the labels found here. The labeling procedure is applicable to any task in which the solution requires the search algorithm to select a small subset of items out of many. Such tasks include vertex selection in graphs, knapsack-constrained item selection, bin packing, prototype selection for machine learning, and numerous scheduling problems, to name a few. View details
    Efficient and Accurate Label Propagation on Dynamic Graphs and Label Sets
    Michele Covell
    International Journal on Advances in Networks and Services, vol. 6 (2013), pp. 246-259
    Preview abstract Many web-based application areas must infer label distributions starting from a small set of sparse, noisy labels. Previous work has shown that graph-based propagation can be very effective at finding the best label distribution across nodes, starting from partial information and a weighted-connection graph. In their work on video recommendations, Baluja et al. showed high-quality results using Adsorption, a normalized propagation process. An important step in the original formulation of Adsorption was re-normalization of the label vectors associated with each node, between every propagation step. That interleaved normalization forced computation of all label distributions, in synchrony, in order to allow the normalization to be correctly determined. Interleaved normalization also prevented use of standard linear-algebra methods, like stabilized bi-conjugate gradient descent (BiCGStab) and Gaussian elimination. We show how to replace the interleaved normalization with a single pre-normalization, done once before the main propagation process starts, allowing use of selective label computation (label slicing) as well as large-matrix-solution methods. As a result, much larger graphs and label sets can be handled than in the original formulation and more accurate solutions can be found in fewer propagation steps. We further extend that work to handle graphs that change and expand over time. We report results from using pre-normalized Adsorption in topic labeling for web domains, using label slicing and BiCGStab. We also report results from using incremental updates on changing co-author network data. Finally, we discuss two options for handling mixed-sign (positive and negative) graphs and labels. View details
    Efficient and Accurate Label Propagation on Large Graphs and Label Sets
    Michele Covell
    Proceedings International Conference on Advances in Multimedia, IARIA (2013)
    Preview abstract Many web-based application areas must infer label distributions starting from a small set of sparse, noisy labels. Examples include searching for, recommending, and advertising against image, audio, and video content. These labeling problems must handle millions of interconnected entities (users, domains, content segments) and thousands of competing labels (interests, tags, recommendations, topics). Previous work has shown that graph-based propagation can be very effective at finding the best label distribution across nodes, starting from partial information and a weighted-connection graph. In their work on video recommendations, Baluja et al. [1] showed high-quality results using Adsorption, a normalized propagation process. An important step in the original formulation of Adsorption was re-normalization of the label vectors associated with each node, between every propagation step. That interleaved normalization forced computation of all label distributions, in synchrony, in order to allow the normalization to be correctly determined. Interleaved normalization also prevented use of standard linear-algebra methods, like stabilized bi-conjugate gradient descent (BiCGStab) and Gaussian elimination. This paper presents a method that replaces the interleaved normalization with a single pre-normalization, done once before the main propagation process starts, allowing use of selective label computation (label slicing) as well as large-matrix-solution methods. As a result, much larger graphs and label sets can be handled than in the original formulation and more accurate solutions can be found in fewer propagation steps. We also report results from using pre-normalized Adsorption in topic labeling for web domains, using label slicing and BiCGStab. View details
    Point Representation for Local Optimization: Towards Multi-Dimensional Gray Codes
    Michele Covell
    Proceedings IEEE Congress on Evolutionary Computation, IEEE (2013)
    Preview abstract In the context of stochastic search, once regions of high performance are found, having the property that small changes in the candidate solution correspond to searching nearby neighborhoods provides the ability to perform effective local optimization. To achieve this, Gray Codes are often employed for encoding ordinal points or discretized real numbers. In this paper, we present a method to label similar and/or close points within arbitrary graphs with small Hamming distances. The resultant point labels can be viewed as an approximate high-dimensional variant of Gray Codes. The labeling procedure is useful for any task in which the solution requires the search algorithm to select a small subset of items out of many. A large number of empirical results using these encodings with a combination of genetic algorithms and hill-climbing are presented. View details
    A Tale of Two (Similar) Cities: Inferring City Similarity Through Geo-Spatial Query Log Analysis
    Rohan Seth
    Michele Covell
    D. Sivakumar
    Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (2011)
    Preview abstract Understanding the backgrounds and interest of the people who are consuming a piece of content, such as a news story, video, or music, is vital for the content producer as well the advertisers who rely on the content to provide a channel on which to advertise. We extend traditional search-engine query log analysis, which has primarily concentrated on analyzing either single or small groups of queries or users, to examining the complete query stream of very large groups of users – the inhabitants of 13,377 cities across the United States. Query logs can be a good representation of the interests of the city’s inhabitants and a useful characterization of the city itself. Further, we demonstrate how query logs can be effectively used to gather city-level statistics sufficient for providing insights into the similarities and differences between cities. Cities that are found to be similar through the use of query analysis correspond well to the similar cities as determined through other large-scale and time-consuming direct measurement studies, such as those undertaken by the Census Bureau. View details
    Beyond “Near-Duplicates”: Learning Hash Codes for Efficient Similar-Image Retrieval
    Michele Covell
    20th International Conference on Pattern Recognition 2010
    Preview abstract Finding similar images in a large database is an important, but often computationally expensive, task. In this paper, we present a two-tier similar-image retrieval system with the efficiency characteristics found in simpler systems designed to recognize near-duplicates. We compare the efficiency of lookups based on random projections and learned hashes to 100-times-more-frequent exemplar sampling. Both approaches significantly improve on the results from exemplar sampling, despite having significantly lower computational costs. Learned-hash keys provide the best result, in terms of both recall and efficiency. View details
    Preview abstract We present a new CAPTCHA which is based on identifying an image's upright orientation. This task requires analysis of the often complex contents of an image, a task which humans usually perform well and machines generally do not. Given a large repository of images, such as those from a web search result, we use a suite of automated orientation detectors to prune those images that can be automatically set upright easily. We then apply a social feedback mechanism to verify that the remaining images have a human-recognizable upright orientation. The main advantages of our CAPTCHA technique over the traditional text recognition techniques are that it is language-independent, does not require text-entry (e.g. for a mobile device), and employs another domain for CAPTCHA generation beyond character obfuscation. This CAPTCHA lends itself to rapid implementation and has an almost limitless supply of images. We conducted extensive experiments to measure the viability of this technique. View details
    Text Classification Through Time: Efficient Label Propagation in Time-Based Graphs
    D. Sivakumar
    International Conference on Knowledge Discovery and Information Retrieval (2009)
    Preview
    Finding Images and Line Drawings in Document-Scanning Systems
    Michele Covell
    Proc. International Conference on Document Analysis and Retrieval, IAPR (2009)
    Preview abstract This work addresses the problem of finding images and line-drawings in scanned pages. It is a crucial processing step in the creation of a large-scale system to detect and index images found in books and historic documents. Within the scanned pages that contain both text and images, the images are found through the use of local-feature extraction, applied across the full scanned page. This is followed by a novel learning system to categorize the local features into either text or image. The discrimination is based on using multiple classifiers trained via stochastic sampling of weak classifiers for each AdaBoost stage. The approach taken in sampling includes stochastic hill climbing across weak detectors, allowing us to reduce our classification error by as much as 25% relative to more naive stochastic sampling. Stochastic hill climbing in the weak classifier space is possible due to the manner in which we parameterize the weak classifier space. Through the use of this system, we improve image detection by finding more line-drawings, graphics, and photographs, as well as reducing the number of spurious detections due to misclassified text, discoloration, and scanning artifacts. View details
    LSH Banding for Large-Scale Retrieval with Memory and Recall Constraints
    Michele Covell
    International Conference on Acoustics, Speech, and Signal Processing, IEEE (2009)
    Preview abstract Locality Sensitive Hashing (LSH) is widely used for efficient retrieval of candidate matches in very large audio, video, and image systems. However, extremely large reference databases necessitate a guaranteed limit on the memory used by the table lookup itself, no matter how the entries crowd different parts of the signature space, a guarantee that LSH does not give. In this paper, we provide such guaranteed limits, primarily through the design of the LSH bands. When combined with data-adaptive bin splitting (needed on only 0.04% of the occupied bins), this approach provides the required guarantee on memory usage. At the same time, it avoids the reduced recall that more extensive use of bin splitting would give. View details
    Preview abstract The problem of efficiently finding similar items in a large corpus of high-dimensional data points arises in many real-world tasks, such as music, image, and video retrieval. Beyond the scaling difficulties that arise with lookups in large data sets, the complexity in these domains is exacerbated by an imprecise definition of similarity. In this paper, we describe a method to learn a similarity function from only weakly labeled positive examples. Once learned, this similarity function is used as the basis of a hash function to severely constrain the number of points considered for each lookup. Tested on a large real-world audio dataset, only a tiny fraction of the points (~0.27%) are ever considered for each lookup. To increase efficiency, no comparisons in the original high-dimensional space of points are required. The performance far surpasses, in terms of both efficiency and accuracy, a state-of-the-art Locality-Sensitive-Hashing-based (LSH) technique for the same problem and data set. View details
    Permutation Grouping: Intelligent Hash Function Design for Audio & Image Retrieval
    Michele Covell
    Sergey Ioffe
    International Conference on Acoustics, Speech and Signal Processing (ICASSP-2008)
    Preview
    VisualRank: Applying PageRank to Large-Scale Image Search
    Yushi Jing
    IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30 (2008), pp. 1877-1890
    Preview abstract Because of the relative ease in understanding and processing text, commercial image-search systems often rely on techniques that are largely indistinguishable from text search. Recently, academic studies have demonstrated the effectiveness of employing image-based features to provide either alternative or additional signals to use in this process. However, it remains uncertain whether such techniques will generalize to a large number of popular Web queries and whether the potential improvement to search quality warrants the additional computational cost. In this work, we cast the image-ranking problem into the task of identifying “authority” nodes on an inferred visual similarity graph and propose VisualRank to analyze the visual link structures among images. The images found to be “authorities” are chosen as those that answer the image-queries well. To understand the performance of such an approach in a real system, we conducted a series of large-scale experiments based on the task of retrieving images for 2,000 of the most popular products queries. Our experimental results show significant improvement, in terms of user satisfaction and relevancy, in comparison to the most recent Google Image Search results. Maintaining modest computational cost is vital to ensuring that this procedure can be used in practice; we describe the techniques required to make this system practical for large-scale deployment in commercial search engines. View details
    Coordinated Multi-Device Presentations: Ambient-Audio Identification
    Michele Covell
    Encyclopedia of Wireless and Mobile Communications, Taylor & Francis (2008), pp. 274-285
    Preview
    Query Suggestions for Mobile Search: Understanding Usage Patterns
    Proceedings of the SIGCHI conference on Human Factors in computing systems (CHI) (2008)
    Preview
    Learning Forgiving Hash Functions: Algorithms and Large Scale Tests
    Michele Covell
    IJCAI-07: International Joint Conference on Artificial Intelligence (2007)
    Preview
    Boosting Sex Identification Performance
    International Journal of Computer Vision, vol. 71 (2007), pp. 111-119
    Preview
    Known-Audio Detection Using Waveprint: Spectrogram Fingerprinting By Wavelet Hashing
    Michele Covell
    Proceedings of the 2007 International Conference on Acoustics, Speech, and Signal Processing
    Preview
    Audio Fingerprinting: Combining Computer Vision & Data Stream Processing
    Michele Covell
    Proceedings of the 2007 International Conference on Acoustics, Speech, and Signal Processing
    Preview
    Canonical Image Selection from the Web
    Yushi Jing
    ACM International Conference on Image and Video Retrieval (2007)
    Preview
    Large Scale Image-Based Adult-Content Filtering
    Yushi Jing
    1st International Conference on Computer Vision Theory, Sebutal, Portugal (2006)
    Preview
    Advertisement Detection and Replacement using Acoustic and Visual Repetition
    Michele Covell
    Proceedings of the 2006 International Workshop on Multimedia Signal Processing, IEEE
    Preview
    Content Fingerprinting Using Wavelets
    Michele Covell
    Proceedings of the Conference of Visual Media Production, IET (2006)
    Preview
    A Large Scale Study of Wireless Search Behavior: Google Mobile Search
    Proceedings of the SIGCHI conference on Human Factors in computing systems (CHI) (2006)
    Preview
    Browsing on Small Screens: Recasting Web-Page Segmentation into an Efficient Machine Learning Framework
    Proceedings of the Fifteenth International World Wide Web Conference, Edinburgh, Scotland (2006)
    Preview
    Boosting Sex Identification Performance
    Proceedings of the Seventeenth Innovative Applications of Artificial Intelligence Conference, AAAI (2005), pp. 1508-1513
    Preview
    Efficient Face Orientation Discrimination
    Mehran Sahami
    International Conference on Image Processing (ICIP-2004)
    Preview
    The Happy Searcher: Challenges in Web Information Retrieval
    Mehran Sahami
    Vibhu Mittal
    The Eighth Pacific Rim International Conference on Artificial Intelligence (PRICAI-2004)
    Preview
    Browsing on small screens: recasting web-page segmentation into an efficient machine learning framework
    WWW (2006), pp. 33-42
    Using a priori knowledge to create probabilistic models for optimization
    Int. J. Approx. Reasoning, vol. 31 (2002), pp. 193-220
    Applying Machine Learning for High-Performance Named-Entity Extraction
    Vibhu O. Mittal
    Computational Intelligence, vol. 16 (2000), pp. 586-596
    Memory-Based Face Recognition for Visitor Identification
    Terence Sim
    Matthew Mullin
    FG (2000), pp. 214-220
    Using Labeled and Unlabeled Data for Probabilistic Modeling of Face Orientation
    IJPRAI, vol. 14 (2000), pp. 1097-1107
    Neural Network-Based Face Detection
    Takeo Kanade
    IEEE Trans. Pattern Anal. Mach. Intell., vol. 20 (1998), pp. 23-38
    Rotation Invariant Neural Network-Based Face Detection
    Takeo Kanade
    CVPR (1998), pp. 38-44
    Probabilistic Modeling for Face Orientation Discrimination: Learning from Labeled and Unlabeled Data
    NIPS (1998), pp. 854-860
    Evolution-Based Methods for Selecting Point Data for Object Localization: Applications to Computer-Assisted Surgery
    David Simon
    Appl. Intell., vol. 8 (1998), pp. 7-19
    Making Templates Rotationally Invariant. An Application to Rotated Digit Recognition
    NIPS (1998), pp. 847-853
    Fast Probabilistic Modeling for Combinatorial Optimization
    Scott Davies
    AAAI/IAAI (1998), pp. 469-476
    Using Optimal Dependency-Trees for Combinational Optimization
    Scott Davies
    ICML (1997), pp. 30-38
    Using Expectation to Guide Processing: A Study of Three Real-World Applications
    NIPS (1997)
    Dynamic Relevance: Vision-Based Focus of Attention Using Artificial Neural Networks. (Technical Note)
    Dean Pomerleau
    Artif. Intell., vol. 97 (1997), pp. 381-395
    Integrating Text and Face Detection for Finding Informative Poster Frames
    Michael Smith
    AAAI Spring Symposium (1997), pp. 95-101
    Expectation-based selective attention for visual monitoring and control of a robot vehicle
    Dean Pomerleau
    Robotics and Autonomous Systems, vol. 22 (1997), pp. 329-344
    Genetic Algorithms and Explicit Search Statistics
    NIPS (1996), pp. 319-325
    Neural Network-Based Face Detection
    Takeo Kanade
    CVPR (1996), pp. 203-208
    Human Face Detection in Visual Scenes
    Takeo Kanade
    NIPS (1995), pp. 875-881
    Using the Future to Sort Out the Present: Rankprop and Multitask Learning for Medical Risk Evaluation
    Rich Caruana
    Tom M. Mitchell
    NIPS (1995), pp. 959-965
    Using the Representation in a Neural Network's Hidden Layer for Task-Specific Focus of Attention
    Dean Pomerleau
    IJCAI (1995), pp. 133-141
    Removing the Genetics from the Standard Genetic Algorithm
    Rich Caruana
    ICML (1995), pp. 38-46
    Non-Intrusive Gaze Tracking Using Artificial Neural Networks
    Dean Pomerleau
    NIPS (1993), pp. 753-760