Amir Globerson

Amir Globerson received his BSc in computer science and physics in 1997 from the Hebrew University, and his PhD in computational neuroscience from the Hebrew University in 2006. After his PhD, he was a postdoctoral fellow at the University of Toronto and a Rothschild postdoctoral fellow at MIT. He joined the Hebrew University School of Computer Science in 2008, and moved to the Tel Aviv University School of Computer Science in 2015. Prof. Globerson’s research interests include machine learning, probabilistic inference, convex optimization, neural computation and natural language processing. He is an associate editor for the Journal of Machine Learning Research, and the Associate Editor in Chief for the IEEE Transactions on Pattern Analysis and Machine Intelligence. His work has received several prizes, including five paper awards (two at NIPS, two at UAI, and one at ICML), as well as one runner-up for best paper at ICML. His research has been supported by several grants and awards from ISF, BSF, GIF, Intel, HP, and Google. In 2015 he was a visiting scientist at Google Mountain View, and since 2017 he has been a Research Scientist at Google in Tel Aviv.
Authored Publications
    Abstract: In educational dialogue settings students often provide answers that are incomplete. In other words, there is a gap between the answer the student provides and the perfect answer expected by the teacher. Successful dialogue hinges on the teacher asking about this gap in an effective manner, thus creating a rich and interactive educational experience. Here we focus on the problem of generating such gap-focused questions (GFQ) automatically. We define the task, highlight key desired aspects of a good GFQ, and propose a model that satisfies these. Finally, we provide an evaluation of our generated questions and compare them to manually generated ones, demonstrating competitive performance.
    Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Network
    Edo Cohen-Karlik
    Itamar Menuhin-Gruman
    Nadav Cohen
    Raja Giryes
    ICLR (2023)
    Abstract: Overparameterization in deep learning typically refers to settings where a trained Neural Network (NN) has representational capacity to fit the training data in many ways, some of which generalize well, while others do not. In the case of Recurrent Neural Networks (RNNs), there exists an additional layer of overparameterization, in the sense that a model may exhibit many solutions that generalize well for sequence lengths seen in training, some of which extrapolate to longer sequences, while others do not. Numerous works studied the tendency of Gradient Descent (GD) to fit overparameterized NNs with solutions that generalize well. On the other hand, its tendency to fit overparameterized RNNs with solutions that extrapolate has been discovered only lately, and is far less understood. In this paper, we analyze the extrapolation properties of GD when applied to overparameterized linear RNNs. In contrast to recent arguments suggesting an implicit bias towards short-term memory, we provide theoretical evidence for learning low dimensional state spaces, which can also model long-term memory. Our result relies on a dynamical characterization which shows that GD (with small step size and near-zero initialization) strives to maintain a certain form of balancedness, as well as on tools developed in the context of the moment problem from statistics (recovery of a probability distribution from its moments). Experiments corroborate our theory, demonstrating extrapolation via learning low dimensional state spaces with both linear and non-linear RNNs.
    Visual Prompting via Image Inpainting
    Amir Bar
    Yossi Gandelsman
    Trevor Darrell
    Alexei Efros
    Advances in Neural Information Processing Systems (2022)
    Abstract: How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting - literally just filling in a hole in a concatenated visual prompt image - turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked auto-encoders on a new dataset that we curated - 88k unlabeled figures from academic paper sources on arXiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, edge detection, etc.
    Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
    Elad Ben Avraham
    Roei Herzig
    Karttikeya Mangalam
    Amir Bar
    Anna Rohrbach
    Leonid Karlinsky
    Trevor Darrell
    Advances in Neural Information Processing Systems (2022)
    Abstract: Recent action recognition models have achieved impressive results by integrating objects, their locations and interactions. However, obtaining dense structured annotations for each frame is tedious and time-consuming, making these methods expensive to train and less scalable. At the same time, if a small set of annotated images is available, either within or outside the domain of interest, how could we leverage these for a video downstream task? We propose a learning framework StructureViT (SViT for short), which demonstrates how utilizing the structure of a small number of images only available during training can improve a video model. SViT relies on two key insights. First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos. Second, the scene representations of individual frames in video should "align" with those of still images. This is achieved via a Frame-Clip Consistency loss, which ensures the flow of structured information between images and videos. We explore a particular instantiation of scene structure, namely a Hand-Object Graph, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges. SViT shows strong performance improvements on multiple video understanding tasks and datasets, including the first place in the Ego4D CVPR'22 Point of No Return Temporal Localization Challenge. For code and pretrained models, visit the project page at https://eladb3.github.io/SViT/.
    Active Learning with Label Comparisons
    Shay Moran
    Uncertainty in Artificial Intelligence (submitted) (2022)
    Abstract: Supervised learning typically relies on manual annotation of the true labels. However, when there are many potential labels, it will be time-consuming for a human annotator to search these for the best one. On the other hand, comparing two candidate labels is often much easier. In this paper, we focus on this type of pairwise supervision, and ask how it can be used effectively in learning, and in particular active learning. We obtain several surprising results in this context. In principle, finding the best label out of $k$ can be done with $k-1$ active queries. However, we show that there is a natural class where this approach is in fact sub-optimal, and that there is a more comparison-efficient active learning scheme. A key element in our analysis is the "label neighborhood graph" of the true distribution, which has an edge between two classes if they share a decision boundary. We also show that in the PAC setting, pairwise comparisons cannot provide improved sample complexity in the worst case. We complement our theoretical results with experiments, clearly demonstrating the effect of the neighborhood graph on sample complexity.
    A Theoretical Analysis of Fine-tuning with Linear Teachers
    Alon Brutzkus
    Gal Shachaf
    Advances in Neural Information Processing Systems (2021)
    Abstract: Fine-tuning is a common practice in deep learning, achieving excellent generalization results on downstream tasks using relatively little training data. Although widely used in practice, it lacks a strong theoretical understanding. Here we analyze the sample complexity of this scheme for regression with linear teachers in several architectures. Intuitively, the success of fine-tuning depends on the similarity between the source tasks and the target task; however, measuring this similarity is nontrivial. We show that generalization is related to a measure that considers the relation between the source task, target task and covariance structure of the target data. In the setting of linear regression, we show that under realistic settings a substantial sample complexity reduction is plausible when the above measure is low. For deep linear regression, we present a novel result regarding the inductive bias of gradient-based training when the network is initialized with pretrained weights. Using this result we show that the similarity measure for this setting is also affected by the depth of the network. We further present results on shallow ReLU models, and analyze the dependence of sample complexity on source and target tasks in this setting.
    Abstract: Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is the S-space of StyleGAN, which is known to generate semantically meaningful dimensions in the image. However, these will typically not correspond to classifier-specific attributes since standard GAN training is not dependent on the classifier. To overcome this, we propose a training procedure for a StyleGAN, which incorporates the classifier model. This results in an S-space that captures distinct attributes underlying classifier outputs. After training, the model can be used to visualize the effect of changing multiple attributes per image, thus providing an image-specific explanation. We apply StylEx to multiple domains, including animals, leaves, faces and retinal images. For these, we show how an image can be changed in different ways to change its classifier prediction. Our results show that the method finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are interpretable as measured in user studies.
    Abstract: Complex classifiers may exhibit "embarrassing" failures in cases that would be easily classified and justified by a human. Avoiding such failures is obviously paramount, particularly in domains where we cannot accept such unexplained behavior. In this work we focus on one such setting, where a label is perfectly predictable if the input contains certain features, and otherwise, it is predictable by a linear classifier. We define a related hypothesis class and determine its sample complexity. We also give evidence that efficient algorithms cannot, unfortunately, enjoy this sample complexity. We then derive a simple and efficient algorithm, and also give evidence that its sample complexity is optimal, among efficient algorithms. Experiments on sentiment analysis demonstrate the efficacy of the method, both in terms of accuracy and interpretability.
    Abstract: Deep learning models are often successfully trained using gradient descent, despite the worst case hardness of the underlying non-convex optimization problem. The key question is then under what conditions one can prove that optimization will succeed. Here we provide a strong result of this kind. We consider a neural net with one hidden layer and a convolutional structure with no overlap, and a ReLU activation function. For this architecture we show that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time. To the best of our knowledge, this is the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations.
    Improper Deep Kernels
    Uri Heinemann
    Roi Livni
    Proceedings of The 19th International Conference on Artificial Intelligence and Statistics (2016)
    Abstract: Neural networks have recently re-emerged as a powerful hypothesis class, yielding impressive classification accuracy in multiple domains. However, their training is a non-convex optimization problem which poses theoretical and practical challenges. Here we address this difficulty by turning to "improper" learning of neural nets. In other words, we learn a classifier that is not a neural net but is competitive with the best neural net model given a sufficient number of training examples. Our approach relies on a novel kernel construction scheme in which the kernel is a result of integration over the set of all possible instantiations of neural models. It turns out that the corresponding integral can be evaluated in closed-form via a simple recursion. Thus we translate the non-convex, hard learning problem of a neural net to an SVM with an appropriate kernel. We also provide sample complexity results which depend on the stability of the optimal neural net.
    Abstract: Entity resolution is the task of linking each mention of an entity in text to the corresponding record in a knowledge base (KB). Coherence models for entity resolution encourage all referring expressions in a document to resolve to entities that are related in the KB. We explore attention-like mechanisms for coherence, where the evidence for each candidate is based on a small set of strong relations, rather than relations to all other entities in the document. The rationale is that document-wide support may simply not exist for non-salient entities, or entities not densely connected in the KB. Our proposed system outperforms state-of-the-art systems on the CoNLL 2003, TAC KBP 2010, 2011 and 2012 tasks.
    Euclidean Embedding of Co-occurrence Data
    Gal Chechik
    Naftali Tishby
    Journal of Machine Learning Research, vol. 8 (2007), pp. 2265-2295
    Rich cell-type-specific network topology in neocortical microcircuitry
    Eyal Gal
    Michael London
    Srikanth Ramaswamy
    Michael W Reimann
    Eilif Muller
    Henry Markram
    Idan Segev
    Nature Neuroscience, vol. 20 (2017), pp. 1004-1013
    Discrete Chebyshev Classifiers
    Elad Mezuman
    Proceedings of the 31st International Conference on Machine Learning, ICML 2014, JMLR.org, pp. 1233-1241
    Learning Max-Margin Tree Predictors
    Ofer Meshi
    Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI2013) (2013), pp. 411
    Distributed Latent Variable Models of Lexical Co-occurrences
    Tenth International Workshop on Artificial Intelligence and Statistics (2005)