Amir Globerson
Amir Globerson received his BSc in computer science and physics from the Hebrew University in 1997, and his PhD in computational neuroscience from the Hebrew University in 2006. After his PhD, he was a postdoctoral fellow at the University of Toronto and a Rothschild postdoctoral fellow at MIT. He joined the Hebrew University School of Computer Science in 2008 and moved to the Tel Aviv University School of Computer Science in 2015. Prof. Globerson’s research interests include machine learning, probabilistic inference, convex optimization, neural computation, and natural language processing. He is an associate editor for the Journal of Machine Learning Research and the Associate Editor in Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence. His work has received several prizes, including five paper awards (two at NIPS, two at UAI, and one at ICML) and a best-paper runner-up at ICML. His research has been supported by several grants and awards from the ISF, BSF, GIF, Intel, HP, and Google. In 2015 he was a visiting scientist at Google in Mountain View, and since 2017 he has been a Research Scientist at Google in Tel Aviv.
Authored Publications
Covering Uncommon Ground: Followup Question Generation for Answer Assessment
Alexandre Djerbetian
Reut Tsarfaty
ACL (2023) (to appear)
In educational dialogue settings, students often provide answers that are incomplete. In other words, there is a gap between the answer the student provides and the perfect answer expected by the teacher. Successful dialogue hinges on the teacher asking about this gap in an effective manner, thus creating a rich and interactive educational experience. Here we focus on the problem of automatically generating such gap-focused questions (GFQs). We define the task, highlight key desired aspects of a good GFQ, and propose a model that satisfies these requirements. Finally, we provide an evaluation of our generated questions and compare them to manually generated ones, demonstrating competitive performance.
Overparameterization in deep learning typically refers to settings where a trained Neural Network (NN) has the representational capacity to fit the training data in many ways, some of which generalize well, while others do not. In the case of Recurrent Neural Networks (RNNs), there exists an additional layer of overparameterization, in the sense that a model may exhibit many solutions that generalize well for sequence lengths seen in training, some of which extrapolate to longer sequences, while others do not. Numerous works have studied the tendency of Gradient Descent (GD) to fit overparameterized NNs with solutions that generalize well. On the other hand, its tendency to fit overparameterized RNNs with solutions that extrapolate has been discovered only lately, and is far less understood. In this paper, we analyze the extrapolation properties of GD when applied to overparameterized linear RNNs. In contrast to recent arguments suggesting an implicit bias towards short-term memory, we provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory. Our result relies on a dynamical characterization which shows that GD (with small step size and near-zero initialization) strives to maintain a certain form of balancedness, as well as on tools developed in the context of the moment problem from statistics (recovery of a probability distribution from its moments). Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
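To make the extrapolation question above concrete, here is a minimal sketch of the kind of experiment it describes; the teacher, dimensions, step size, and iteration count are illustrative assumptions, not the paper's setup. A student linear RNN with a large state dimension is trained by plain gradient descent from near-zero initialization on short sequences generated by a low-dimensional teacher, then evaluated on much longer sequences; the singular values of the learned transition matrix serve as a rough proxy for the dimension of the learned state space.

    import torch

    torch.manual_seed(0)

    def run_rnn(A, B, C, x):
        # Linear RNN: h_{t+1} = A h_t + B x_t, output y = C h_T; x has shape (batch, T).
        h = torch.zeros(x.shape[0], A.shape[0])
        for t in range(x.shape[1]):
            h = h @ A.T + x[:, t:t+1] @ B.T
        return h @ C.T

    # Low-dimensional teacher (state dim 2) vs. overparameterized student (state dim 20).
    d_teacher, d_student, T_train, T_test = 2, 20, 6, 30
    A_t = 0.5 * torch.eye(d_teacher)
    B_t = torch.ones(d_teacher, 1)
    C_t = torch.ones(1, d_teacher)

    A = torch.nn.Parameter(1e-3 * torch.randn(d_student, d_student))  # near-zero init
    B = torch.nn.Parameter(1e-3 * torch.randn(d_student, 1))
    C = torch.nn.Parameter(1e-3 * torch.randn(1, d_student))
    opt = torch.optim.SGD([A, B, C], lr=1e-2)  # small step size, plain gradient descent

    x_train = torch.randn(512, T_train)
    y_train = run_rnn(A_t, B_t, C_t, x_train)
    for step in range(25000):
        opt.zero_grad()
        loss = ((run_rnn(A, B, C, x_train) - y_train) ** 2).mean()
        loss.backward()
        opt.step()

    # Extrapolation check: sequences five times longer than anything seen in training.
    x_test = torch.randn(512, T_test)
    err = ((run_rnn(A, B, C, x_test) - run_rnn(A_t, B_t, C_t, x_test)) ** 2).mean()
    print("extrapolation MSE:", err.item())
    # Rough proxy for the dimension of the learned state space.
    print("top singular values of A:", torch.linalg.svdvals(A.detach())[:5])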
Visual Prompting via Image Inpainting
Amir Bar
Yossi Gandelsman
Trevor Darrell
Alexei Efros
Advances in Neural Information Processing Systems (2022)
How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting - literally just filling in a hole in a concatenated visual prompt image - turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked auto-encoders on a new dataset that we curated: 88k unlabeled figures sourced from academic papers on arXiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, edge detection, etc.
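A rough sketch of the prompting setup described above; the grid layout, cell size, and the mae_inpaint interface are assumptions for illustration, not the released models. The example input-output pair and the new query image are pasted into a single 2x2 grid, the bottom-right cell is masked, and an inpainting model is asked to fill it in as the predicted output.

    import numpy as np

    def make_visual_prompt(example_in, example_out, query_in, cell=112):
        """Assemble a 2x2 visual prompt grid:
               [ example input | example output ]
               [ query input   | hole to inpaint ]
           All inputs are assumed to be (cell, cell, 3) uint8 images."""
        canvas = np.zeros((2 * cell, 2 * cell, 3), dtype=np.uint8)
        canvas[:cell, :cell] = example_in
        canvas[:cell, cell:] = example_out
        canvas[cell:, :cell] = query_in
        mask = np.zeros((2 * cell, 2 * cell), dtype=bool)
        mask[cell:, cell:] = True  # the region the inpainting model must fill in
        return canvas, mask

    # Hypothetical usage with a pretrained masked autoencoder `mae_inpaint(image, mask)`:
    #   canvas, mask = make_visual_prompt(ex_in, ex_out, query)
    #   completed = mae_inpaint(canvas, mask)      # fills the masked cell
    #   prediction = completed[112:, 112:]         # inpainted bottom-right cell = output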
Active Learning with Label Comparisons
Shay Moran
Uncertainty in Artificial Intelligence (submitted) (2022)
Supervised learning typically relies on manual annotation of the true labels. However, when there are many potential labels, it is time-consuming for a human annotator to search through them for the best one. On the other hand, comparing two candidate labels is often much easier. In this paper, we focus on this type of pairwise supervision, and ask how it can be used effectively in learning, and in particular in active learning. We obtain several surprising results in this context. In principle, finding the best label out of $k$ can be done with $k-1$ active queries. However, we show that there is a natural class where this approach is in fact sub-optimal, and that there is a more comparison-efficient active learning scheme. A key element in our analysis is the "label neighborhood graph" of the true distribution, which has an edge between two classes if they share a decision boundary. We also show that in the PAC setting, pairwise comparisons cannot provide improved sample complexity in the worst case. We complement our theoretical results with experiments, clearly demonstrating the effect of the neighborhood graph on sample complexity.
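To make the "$k-1$ queries" observation above concrete, here is a small sketch of the baseline strategy it refers to, not the paper's improved scheme; the oracle interface is an assumption. A sequential tournament finds the best label for an example using exactly k-1 pairwise comparisons, assuming the comparison oracle is consistent.

    def best_label_by_comparisons(x, labels, prefer):
        """Find the best of the k candidate labels for example x with k-1 pairwise queries.

        `prefer(x, a, b)` stands in for an annotator who returns whichever of the two
        candidate labels a, b fits x better; consistency (a fixed ranking of the labels
        for x) is assumed, so a single sequential tournament suffices.
        """
        best = labels[0]
        for candidate in labels[1:]:
            best = prefer(x, best, candidate)  # one comparison per remaining label
        return best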
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben Avraham
Roei Herzig
Karttikeya Mangalam
Amir Bar
Anna Rohrbach
Leonid Karlinsky
Trevor Darrell
Advances in Neural Information Processing Systems (2022)
Recent action recognition models have achieved impressive results by integrating objects, their locations and interactions. However, obtaining dense structured annotations for each frame is tedious and time-consuming, making these methods expensive to train and less scalable. At the same time, if a small set of annotated images is available, either within or outside the domain of interest, how can we leverage these for a video downstream task? We propose a learning framework, StructureViT (SViT for short), which demonstrates how the structure of a small number of images, available only during training, can be used to improve a video model. SViT relies on two key insights. First, as both images and videos contain structured information, we enrich a transformer model with a set of object tokens that can be used across images and videos. Second, the scene representations of individual frames in a video should "align" with those of still images. This is achieved via a Frame-Clip Consistency loss, which ensures the flow of structured information between images and videos. We explore a particular instantiation of scene structure, namely a Hand-Object Graph, consisting of hands and objects with their locations as nodes, and physical relations of contact/no-contact as edges. SViT shows strong performance improvements on multiple video understanding tasks and datasets, including first place in the Ego4D CVPR'22 Point of No Return Temporal Localization Challenge. For code and pretrained models, visit the project page at https://eladb3.github.io/SViT/.
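A highly simplified sketch of the frame-clip consistency idea described above; the tensor shapes and the squared-error penalty are assumptions for illustration (the actual SViT objective and object-token design are in the paper and the code at the project page). Object tokens computed when a frame is processed on its own are encouraged to agree with the corresponding tokens computed when the whole clip is processed.

    import torch
    import torch.nn.functional as F

    def frame_clip_consistency_loss(frame_tokens, clip_tokens):
        """frame_tokens: (B, T, N, D) object tokens from frames processed individually.
           clip_tokens:  (B, T, N, D) the matching object tokens from the full video clip.
           Penalizes disagreement so structured information flows between the two views."""
        return F.mse_loss(frame_tokens, clip_tokens)

    # Hypothetical use inside a training step, weighted against the main task loss:
    #   loss = task_loss + consistency_weight * frame_clip_consistency_loss(f_tok, c_tok)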
Fine-tuning is a common practice in deep learning, achieving excellent generalization results on downstream tasks using relatively little training data. Although widely used in practice, it lacks a strong theoretical understanding. Here we analyze the sample complexity of this scheme for regression with linear teachers in several architectures. Intuitively, the success of fine-tuning depends on the similarity between the source tasks and the target task; however, measuring this similarity is nontrivial. We show that generalization is related to a measure that considers the relation between the source task, the target task, and the covariance structure of the target data. In the setting of linear regression, we show that under realistic settings a substantial reduction in sample complexity is plausible when this measure is low. For deep linear regression, we present a novel result regarding the inductive bias of gradient-based training when the network is initialized with pretrained weights. Using this result we show that the similarity measure for this setting is also affected by the depth of the network. We further present results on shallow ReLU models, and analyze the dependence of sample complexity on source and target tasks in this setting.
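For the linear-regression portion of the analysis, one concrete fact that drives this kind of inductive-bias argument can be checked numerically: on an underdetermined least-squares problem with a linear teacher, gradient descent started from pretrained weights w0 converges to the interpolating solution closest to w0 in Euclidean norm. A minimal numpy sketch; the dimensions, step size, and data are arbitrary choices, not the paper's setting.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 100                       # fewer samples than parameters: many interpolators
    X = rng.standard_normal((n, d))
    w_teacher = rng.standard_normal(d)
    y = X @ w_teacher                    # linear teacher, no label noise

    w0 = rng.standard_normal(d)          # "pretrained" initialization
    w = w0.copy()
    for _ in range(5000):                # plain gradient descent on the squared loss
        w -= 0.05 * X.T @ (X @ w - y) / n

    # GD ends up at the interpolator closest to the initialization w0:
    w_closest = w0 + np.linalg.pinv(X) @ (y - X @ w0)
    print(np.abs(w - w_closest).max())   # ~0: same solution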
Explaining in Style: Training a GAN to explain a classifier in StyleSpace
Yossi Gandelsman
Michal Yarom
Yoav Itzhak Wald
Phillip Isola
Michal Irani
Proc. ICCV 2021
Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this by training a generative model to specifically explain multiple attributes that underlie classifier decisions. A natural source for such attributes is the S-space of StyleGAN, which is known to generate semantically meaningful dimensions in the image. However, these will typically not correspond to classifier-specific attributes, since standard GAN training is not dependent on the classifier. To overcome this, we propose a training procedure for StyleGAN that incorporates the classifier model. This results in an S-space that captures distinct attributes underlying classifier outputs. After training, the model can be used to visualize the effect of changing multiple attributes per image, thus providing an image-specific explanation. We apply StylEx to multiple domains, including animals, leaves, faces, and retinal images. For these, we show how an image can be changed in different ways to change its classifier prediction. Our results show that the method finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are interpretable, as measured in user studies.
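As a rough illustration of the "visualize the effect of changing an attribute" step above (not the training procedure itself; the generator and classifier interfaces here are hypothetical stand-ins for the trained StylEx models): perturb a single S-space coordinate and measure how much the classifier's output for the generated image moves.

    import torch

    @torch.no_grad()
    def attribute_effect(generator, classifier, s, coord, delta, target_class):
        """Shift one coordinate of the style code `s` by `delta` and report the change in
        the classifier's probability for `target_class`. `generator(s)` and
        `classifier(image)` are hypothetical interfaces; s has shape (batch, num_coords)."""
        base = classifier(generator(s)).softmax(-1)[:, target_class]
        s_shifted = s.clone()
        s_shifted[:, coord] += delta
        shifted = classifier(generator(s_shifted)).softmax(-1)[:, target_class]
        return (shifted - base).mean().item()  # large magnitude = classifier-relevant attribute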
Complex classifiers may exhibit "embarrassing" failures in cases that would be easily classified and justified by a human. Avoiding such failures is obviously paramount, particularly in domains where we cannot accept such unexplained behavior. In this work we focus on one such setting, where a label is perfectly predictable if the input contains certain features, and otherwise it is predictable by a linear classifier. We define a related hypothesis class and determine its sample complexity. We also give evidence that efficient algorithms cannot, unfortunately, enjoy this sample complexity. We then derive a simple and efficient algorithm, and give evidence that its sample complexity is optimal among efficient algorithms. Experiments on sentiment analysis demonstrate the efficacy of the method, both in terms of accuracy and interpretability.
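A minimal sketch of the hypothesis class described above; the feature encoding and the sign-based linear rule are illustrative assumptions. The label is forced whenever one of a small set of decisive features is present, and otherwise comes from a linear classifier.

    import numpy as np

    class RuleThenLinear:
        """Predict `rule_label` whenever any decisive feature fires; otherwise fall back
        to a linear classifier sign(w . x + b)."""
        def __init__(self, rule_features, rule_label, w, b=0.0):
            self.rule_features = list(rule_features)  # indices of perfectly predictive features
            self.rule_label = rule_label
            self.w, self.b = np.asarray(w), b

        def predict(self, X):
            X = np.asarray(X)
            linear = np.sign(X @ self.w + self.b)
            rule_fires = (X[:, self.rule_features] > 0).any(axis=1)
            return np.where(rule_fires, self.rule_label, linear)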
Deep learning models are often successfully trained using gradient descent, despite the worst-case hardness of the underlying non-convex optimization problem. The key question is then under what conditions one can prove that optimization will succeed. Here we provide a strong result of this kind. We consider a neural net with one hidden layer and a convolutional structure with no overlap, and a ReLU activation function. For this architecture we show that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time. To the best of our knowledge, this is the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations.
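The architecture analyzed above is simple enough to write down directly. A sketch in PyTorch (the input size, filter size, and the fixed average readout are illustrative choices consistent with a no-overlap convolution): a single shared filter is applied to non-overlapping patches, passed through a ReLU, and averaged, with the filter as the only trained parameter.

    import torch
    import torch.nn.functional as F

    class NoOverlapConvReLU(torch.nn.Module):
        """One hidden layer: a shared filter w applied to non-overlapping patches
        (stride equal to the filter size), a ReLU, and an average over hidden units."""
        def __init__(self, filter_size=8):
            super().__init__()
            self.filter_size = filter_size
            self.w = torch.nn.Parameter(0.01 * torch.randn(filter_size))

        def forward(self, x):
            # x: (batch, d) with d divisible by filter_size
            patches = x.view(x.shape[0], -1, self.filter_size)  # non-overlapping patches
            hidden = F.relu(patches @ self.w)                    # shared ReLU filter
            return hidden.mean(dim=1)                            # fixed average readout

    # Illustrative realizable setup with Gaussian inputs, as in the analysis above:
    #   x = torch.randn(1024, 64); teacher = NoOverlapConvReLU()
    #   y = teacher(x).detach()    # train a fresh NoOverlapConvReLU on (x, y) with GD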
Collective Entity Resolution with Multi-Focal Attention
Soumen Chakrabarti
Michael Ringgaard
ACL (2016)
Entity resolution is the task of linking each mention of an entity in text to the corresponding record in a knowledge base (KB). Coherence models for entity resolution encourage all referring expressions in a document to resolve to entities that are related in the KB. We explore attention-like mechanisms for coherence, where the evidence for each candidate is based on a small set of strong relations, rather than relations to all other entities in the document. The rationale is that document-wide support may simply not exist for non-salient entities, or entities not densely connected in the KB. Our proposed system outperforms state-of-the-art systems on the CoNLL 2003, TAC KBP 2010, 2011 and 2012 tasks.
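A toy sketch of the "small set of strong relations" idea above; the scoring functions and the choice of k are placeholders rather than the paper's exact model. When scoring a candidate entity for one mention, coherence evidence comes only from its k strongest KB relations to the candidates chosen for the other mentions, instead of summing over all of them.

    def multifocal_score(candidate, mention_idx, assignments, local_score, relation_score, k=2):
        """Score `candidate` for mention `mention_idx`, given tentative entity `assignments`
        for all mentions. Coherence uses only the k strongest relations (attention-like
        focus) rather than document-wide support from every other entity."""
        relations = sorted(
            (relation_score(candidate, entity)
             for j, entity in enumerate(assignments) if j != mention_idx),
            reverse=True,
        )
        return local_score(mention_idx, candidate) + sum(relations[:k])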