# Dror Aiger

Dror Aiger has been a research scientist at Google since July 2011. Prior to joining Google, he was an associate professor of computer science in the Gaspard-Monge computer science laboratory at the Université Paris-Est and ESIEE Paris. Before that, he held a research position at Orbotech Ltd. His main areas of interest are computer vision, computational geometry, image processing, and computer graphics.

## Authored Publications

Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization

International Conference on Computer Vision (ICCV'23), IEEE / CVF (2023) (to appear)

Abstract
Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become common belief that global embeddings are critical for this image-retrieval step in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets. Code will be released.
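As a rough illustration of the local-feature-only retrieval setting the paper addresses (not the CANN algorithm itself, whose joint geometric-appearance search is more involved), the sketch below ranks database images by letting each query local descriptor vote for the images that own its nearest database descriptors. The function names, the toy descriptors, and the brute-force search are all illustrative assumptions:

```python
import numpy as np

def retrieve_by_local_votes(query_desc, db_desc, db_image_ids, k=3, top_n=3):
    """For each query local descriptor, find its k nearest database
    descriptors and cast one vote per matched database image.
    Images are ranked by vote count (brute-force; real systems use ANN)."""
    votes = {}
    for q in query_desc:
        d = np.linalg.norm(db_desc - q, axis=1)   # L2 distance to all db features
        for idx in np.argsort(d)[:k]:             # k nearest database features
            img = db_image_ids[idx]
            votes[img] = votes.get(img, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:top_n]

# toy data: two database images with well-separated descriptor clusters
rng = np.random.default_rng(0)
base0 = np.array([5.0] + [0.0] * 7)
base1 = np.array([0.0, 5.0] + [0.0] * 6)
db_desc = np.vstack([base0 + 0.1 * rng.normal(size=(4, 8)),
                     base1 + 0.1 * rng.normal(size=(4, 8))])
db_ids = [0, 0, 0, 0, 1, 1, 1, 1]
query = base0 + 0.1 * rng.normal(size=(3, 8))     # query resembles image 0
print(retrieve_by_local_votes(query, db_desc, db_ids, k=3))  # → [0]
```

The downside this sketch makes visible is exactly what CANN targets: every query feature must be searched against the whole database, and no geometric constraint prunes the candidates.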

SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow

Itai Lang

Shai Avidan

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023

Abstract
Scene flow estimation is a long-standing problem in computer vision, where the goal is to find the scene's 3D motion from its consecutive observations. Recently, there has been a research effort to compute scene flow using 3D point clouds. A main approach is to train a regression model that consumes source and target point clouds and outputs the per-point translation vector. An alternative approach is to learn point correspondence between the point clouds, concurrently with a refinement regression of the initial flow. In both approaches, the task is very challenging, since the flow is regressed in the free 3D space, and a typical solution is to resort to a large annotated synthetic dataset.
We introduce SCOOP, a new method for scene flow estimation that can be learned on a small amount of data without using ground-truth flow supervision. In contrast to previous works, we train a pure correspondence model that is focused on learning point feature representation, and initialize the flow as the difference between a source point and its softly corresponding target point. Then, at test time, we directly optimize a flow refinement component with a self-supervised objective, which leads to a coherent flow field between the point clouds. Experiments on widely used datasets demonstrate the performance gains achieved by our method compared to existing leading techniques.
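The flow initialization described above, the difference between a source point and its softly corresponding target point, can be sketched as follows. The softmax-over-feature-similarity correspondence is a minimal illustration under assumed names and an assumed temperature parameter, not the paper's full model:

```python
import numpy as np

def init_flow_soft_correspondence(src, tgt, feat_src, feat_tgt, temperature=0.1):
    """Initialize scene flow as the offset from each source point to its
    *softly* corresponding target point: a softmax over feature similarity
    gives matching weights, and the soft target is the weighted mean."""
    sim = feat_src @ feat_tgt.T                  # (n_src, n_tgt) similarities
    w = np.exp(sim / temperature)
    w /= w.sum(axis=1, keepdims=True)            # softmax over target points
    soft_tgt = w @ tgt                           # convex combination of targets
    return soft_tgt - src                        # per-point initial flow

# toy example: the target is the source translated by (1, 0, 0),
# with features that match points one-to-one
src = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
tgt = src + np.array([1.0, 0.0, 0.0])
feat = np.eye(2)                                 # perfectly discriminative features
flow = init_flow_soft_correspondence(src, tgt, feat, feat)
# each row of `flow` is approximately (1, 0, 0)
```

Because the initial flow is a convex combination of actual target points, it stays anchored to the observed cloud rather than wandering in free 3D space, which is the motivation the abstract gives for the correspondence-based initialization.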

General techniques for approximate incidences and their application to the camera posing problem

Micha Sharir

Bernhard Zeisl

The 35th International Symposium on Computational Geometry (2019)

Abstract
We consider the classical camera pose estimation problem that arises in many computer vision applications, in which we are given n 2D-3D correspondences between points in the scene and points in the camera image (some of which are incorrect associations), and where we aim to determine the camera pose (the position and orientation of the camera in the scene) from this data. We demonstrate that this posing problem can be reduced to the problem of computing ε-approximate incidences between two-dimensional surfaces (derived from the input correspondences) and points (on a grid) in a four-dimensional pose space. Similar reductions can be applied to other camera pose problems, as well as to similar problems in related application areas.

We describe and analyze three techniques for solving the resulting ε-approximate incidences problem in the context of our camera posing application. The first is a straightforward assignment of surfaces to the cells of a grid (of side-length ε) that they intersect. The second is a variant of a primal-dual technique, recently introduced by a subset of the authors [2] for different (and simpler) applications. The third is a non-trivial generalization of a data structure of Fonseca and Mount [3], originally designed for the case of hyperplanes. We present and analyze this technique in full generality, and then apply it to the camera posing problem at hand.

We compare our methods experimentally on real and synthetic data. Our experiments show that for the typical values of n and ε, the primal-dual method is the fastest, also in practice.
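The first technique, assigning each surface to the ε-grid cells it intersects and answering a query by looking up the query's cell, can be illustrated in a simplified 2D setting with lines standing in for the paper's four-dimensional surfaces. The cell-walking step and the one-cell neighbor margin below are illustrative simplifications, not the paper's data structure:

```python
from collections import defaultdict

def build_line_grid(lines, eps, x_range=(0.0, 1.0)):
    """Register each line y = a*x + b with every grid cell of side eps
    that it passes through, by walking the line in x-steps of eps/2."""
    grid = defaultdict(set)
    for i, (a, b) in enumerate(lines):
        x = x_range[0]
        while x <= x_range[1]:
            y = a * x + b
            # the line may clip cells above/below; register neighbors too
            for dy in (-1, 0, 1):
                grid[(int(x // eps), int(y // eps) + dy)].add(i)
            x += eps / 2                 # step finer than a cell width
    return grid

def near_lines(grid, p, eps):
    """Candidate lines registered in the cell containing the query point."""
    return grid[(int(p[0] // eps), int(p[1] // eps))]

eps = 0.25
lines = [(1.0, 0.0), (-1.0, 1.0)]        # y = x and y = 1 - x
grid = build_line_grid(lines, eps)
print(near_lines(grid, (0.5, 0.5), eps)) # both lines pass near (0.5, 0.5)
```

This conveys the trade-off the abstract alludes to: preprocessing touches every cell a surface crosses (expensive in four dimensions), while each query becomes a single cell lookup, which is why the paper's primal-dual and Fonseca-Mount-style alternatives are of interest.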

Large-scale, real-time visual-inertial localization revisited

Bernhard Zeisl

Michael Bosse

Joel Hesch

Marc Pollefeys

Roland Siegwart

Torsten Sattler

International Journal of Robotics Research, 39(9) (2019)

Abstract
The overarching goals in image-based localization are larger, better, and faster. In recent years, approaches based on local features and sparse 3D point-cloud models have both dominated the benchmarks and seen successful real-world deployment. Recently, end-to-end learned localization approaches have been proposed that show promising results on small and medium scale datasets. However, the positioning accuracy, latency, and compute requirements of these approaches remain an area of work. End-to-end learned approaches also typically require encoding the geometry of the environment in the model, which causes performance problems in large scale scenes and results in a hard-to-accommodate memory footprint. To deploy localization at world scale, we thus continue to rely on local features and sparse 3D models. We do not only look at localization, though: the goal is to build a scalable and robust end-to-end system including model building, compression, localization, and client-side pose fusion for deployment at scale. Our method compresses the appearance and geometry of the scene, allows for low-latency localization queries and efficient fusion, and leads to scalability beyond what has been previously demonstrated. To further improve efficiency, we leverage a combination of priors, nearest-neighbor search, geometric match culling, and a cascaded pose candidate refinement step. This combination outperforms other approaches when working with large-scale models. We demonstrate the effectiveness of our approach on a proof-of-concept system localizing 2.5 million images against models from four cities in different regions of the world, achieving query latencies in the 200 ms range.

Output sensitive algorithms for approximate incidences and their applications

Micha Sharir

Computational Geometry: Theory and Applications, 91 (2019), Article 101666

Abstract
An ε-approximate incidence between a point and some geometric object (line, circle, plane, sphere) occurs when the point and the object lie at distance at most ε from each other. Given a set of points and a set of objects, computing the approximate incidences between them is a major step in many database and web-based applications in computer vision and graphics, including robust model fitting, approximate point pattern matching, and estimating the fundamental matrix in epipolar (stereo) geometry.

In a typical approximate incidence problem of this sort, we are given a set P of m points in two or three dimensions, a set S of n objects (lines, circles, planes, spheres), and an error parameter ε > 0, and our goal is to report all pairs (p, s) ∈ P × S that lie at distance at most ε from one another. We present efficient output-sensitive approximation algorithms for quite a few cases, including points and lines or circles in the plane, and points and planes, spheres, lines, or circles in three dimensions. Several of these cases arise in the applications mentioned above. Our algorithms report all pairs at distance ≤ ε, but may also report additional pairs, all of which are guaranteed to be at distance at most αε, for some problem-dependent constant α > 1. Our algorithms are based on simple primal and dual grid decompositions and are easy to implement.

We note that (a) the use of duality, which leads to significant improvements in the overhead cost of the algorithms, appears to be novel for this kind of problem; (b) the correct choice of duality in some of these problems is fairly intricate and requires some care; and (c) the correctness and performance analysis of the algorithms (especially in the more advanced versions) is fairly non-trivial. We analyze our algorithms and prove guaranteed upper bounds on their running time and on the “distortion” parameter α.
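A minimal primal-grid sketch of the points-and-circles case may help fix ideas: bucket the points into cells of side ε, then for each circle visit only the cells its boundary crosses. This is an illustrative simplification, not the paper's algorithm (which also exploits duality and proves bounds on the distortion α); the boundary-sampling density and all names are assumptions:

```python
import numpy as np
from collections import defaultdict

def approx_incidences_points_circles(points, circles, eps):
    """Primal-grid sketch: hash the points into cells of side eps, then
    for each circle test only the points in cells its boundary crosses
    (plus a one-cell margin). Points far from every circle are never
    touched, which is the output-sensitive idea."""
    cell = lambda x, y: (int(np.floor(x / eps)), int(np.floor(y / eps)))
    buckets = defaultdict(list)
    for i, (px, py) in enumerate(points):
        buckets[cell(px, py)].append(i)

    pairs = []
    for j, (cx, cy, r) in enumerate(circles):
        seen = set()
        # sample the boundary densely enough that no crossed cell is missed
        for t in np.linspace(0.0, 2 * np.pi, max(16, int(16 * r / eps))):
            c0 = cell(cx + r * np.cos(t), cy + r * np.sin(t))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    c = (c0[0] + dx, c0[1] + dy)
                    if c in seen:
                        continue
                    seen.add(c)
                    for i in buckets.get(c, []):
                        px, py = points[i]
                        if abs(np.hypot(px - cx, py - cy) - r) <= eps:
                            pairs.append((i, j))
    return pairs

points = [(1.0, 0.02), (0.0, 0.0), (3.0, 3.0)]   # only the first lies near the circle
circles = [(0.0, 0.0, 1.0)]                      # unit circle centered at the origin
print(approx_incidences_points_circles(points, circles, eps=0.05))  # → [(0, 0)]
```

The one-cell margin is what produces the "extra" reported pairs the abstract mentions: everything reported here passes an explicit distance test, but a coarser variant that skips the test would report pairs at distance up to a small constant multiple of ε.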