Jump to Content
Dror Aiger

Dror Aiger

Dror Aiger is a research scientist at Google from July 2011. Prior to joining Google, he held the position of associate professor of computer science in the laboratory of computer science Gaspard-Monge, the university of Paris-EST and ESIEE Paris. Before that he held a research position in Orbotech LTD. His main areas of interest are computer vision, computational geometry, image processing and computer graphics.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, desc
  • Year
  • Year, desc
    SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow
    Itai Lang
    Shai Avidan
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023
    Preview abstract Scene flow estimation is a long-standing problem in computer vision, where the goal is to find the scene's 3D motion from its consecutive observations. Recently, there is a research effort to compute scene flow using 3D point clouds. A main approach is to train a regression model that consumes a source and target point clouds and outputs the per-point translation vector. An alternative approach is to learn point correspondence between the point clouds, concurrently with a refinement regression of the initial flow. In both approaches the task is very challenging, since the flow is regressed in the free 3D space, and a typical solution is to resort to a large annotated synthetic dataset. We introduce CorrFlow, a new method for scene flow estimation that can be learned on a small amount of data without using ground-truth flow supervision. In contrast to previous works, we train a pure correspondence model that is focused on learning point feature representation, and initialize the flow as the difference between a source point and its softly corresponding target point. Then, at test time, we directly optimize a flow refinement component with a self-supervised objective, which leads to a coherent flow field between the point clouds. Experiments on widely used datasets demonstrate the performance gains achieved by our method compared to existing leading techniques. View details
    Preview abstract Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become common belief that global embeddings are critical for said image-retrieval in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets. Code will be released. View details
    Preview abstract Outlier rejection and equivalently inlier set optimization is a key ingredient in numerous applications in computer vision such as filtering point-matches in camera pose estimation or plane and normal estimation in point clouds. Several approaches exist, yet at large scale we face a combinatorial explosion of possible solutions and state-of-the-art methods like RANSAC, Hough transform or Branch&Bound require a minimum inlier ratio or prior knowledge to remain practical. In fact, for problems such as camera posing in very large scenes these approaches become useless as they have exponential runtime growth if these conditions aren’t met. To approach the problem we present a efficient and general algorithm for outlier rejection based on “intersecting” k-dimensional surfaces in Rd . We provide a recipe for casting a variety of geometric problems as finding a point in Rd which maximizes the number of nearby surfaces (and thus inliers). The resulting algorithm has linear worst-case complexity with a better runtime dependency in the approximation factor than competing algorithms while not requiring domain specific bounds. This is achieved by introducing a space decomposition scheme that bounds the number of computations by successively rounding and grouping samples. Our recipe (and open-source code) enables anybody to derive such fast approaches to new problems across a wide range of domains. We demonstrate the versatility of the approach on several camera posing problems with a high number of matches at low inlier ratio achieving stateof-the-art results at significantly lower processing times. View details
    Duality-based approximation algorithms for maximum depth
    Micha Sharir
    arxiv, https://arxiv.org/abs/2006.12318 (2020)
    Preview abstract An ε-incidence between a point q and some curve c (e.g., line or circle) in the plane occurs when q is at (say, Euclidean) distance at most ε from c. We use variants of the recent grid-based primal-dual technique developed by the authors [ESA 2017, SoCG 2019] to design an efficient data structure for computing an approximate depth of any query point in the arrangement A(S) of a collection S of n halfplanes or triangles in the plane. We then use this structure to find a point of a suitably defined approximate maximum depth in A(S). Specifically, given an error parameter ε > 0, we compute (i) a point of approximate maximum depth, when we are allowed to exclude containments of points q in objects s when q is ε-incident to ∂s, and (ii) a point of approximate maximum depth, when we are allowed to include (false) containments of q in objects s when q is (outside s but) ε-incident to ∂s. For the case of triangles, the technique involves, on top of duality, a careful efficient implementation of a multi-level structure over the input triangles within a primal grid. View details
    Output sensitive algorithms for approximate incidences and their applications
    Micha Sharir
    Computational Geometry Theory and Applications, vol. 91 (2019), pp. 101666
    Preview abstract An ε-approximate incidence between a point and some geometric object (line, circle, plane, sphere) occurs when the point and the object lie at distance at most ε from each other. Given a set of points and a set of objects, computing the approximate incidences between them is a major step in many database and web-based applications in computer vision and graphics, including robust model fitting, approximate point pattern matching, and estimating the fundamental matrix in epipolar (stereo) geometry. In a typical approximate incidence problem of this sort, we are given a set P of m points in two or three dimensions, a set S of n objects (lines, circles, planes, spheres), and an error parameter ε > 0, and our goal is to report all pairs (p, s) ∈ P × S that lie at distance at most ε from one another. We present efficient output-sensitive approximation algorithms for quite a few cases, including points and lines or circles in the plane, and points and planes, spheres, lines, or circles in three dimensions. Several of these cases arise in the applications mentioned above. Our algorithms report all pairs at distance ≤ ε, but may also report additional pairs, all of which are guaranteed to be at distance at most αε, for some problem-dependent constant α > 1. Our algorithms are based on simple primal and dual grid decompositions and are easy to implement. We note that (a) the use of duality, which leads to significant improvements in the overhead cost of the algorithms, appears to be novel for this kind of problems; (b) the correct choice of duality in some of these problems is fairly intricate and requires some care; and (c) the correctness and performance analysis of the algorithms (especially in the more advanced versions) is fairly non-trivial. We analyze our algorithms and prove guaranteed upper bounds on their running time and on the “distortion” parameter α. View details
    Preview abstract We consider the classical camera pose estimation problem that arises in many computer vision applications, in which we are given n 2D-3D correspondences between points in the scene and points in the camera image (some of which are incorrect associations), and where we aim to determine the camera pose (the position and orientation of the camera in the scene) from this data. We demonstrate that this posing problem can be reduced to the problem of computing ε-approximate incidences between two-dimensional surfaces (derived from the input correspondences) and points (on a grid) in a four-dimensional pose space. Similar reductions can be applied to other camera pose problems, as well as to similar problems in related application areas. We describe and analyze three techniques for solving the resulting ε-approximate incidences problem in the context of our camera posing application. The first is a straightforward assignment of surfaces to the cells of a grid (of side-length ε) that they intersect. The second is a variant of a primal-dual technique, recently introduced by a subset of the authors [2] for different (and simpler) applications. The third is a non-trivial generalization of a data structure Fonseca and Mount [3], originally designed for the case of hyperplanes. We present and analyze this technique in full generality, and then apply it to the camera posing problem at hand. We compare our methods experimentally on real and synthetic data. Our experiments show that for the typical values of n and ε, the primal-dual method is the fastest, also in practice. View details
    Large-scale, real-time visual-inertial localization revisited
    Bernhard Zeisl
    Michael Bosse
    Joel Hesch
    Marc Pollefeys
    Roland Siegwart
    Torsten Sattler
    International Journal of Robotics Research, vol. 39(9) (2019)
    Preview abstract The overreaching goals in image-based localization are larger, better and faster. For the recent years approaches based on local features and sparse 3d point-cloud models have both dominated the benchmarks and seen successful realworld deployment. Recently end-to-end learned localization approaches have been proposed which show promising results on small and medium scale datasets. However the positioning accuracy, latency and compute requirements of these approaches remain an area of work. End-to-end learned approaches also typically require encoding the geometry of the environment in the model, which causes performance problems in large scale scenes and results in a hard to accomodate memory footprint. To deploy localization at world-scale we thus continue to rely on local features and sparse 3d models. We don’t only look at localization though. The goal is to build a scalable and robust end-to-end system including model building, compression, localization and client-side pose fusion for deployment at scale. Our method compresses appearance and geometry of the scene, allows for low-latency localization queries and efficient fusion, leading to scalability beyond what what has been previously demonstrated. In order to further improve efficiency we leverage a combination of priors, nearest neighbor search, geometric match culling and a cascaded pose candidate refinement step. This combination outperforms other approaches when working with large scale models. We demonstrate the effectiveness of our approach on a proof-of-concept system localizing 2.5 million images against models from four cities in different regions on the world achieving query latencies in the 200ms range. View details
    Preview abstract We introduce a method to classify imagery using a convo- lutional neural network (CNN) on multi-view image pro- jections. The power of our method comes from using pro- jections of multiple images at multiple depth planes near the reconstructed surface. This enables classification of categories whose salient aspect is appearance change un- der different viewpoints, such as water, trees, and other materials with complex reflection/light response proper- ties. Our method does not require boundary labelling in images and works on pixel-level classification with a small (few pixels) context, which simplifies the cre- ation of a training set. We demonstrate this application on large-scale aerial imagery collections, and extend the per-pixel classification to robustly create a consistent 2D classification which can be used to fill the gaps in non- reconstructible water regions. We also apply our method to classify tree regions. In both cases, the training data can quickly be generated using a small number of manually- created polygons on a map. We show that even with a very simple and standard network our CNN outperforms the state-of-the-art image classification, the Inception-V3 model retrained from a large collection of aerial images. View details
    Homotheties and incidences
    Micha Sharir
    Discrete Mathematics, vol. Volume 341, Issue 7 (2018), 2011–2017
    Preview abstract We consider problems involving rich homotheties in a set S of n points in the plane (that is, homotheties that map many points of S to other points of S). By reducing these problems to incidence problems involving points and lines in R^3, we are able to obtain refined and new bounds for the number of rich homotheties, and for the number of distinct equivalence classes, under homotheties, of k-element subsets of S, for any k ≥ 3. We also discuss the extensions of these problems to three and higher dimensions. View details
    Preview abstract Methods and systems for texturing of three-dimensional (3D) object data models are provided. An example method may include receiving information indicating a geometry of an object receiving a plurality of images of the object. The method may also include assigning images of the plurality of images that have a resolution above a threshold to a plurality of polygons that approximate the geometric surface of the object. The method may also include determining adjacent polygons that are assigned to different images of the plurality of images so as to identify boundaries of images and minimizing such boundaries. The method may also include determining a mismatch factor for boundaries of the modified boundaries of images and reassigning images in boundaries having a mismatch factor above a threshold so as to reduce a gradient variation between the images in the modified boundaries. View details
    Preview abstract Methods and apparatus for aligning objects are provided. A computing device can receive first and second object representations that are respectively associated with first and second surface representations. The computing device can apply an object transformation to the respective first and second object representations to modify geometric features of the respective first and second object representations based on one or more values of one or more characteristics of the respective first and second surface representation. The computing device can align the first object representation and the second object representation using an alignment of the transformed first object representation and the transformed second object representation. The computing device can provide an output based on the aligned first object representation and second object representation. View details
    Preview abstract An ε-approximate incidence between a point and some geometric object (line, circle, plane, sphere) occurs when the point and the object lie at distance at most ε from each other. Given a set of points and a set of objects, computing the approximate incidences between them is a major step in many database and web-based applications in computer vision and graphics, including robust model fitting, approximate point pattern matching, and estimating the fundamental matrix in epipolar (stereo) geometry. In a typical approximate incidence problem of this sort, we are given a set P of m points in two or three dimensions, a set S of n objects (lines, circles, planes, spheres), and an error parameter ε > 0, and our goal is to report all pairs (p, s) ∈ P × S that lie at distance at most ε from one another. We present efficient output-sensitive approximation algorithms for quite a few cases, including points and lines or circles in the plane, and points and planes, spheres, lines, or circles in three dimensions. Several of these cases arise in the applications mentioned above. Our algorithms report all pairs at distance ≤ ε, but may also report additional pairs, all of which are guaranteed to be at distance at most αε, for some constant α > 1. Our algorithms are based on simple primal and dual grid decompositions and are easy to implement. We note though that (a) the use of duality, which leads to significant improvements in the overhead cost of the algorithms, appears to be novel for this kind of problems; (b) the correct choice of duality in some of these problems is fairly intricate and requires some care; and (c) the correctness and performance analysis of the algorithms (especially in the more advanced versions) is fairly non-trivial. We analyze our algorithms and prove guaranteed upper bounds on their running time and on the “distortion” parameter α. We also briefly describe the motivating applications, and show how they can effectively exploit our solutions. The superior theoretical bounds on the performance of our algorithms, and their simplicity, make them indeed ideal tools for these applications. In a series of preliminary experimentations (not included in this abstract), we substantiate this feeling, and show that our algorithms lead in practice to significant improved performance of the aforementioned applications. View details
    Homotheties and incidences
    Micha Sharir
    Arxiv, vol. https://arxiv.org/abs/1709.02933 (2017)
    Preview abstract We consider problems involving rich homotheties in a set S of n points in the plane (that is, homotheties that map many points of S to other points of S). By reducing these problems to incidence problems involving points and lines in R^3, we are able to obtain refined and new bounds for the number of rich homotheties, and for the number of distinct equivalence classes, under homotheties, of k-element subsets of S, for any k ≥ 3. We also discuss the extensions of these problems to three and higher dimensions. View details
    Preview abstract Methods and systems for rendering a three-dimensional (3D) data model of an object are provided. An example method may include receiving information associated with a first 3D data model and a second 3D data model of an object. The method may also include rendering a first set of images of the first 3D data model, and rendering a second set of images of the second 3D data model. The method may also include comparing respective images of the first set of images to images of the second set of images to determine a plurality of image difference scores between the respective images of the first set of images and the images of the second set of images. The method may also include determining an object difference score based on the determined plurality of image difference scores. View details
    SUPER 4PCS Fast Global Pointcloud Registration via Smart Indexing
    Nicolas Mellado
    Niloy Mitra
    Eurographics Symposium on Geometry Processing 2014
    Preview
    Reporting Neighbors in High-Dimensional Euclidean Space
    Micha Sharir
    SIAM journal of computing, vol. 43 (2014), pp. 1239-1511
    Preview abstract We consider the following problem, which arises in many database and web-based applications: Given a set P of n points in a high-dimensional space Rd and a distance r, we want to report all pairs of points of P at Euclidean distance at most r. We present two randomized algorithms, one based on randomly shifted grids, and the other on randomly shifted and rotated grids. The running time of both algorithms is of the form C(d)(n + k)log n, where k is the output size and C(d) is a constant that depends on the dimension d. The log n factor is needed to guarantee, with high probability, that all neighbor pairs are reported, and can be dropped if it suffices to report, in expectation, an arbitrarily large fraction of the pairs. When only translations are used, C(d) is of the form (a√d)d, for some (small) absolute constant a≈0.484; this bound is worst-case tight, up to an exponential factor of about 2d. When both rotations and translations are used, C(d) can be improved to roughly 6.74d, getting rid of the super-exponential factor √dd. When the input set (lies in a subset of d-space that) has low doubling dimension ,the performance of the first algorithm improves to C(d,δ)(n + k)log n (or to C(d,δ)(n + k)), where C(d,δ)=O((ed/δ),δ), for δ≤ √d. Otherwise, (d,δ)=O(e√d√dδ. We also present experimental results on several large datasets, demonstrating that our algorithms run significantly faster than all the leading existing algorithms for reporting neighbors. View details
    Preview abstract We propose two solutions for both nearest neigh- bors and range search problems. For the nearest neighbors problem, we propose a c-approximate so- lution for the restricted version of the decision prob- lem with bounded radius which is then reduced to the nearest neighbors by a known reduction. For range searching we propose a scheme that learns the parameters in a learning stage adopting them to the case of a set of points with low intrinsic dimension that are embedded in high dimensional space (common scenario for image point descrip- tors). We compare our algorithms to the best known methods for these problems, i.e. LSH, ANN and FLANN. We show analytically and experimentally that we can do better for moderate approximation factor. In contrast to tree structures, our algorithms are trivial to parallelize. In the experiments con- ducted, running on couple of million images, our algorithms show meaningful speed-ups when com- pared with the above mentioned methods. View details
    Efficient model based single and double thresholding for real time recognition
    Silvio Guimarães
    ACCV Workshop on Detection and Tracking in Challenging Environments (2012)
    Preview
    Efficient robust digital hyperplane fitting with bounded error
    Yukiko Kenmochi
    Hugues Talbot
    Lilian Buzer
    Discrete Geometry for Computer Imagery (DGCI) (2011), pp. 223-234
    The Phase Only Transform for unsupervised surface defect detection
    Hugues Talbot
    EMERGING TOPICS IN COMPUTER VISION AND ITS APPLICATIONS, WORLD SCIENTIFIC (2011)
    The Phase Only Transform for unsupervised surface defect detection
    Hugues Talbot
    CVPR (2010), pp. 295-302
    Approximate input sensitive algorithms for point pattern matching
    Klara Kedem
    Pattern Recognition, vol. 43(1) (2010), pp. 153-159
    Geometric pattern matching for point sets in the plane under similarity transformations
    Klara Kedem
    Information Processing Letters, vol. 109(16) (2009), pp. 935-940
    A GPU-Based Algorithm for Approximately Finding the Largest Common Point Set in the Plane under Similarity Transformation
    Klara Kedem
    International Journal of Image and Graphics, vol. 9(2) (2009), pp. 287-298
    Applying graphics hardware to achieve extremely fast geometric pattern matching in two and three dimensional transformation space
    Klara Kedem
    Information Processing Letters, vol. 105(6) (2008), pp. 224-230
    Geodesic Active Contours with Combined Shape and Appearance Priors
    Rami Ben-Ari
    Advanced Concepts for Intelligent Vision Systems (2008), pp. 494-505
    4-points congruent sets for robust pairwise surface registration
    Niloy Mitra
    Daniel Cohen-Or
    ACM Transactions on Graphics, vol. 27(3) (2008)
    Exact and Approximate Geometric Pattern Matching for Point Sets in the Plane under Similarity Transformations
    Klara Kedem
    Canadian Conference on Computational Geometry (2007), pp. 181-187
    Method for inspecting patterns
    Patent (2002)
    Doppler ultrasound simulator
    Nimrod Goor
    Mark Bergman
    Patent (2001)
    Method for locating a model in an image
    Patent (2001)
    Mosaicing Ultrasonic Volumes for Visual Simulation
    Daniel Cohen-Or
    IEEE Computer Graphics and Applications, vol. 20(2) (2000), pp. 53-61
    INTERVENTIONAL RADIOLOGY GUIDANCE SYSTEM
    Sima NADLER
    Daniel COHEN-OR
    Patent
    Real-Time Ultrasound Imaging Simulation
    Daniel Cohen-Or
    Real Time Imaqing, vol. 4(4) (1998), pp. 263-274
    Medical reproduction system
    Mark Bergman
    Dan Levit
    Ron Tepper
    Patent (1997)