Simon Lynen
Authored Publications
Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization
International Conference on Computer Vision (ICCV'23), IEEE / CVF (2023) (to appear)
Abstract
Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become a common belief that global embeddings are critical for this image-retrieval step in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution for k-nearest-neighbor search across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets. Code will be released.
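The retrieval idea the abstract describes — scoring database images directly from local-feature matches instead of a separate global embedding — can be illustrated with a minimal sketch. This is a hypothetical brute-force toy, not the paper's CANN algorithm or its released code; the function name and scoring scheme are illustrative assumptions.

```python
import numpy as np

def retrieve_by_local_features(query_descs, db_descs, k=3):
    """Toy sketch (NOT the paper's CANN method): rank database images
    using only local descriptors. Each database image is scored by
    summing, over the query's local descriptors, the similarity of its
    best-matching descriptor in that image; the top-k images are returned.

    query_descs: (m, d) array of query local descriptors
    db_descs:    list of (n_i, d) arrays, one per database image
    """
    scores = []
    for descs in db_descs:
        # similarity of every query descriptor to every database descriptor
        sim = query_descs @ descs.T
        # best match per query descriptor, summed into an image-level score
        scores.append(sim.max(axis=1).sum())
    # highest-scoring images first
    order = np.argsort(scores)[::-1]
    return order[:k]
```

A real system would replace the exhaustive matching with an approximate nearest-neighbor index and add the geometric constraints that give CANN its name; this sketch only shows why no global per-image embedding is strictly required for retrieval.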
Abstract
Outlier rejection, and equivalently inlier set optimization, is a key ingredient in numerous applications in computer vision, such as filtering point matches in camera pose estimation or plane and normal estimation in point clouds. Several approaches exist, yet at large scale we face a combinatorial explosion of possible solutions, and state-of-the-art methods like RANSAC, the Hough transform or Branch & Bound require a minimum inlier ratio or prior knowledge to remain practical. In fact, for problems such as camera posing in very large scenes these approaches become useless, as they have exponential runtime growth if these conditions aren't met. To approach the problem we present an efficient and general algorithm for outlier rejection based on "intersecting" k-dimensional surfaces in R^d. We provide a recipe for casting a variety of geometric problems as finding a point in R^d which maximizes the number of nearby surfaces (and thus inliers). The resulting algorithm has linear worst-case complexity, with a better runtime dependency on the approximation factor than competing algorithms, while not requiring domain-specific bounds. This is achieved by introducing a space decomposition scheme that bounds the number of computations by successively rounding and grouping samples. Our recipe (and open-source code) enables anybody to derive such fast approaches to new problems across a wide range of domains. We demonstrate the versatility of the approach on several camera posing problems with a high number of matches at low inlier ratio, achieving state-of-the-art results at significantly lower processing times.
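The "successively rounding and grouping samples" idea at the core of the abstract can be sketched with a toy voting step. This is an illustrative simplification under my own assumptions (single rounding level, axis-aligned grid, hypothetical function name), not the paper's space decomposition scheme or its open-source code.

```python
from collections import Counter

def max_consensus_cell(samples, cell_size):
    """Toy sketch of rounding-and-grouping consensus: snap each candidate
    solution point in R^d to a grid cell of the given size, count samples
    per cell, and return the densest cell's center-of-grid coordinates
    together with its vote count (an approximate inlier-maximizing point).
    """
    # round each coordinate to a grid index, then group identical cells
    counts = Counter(
        tuple(round(c / cell_size) for c in s) for s in samples
    )
    cell, votes = counts.most_common(1)[0]
    # map the winning grid indices back to coordinates
    return tuple(i * cell_size for i in cell), votes
```

The paper's method refines this one-shot vote hierarchically and bounds the work per level, which is what yields the linear worst-case complexity claimed above; the toy shows only the grouping primitive.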
Large-scale, real-time visual-inertial localization revisited
Bernhard Zeisl
Michael Bosse
Joel Hesch
Marc Pollefeys
Roland Siegwart
Torsten Sattler
International Journal of Robotics Research, 39(9) (2019)
Abstract
The overarching goals in image-based localization are larger, better and faster. In recent years, approaches based on local features and sparse 3D point-cloud models have both dominated the benchmarks and seen successful real-world deployment. Recently, end-to-end learned localization approaches have been proposed which show promising results on small and medium scale datasets. However, the positioning accuracy, latency and compute requirements of these approaches remain an area of active work. End-to-end learned approaches also typically require encoding the geometry of the environment in the model, which causes performance problems in large-scale scenes and results in a hard-to-accommodate memory footprint. To deploy localization at world scale we thus continue to rely on local features and sparse 3D models. We don't only look at localization though. The goal is to build a scalable and robust end-to-end system including model building, compression, localization and client-side pose fusion for deployment at scale. Our method compresses appearance and geometry of the scene, allows for low-latency localization queries and efficient fusion, leading to scalability beyond what has been previously demonstrated. In order to further improve efficiency we leverage a combination of priors, nearest-neighbor search, geometric match culling and a cascaded pose candidate refinement step. This combination outperforms other approaches when working with large-scale models. We demonstrate the effectiveness of our approach on a proof-of-concept system localizing 2.5 million images against models from four cities in different regions of the world, achieving query latencies in the 200 ms range.
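The cascaded culling strategy the abstract mentions (priors, then nearest-neighbor search, then geometric match culling, then pose candidate refinement) follows a common cheap-to-expensive filtering pattern. The sketch below is a generic illustration of that pattern under my own assumptions (hypothetical function name and stage interface), not the system described in the paper.

```python
def cascaded_filter(candidates, stages):
    """Toy sketch of a cascaded culling pipeline: apply each filtering
    stage in order, from cheapest to most expensive, so that later
    (costlier) stages only ever see candidates that survived earlier ones.

    candidates: iterable of candidate items (e.g. pose hypotheses)
    stages:     list of predicates, ordered cheap to expensive
    """
    candidates = list(candidates)
    for stage in stages:
        # keep only candidates that pass this stage
        candidates = [c for c in candidates if stage(c)]
        if not candidates:
            break  # nothing left to refine; stop early
    return candidates
```

For example, `cascaded_filter(range(10), [lambda x: x % 2 == 0, lambda x: x > 4])` keeps only 6 and 8; in a localization system the stages would instead be location priors, match culling and pose refinement checks.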