Geometry-Aware Feature Matching for Large-Scale Structure from Motion

3DV 2025 (Oral)

Gonglin Chen¹,², Jinsen Wu¹,², Haiwei Chen¹,², Wenbin Teng¹,², Zhiyuan Gao¹,², Andrew Feng¹, Rongjun Qin³, Yajie Zhao¹,²

¹Institute for Creative Technologies     ²University of Southern California     ³The Ohio State University


Figure 1. Our proposed method bridges detector-based feature matching with detector-free feature matching. We utilize sparse correspondences as geometric priors to iteratively optimize the matching process. By leveraging the denser matches from detector-free matchers, our method achieves highly accurate camera pose recovery and generates denser point clouds. This approach is particularly effective in challenging large baseline scenarios, such as air-to-ground imagery, and provides substantial benefits to downstream applications like novel view synthesis.

Abstract

Establishing consistent and dense correspondences across multiple images is crucial for Structure from Motion (SfM) systems. Significant viewpoint changes, such as air-to-ground pairs with very sparse view overlap, pose an even greater challenge to correspondence solvers. We present a novel optimization-based approach that significantly enhances existing feature matching methods by introducing geometric cues in addition to color cues, filling gaps where image overlap is sparse in large-scale scenarios. Our method formulates geometric verification as an optimization problem, guiding feature matching within detector-free methods and using sparse correspondences from detector-based methods as anchor points. By enforcing geometric constraints via the Sampson distance, our approach ensures that the denser correspondences from detector-free methods are geometrically consistent and more accurate. This hybrid strategy significantly improves correspondence density and accuracy, mitigates multi-view inconsistencies, and leads to notable gains in camera pose accuracy and point cloud density. It outperforms state-of-the-art feature matching methods on benchmark datasets and enables feature matching in challenging, extremely large-scale settings.
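To make the geometric constraint concrete, below is a minimal NumPy sketch of the first-order Sampson distance of correspondences (x1, x2) under a fundamental matrix F, the residual that the geometry-aware optimization penalizes. Variable names and the plain per-point form are illustrative assumptions, not our released implementation.

import numpy as np

def sampson_distance(F, x1, x2):
    # F:  (3, 3) fundamental matrix relating image 1 to image 2.
    # x1: (N, 2) pixel coordinates in image 1.
    # x2: (N, 2) pixel coordinates in image 2.
    x1h = np.hstack([x1, np.ones((len(x1), 1))])   # homogeneous (N, 3)
    x2h = np.hstack([x2, np.ones((len(x2), 1))])   # homogeneous (N, 3)
    Fx1 = x1h @ F.T    # rows are F @ x1
    Ftx2 = x2h @ F     # rows are F^T @ x2
    # Squared epipolar residual (x2^T F x1)^2, normalized by the gradient
    # magnitude of the constraint (first-order approximation of the
    # geometric reprojection error).
    num = np.sum(x2h * Fx1, axis=1) ** 2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

Correspondences with a small Sampson distance are consistent with the epipolar geometry encoded by F, which is what lets sparse anchor matches guide the denser detector-free matches.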

Results

IMC 2021 Phototourism and ScanNet Datasets


Figure 2. Qualitative results. Our method is compared qualitatively with ALIKED + LG on multiple scenes. Green cameras have an absolute pose error below 3°; red cameras have an error above 3°. More results can be found in the supplementary material.


Table 1. Estimated pose errors on the IMC 2021 Phototourism dataset (outdoor scenes). Results are averaged across all scenes.


Table 2. Estimated pose errors on ScanNet (indoor scenes). Results are averaged across all scenes.
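For reference, the pose errors reported in Tables 1 and 2 are angular errors of the estimated relative pose. A sketch following the common convention (assumed here; not our exact evaluation code):

import numpy as np

def rotation_error_deg(R_est, R_gt):
    # Angle of the residual rotation R_est^T @ R_gt, in degrees.
    c = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def translation_error_deg(t_est, t_gt):
    # Angle between translation directions; scale is unobservable in SfM.
    c = t_est @ t_gt / (np.linalg.norm(t_est) * np.linalg.norm(t_gt))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))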

Air-to-Ground Dataset


Figure 3. Our method is compared qualitatively with other feature matching methods on large-scale air-to-ground datasets. Red cameras denote recovered poses.


Table 3. SfM results with different feature matchers on the Air-to-Ground datasets. For methods that produce two separate models, we report them as “air model / ground model”.

Method


Figure 4. An overview of our pipeline for SfM reconstruction. (1) The pipeline runs image retrieval based on global embeddings generated by DINOv2. (2) A backbone module takes image pairs as input; each pair is processed by a detector-free backbone and a detector-based backbone. (3) A geometry-aware optimization module iteratively optimizes the fundamental matrix and the matches, using anchor points from the detector-based method. (4) The final coarse matches are refined with a correlation-based refinement block. (5) The refined matches are fed into COLMAP for SfM.
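The sketch below summarizes the five stages in Python-style pseudocode. Every helper function is a hypothetical placeholder standing in for the named component (DINOv2 retrieval, the two matching backbones, the geometry-aware optimization, correlation-based refinement, and COLMAP), not a real API.

def reconstruct(images, num_iters=3):
    # 1. Image retrieval from DINOv2 global embeddings.
    pairs = retrieve_pairs(images, embed=dinov2_embed)

    matches = {}
    for (i, j) in pairs:
        # 2. Run both backbones on the image pair.
        dense = detector_free_backbone(images[i], images[j])     # semi-dense matches
        anchors = detector_based_backbone(images[i], images[j])  # sparse anchor matches

        # 3. Iteratively optimize the fundamental matrix and the matches,
        #    anchored by the sparse correspondences.
        F = estimate_fundamental(anchors)
        for _ in range(num_iters):
            dense = rematch_with_geometry_prior(dense, F)  # penalize Sampson distance
            F = estimate_fundamental(anchors + inliers(dense, F))

        # 4. Correlation-based sub-pixel refinement of the coarse matches.
        matches[(i, j)] = correlation_refine(dense)

    # 5. Standard SfM on the refined matches.
    return run_colmap(images, matches)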

Citation

@misc{chen2024geometryawarefeaturematchinglargescale,
      title={Geometry-aware Feature Matching for Large-Scale Structure from Motion}, 
      author={Gonglin Chen and Jinsen Wu and Haiwei Chen and Wenbin Teng and Zhiyuan Gao and Andrew Feng and Rongjun Qin and Yajie Zhao},
      year={2024},
      eprint={2409.02310},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.02310}, 
}

Acknowledgments

Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number 140D0423C0075. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.