3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection

ICCV 2025

Yung-Hsu Yang1      Luigi Piccinelli1      Mattia Segu1      Siyuan Li1      Rui Huang1,2     
Yuqian Fu3      Marc Pollefeys1,4      Hermann Blum1,5      Zuria Bauer1

1ETH Zurich     2Tsinghua University     3INSAIT     4Microsoft     5University of Bonn

Abstract

Monocular 3D object detection is valuable for various applications such as robotics and AR/VR. Existing methods are confined to closed-set settings, where the training and testing sets consist of the same scenes and/or object categories. However, real-world applications often introduce new environments and novel object categories, posing a challenge to these methods. In this paper, we address monocular 3D object detection in an open-set setting and introduce the first end-to-end 3D Monocular Open-Set Object Detector (3D-MOOD). We propose to lift open-set 2D detection into 3D space through our designed 3D bounding box head, enabling end-to-end joint training for both 2D and 3D tasks to yield better overall performance. We condition the object queries on a geometry prior, improving the generalization of 3D estimation across diverse scenes. To further improve performance, we design a canonical image space for more efficient cross-dataset training. We evaluate 3D-MOOD in both closed-set settings (Omni3D) and open-set settings (Omni3D → Argoverse 2, ScanNet), and achieve new state-of-the-art results.
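The 3D bounding box head itself is not specified on this page, but the geometric core of "lifting" a 2D detection into 3D, back-projecting a predicted 2D box center with an estimated depth through the camera intrinsics, can be sketched as follows. The function name and interface are illustrative, not the paper's API:

```python
import numpy as np

def lift_center_to_3d(center_2d, depth, K):
    """Back-project a 2D box center (u, v) with a predicted depth z
    to a 3D point in the camera frame via the pinhole model:
        x = (u - cx) * z / fx,   y = (v - cy) * z / fy.
    K is the 3x3 camera intrinsic matrix."""
    u, v = center_2d
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```

A full 3D box additionally needs predicted dimensions and orientation; this sketch covers only the center lifting step that ties the 2D detector's output to the intrinsics.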

Method overview


Summary

  • We explore monocular 3D object detection in open-set settings, establishing benchmarks that account for both novel scenes and unseen object categories.
  • We introduce 3D-MOOD, the first end-to-end open-set monocular 3D object detector, via 2D to 3D lifting, geometry-aware 3D query generation, and canonical image space.
  • We achieve state-of-the-art performance in both closed-set and open-set settings, demonstrating the effectiveness of our method and the feasibility of open-set monocular 3D object detection.
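One plausible reading of the canonical image space idea is to rescale every training image so its focal length matches a shared canonical value, making depth targets comparable across datasets with different cameras. The function name and the canonical focal value below are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def to_canonical_space(image_hw, K, f_canonical=500.0):
    """Rescale an image and its intrinsics so the focal length equals
    f_canonical, mapping heterogeneous cameras into one shared space.

    image_hw: (H, W) of the input image.
    K: 3x3 camera intrinsic matrix (assumes fx ~= fy).
    Returns the target (H', W') and the adjusted intrinsics."""
    f = K[0, 0]
    s = f_canonical / f              # uniform rescale factor
    h, w = image_hw
    new_hw = (int(round(h * s)), int(round(w * s)))
    K_new = K.copy()
    K_new[:2, :] *= s                # scales fx, fy, cx, cy together
    return new_hw, K_new
```

Under this scheme, an object at a given metric depth projects to the same pixel size regardless of the source dataset's camera, which is what makes cross-dataset training more efficient.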

Qualitative Results in 3D



Qualitative Results in 2D



BibTeX

@article{yang20253d,
  title={3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection},
  author={Yang, Yung-Hsu and Piccinelli, Luigi and Segu, Mattia and Li, Siyuan and Huang, Rui and Fu, Yuqian and Pollefeys, Marc and Blum, Hermann and Bauer, Zuria},
  journal={arXiv preprint arXiv:2507.23567},
  year={2025}
}
