
Open-Fusion

normalstory

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation


Open-Fusion builds queryable, open-vocabulary 3D maps in real time. In my view, it is a big step toward solving one of the most important problems in robotics today.
- The robot takes in an RGB-D image stream as it explores.
- It extracts open-vocabulary, region-aligned vision-language features with SEEM.
- It integrates the observations into a 3D representation using TSDF (Truncated Signed Distance Function).
- The code is open source, and the method offers better performance with accuracy on par with the SOTA (ConceptFusion).
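
To make the TSDF step in the list above a little more concrete, here is a minimal sketch of the classic weighted running-average update for a single voxel. It is not taken from the Open-Fusion code; the function names `truncated_sdf` and `integrate`, the truncation distance, and the constant per-observation weight are all assumptions made for illustration.

```python
def truncated_sdf(surface_depth, voxel_depth, trunc):
    """Signed distance from a voxel to the observed surface along the camera ray.

    Positive values lie in front of the surface, negative values behind it.
    The result is divided by the truncation distance and clamped to [-1, 1].
    """
    sdf = surface_depth - voxel_depth
    return max(-1.0, min(1.0, sdf / trunc))


def integrate(tsdf, weight, observation, obs_weight=1.0):
    """Fuse one new observation into a voxel with a weighted running average."""
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + observation * obs_weight) / new_weight
    return new_tsdf, new_weight


# Hypothetical example: one voxel 1.9 m from the camera, observed in two frames.
voxel = (0.0, 0.0)  # (tsdf value, accumulated weight); weight 0 means "unseen"
for measured_depth in (2.0, 1.8):
    obs = truncated_sdf(measured_depth, voxel_depth=1.9, trunc=0.2)
    voxel = integrate(*voxel, obs)
```

Each depth frame only nudges the stored value, so noisy measurements average out over time, and the surface sits where the fused value crosses zero.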





Paper


http://arxiv.org/pdf/2310.03923

Abstract

Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on predefined concepts during training or are time-intensive when generating semantic maps. This paper presents Open-Fusion, a groundbreaking approach for real-time open-vocabulary 3D mapping and queryable scene representation using RGB-D data. Open-Fusion harnesses the power of a pre-trained Vision-Language Foundation Model (VLFM) for open-set semantic understanding and uses the Truncated Signed Distance Function (TSDF) for rapid 3D scene reconstruction. Leveraging the VLFM, we extract region-based embeddings and their associated confidence maps. These are then integrated with the 3D knowledge of the TSDF using an enhanced Hungarian-based feature matching mechanism. Notably, Open-Fusion delivers outstanding annotation-free 3D segmentation for open-vocabulary without requiring additional 3D training. Benchmark tests on the ScanNet dataset against leading zero-shot methods underline the superiority of Open-Fusion. Moreover, it seamlessly combines the strengths of region-based VLFM and TSDF, enabling real-time 3D scene understanding that includes object concepts and open-world semantics.
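
As a rough picture of what "queryable" can mean for a map that stores region embeddings and confidences, the sketch below ranks stored regions against a text embedding by cosine similarity. This is a common pattern for embedding-based retrieval, not necessarily the exact procedure in the paper. The names `cosine` and `query_regions`, the toy 2-D embeddings, and the choice to scale similarity by confidence are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_regions(text_embedding, regions, top_k=1):
    """Return the ids of the top_k regions that best match the text embedding.

    regions maps a region id to (embedding, confidence). Scaling the similarity
    by the confidence is an illustrative assumption, not the paper's method.
    """
    scored = [
        (cosine(text_embedding, emb) * conf, region_id)
        for region_id, (emb, conf) in regions.items()
    ]
    scored.sort(reverse=True)
    return [region_id for _, region_id in scored[:top_k]]


# Toy 2-D embeddings; a real VLFM would produce much higher-dimensional vectors.
regions = {
    "region_a": ([1.0, 0.0], 0.9),
    "region_b": ([0.0, 1.0], 0.8),
}
best = query_regions([0.9, 0.1], regions)
```

In a real system the text embedding would come from the same vision-language model that produced the region embeddings, so that both live in one shared space.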




Code
https://uark-aicv.github.io/OpenFusion/


This English version was translated by Claude.

Written by
친절한 찰쓰씨

Pleasant Charles — UI/UX researcher at AIT. Keeping notes on design, planning, and slow days here since 2010.
