
Open-Fusion

May 3, 2024·1 min read


Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Open-Fusion builds queryable, open-vocabulary 3D maps in real time. In my view, it is a big step toward solving one of the most important problems in robotics today.
- The robot takes in an RGB-D image stream as it explores.
- It extracts open-vocabulary features with SEEM, which provides region-aligned vision-language features.
- It integrates the observations into a 3D representation using TSDF (Truncated Signed Distance Function).
- The code is open source, and it offers better performance with accuracy on par with the state of the art (ConceptFusion).
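
To make the TSDF step in the list above more concrete, here is a toy one-dimensional update along a single camera ray. It is a minimal sketch and not the Open-Fusion implementation: the function name `integrate_depth`, the voxel spacing, and the truncation distance are hypothetical choices for illustration, and the update is the standard weighted running average used in classic TSDF fusion.

```python
def integrate_depth(tsdf, weights, voxel_depths, measured_depth, trunc=0.15):
    """Fuse one depth measurement into TSDF voxels along a single ray.

    tsdf, weights: per-voxel lists, updated in place.
    voxel_depths:  distance of each voxel centre from the camera (metres).
    trunc:         truncation distance (hypothetical value, metres).
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z           # positive in front of the surface
        if sdf < -trunc:                   # far behind the surface: occluded, skip
            continue
        d = min(1.0, sdf / trunc)          # truncate and normalise to [-1, 1]
        w_new = weights[i] + 1.0
        tsdf[i] = (tsdf[i] * weights[i] + d) / w_new   # weighted running average
        weights[i] = w_new
    return tsdf, weights


# Three noisy depth readings of a surface roughly 1.0 m away.
voxel_depths = [0.8, 0.9, 1.0, 1.1, 1.2]
tsdf = [0.0] * len(voxel_depths)
weights = [0.0] * len(voxel_depths)
for depth in (1.0, 1.02, 0.98):
    integrate_depth(tsdf, weights, voxel_depths, depth)
```

After the three updates the stored values change sign around the voxel at 1.0 m, which is where a mesh extractor would place the surface. Voxels far behind the surface are never touched, so their weight stays at zero.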

Paper

http://arxiv.org/pdf/2310.03923

Abstract

Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on predefined concepts during training or are time-intensive when generating semantic maps. This paper presents Open-Fusion, a groundbreaking approach for real-time open-vocabulary 3D mapping and queryable scene representation using RGB-D data. Open-Fusion harnesses the power of a pre-trained Vision-Language Foundation Model (VLFM) for open-set semantic understanding and uses the Truncated Signed Distance Function (TSDF) for rapid 3D scene reconstruction. Leveraging the VLFM, we extract region-based embeddings and their associated confidence maps. These are then integrated with the 3D knowledge of the TSDF using an enhanced Hungarian-based feature matching mechanism. Notably, Open-Fusion delivers outstanding annotation-free 3D segmentation for open-vocabulary without requiring additional 3D training. Benchmark tests on the ScanNet dataset against leading zero-shot methods underline the superiority of Open-Fusion. Moreover, it seamlessly combines the strengths of region-based VLFM and TSDF, enabling real-time 3D scene understanding that includes object concepts and open-world semantics.
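
The abstract mentions a Hungarian-based matching step between region embeddings and the 3D map, but does not spell out the enhanced variant. The following is only a sketch of the underlying idea: a one-to-one assignment that maximises total cosine similarity between newly observed region embeddings and embeddings already stored in the map. For brevity it brute-forces all permutations, which reaches the same optimum the Hungarian algorithm would on small inputs. The names `cosine` and `match_regions`, the `min_sim` threshold, and the example vectors are all hypothetical.

```python
from itertools import permutations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_regions(new_feats, map_feats, min_sim=0.5):
    """Return {new_index: map_index} maximising total cosine similarity.

    Assumes len(new_feats) <= len(map_feats). Brute force stands in for
    the Hungarian algorithm here, so keep the inputs small.
    """
    sim = [[cosine(n, m) for m in map_feats] for n in new_feats]
    best_total, best_perm = float("-inf"), ()
    for perm in permutations(range(len(map_feats)), len(new_feats)):
        total = sum(sim[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    # Drop weak pairs; in a real system those regions would start new map entries.
    return {i: j for i, j in enumerate(best_perm) if sim[i][j] >= min_sim}


map_feats = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
new_feats = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
matches = match_regions(new_feats, map_feats)
```

Here the first new region pairs with map entry 1 and the second with map entry 0, and the third map entry stays unmatched. For realistic numbers of regions, a polynomial-time solver would replace the permutation loop.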

Code
https://uark-aicv.github.io/OpenFusion/


This English version was translated by Claude.

Written by

친절한 찰쓰씨

Pleasant Charles — UI/UX researcher at AIT. Keeping notes on design, planning, and slow days here since 2010.
