MIT Marine Robotics Group

LOSS-SLAM: Lightweight Open-Set Semantic
Simultaneous Localization and Mapping

MIT CSAIL
{singhk,magoun,jleonard}@mit.edu

News

[October 2024] Paper received Best Paper Award: Runner Up at IROS 2024 Workshop: Standing the Test of Time: Retrospective and Future of World Representations for Lifelong Robotics
[October 2024] Paper accepted to IROS 2024 Workshop: Standing the Test of Time: Retrospective and Future of World Representations for Lifelong Robotics
[April 2024] Paper uploaded to arXiv!
[February 2024] Project page released!
[February 2024] Data released!

Abstract

Enabling robots to understand the world in terms of objects is a critical building block toward higher-level autonomy. The success of foundation models in vision has created the ability to segment and identify nearly all objects in the world. However, utilizing such objects to localize the robot and build an open-set semantic map of the world remains an open research question. In this work, a system for identifying, localizing, and encoding objects is tightly coupled with probabilistic graphical models to perform open-set semantic simultaneous localization and mapping (SLAM). Results demonstrate that the proposed lightweight object encoding enables more accurate object-based SLAM than existing open-set, closed-set, and geometric methods, while incurring lower computational overhead than existing open-set mapping methods.




Method


An overview of the proposed open-set data association system coupled with a factor graph framework when a new image and odometry pair is received. The image is fed into the DINO network to get patch-level encodings, which are then clustered into objects. Those clusters are determined to be either foreground or background based on the attention heads. A connected component analysis yields instance-level segmentations, from which a single encoding vector per object is used as the object representation. The encoding is compared against the existing landmarks' encodings to determine class matches. The pose of the object is also compared against the existing landmarks' poses as the final data association filter (not pictured). After building a factor with all matches that pass the filter (depending on the backend method, either an expectation-maximization, max-mixtures, or max-likelihood factor), we add a new pose (light blue) to the factor graph with a factor connecting it (light pink) to the previous landmark.
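The two-stage data association filter described above can be sketched in a few lines. The sketch below is an illustrative assumption, not the authors' implementation: the function names, the mean-pooled object encoding, the cosine-similarity class test, and the Euclidean pose gate (where the paper's backend would use probabilistic factors) are all hypothetical choices made for clarity.

```python
# Hypothetical sketch of the encoding match + pose gate described above.
# All names and thresholds are illustrative assumptions.
import numpy as np

def object_encoding(patch_encodings: np.ndarray) -> np.ndarray:
    """Collapse an object's patch-level encodings (N x D) into a single
    representative vector by averaging and L2-normalizing."""
    v = patch_encodings.mean(axis=0)
    return v / np.linalg.norm(v)

def associate(obj_enc, obj_pos, landmarks, enc_thresh=0.8, dist_thresh=1.0):
    """Return indices of landmarks passing both the encoding (class) match
    and the pose gate; an empty list suggests creating a new landmark."""
    matches = []
    for i, lm in enumerate(landmarks):
        # Stage 1: class match via cosine similarity of unit encodings.
        if float(obj_enc @ lm["encoding"]) < enc_thresh:
            continue
        # Stage 2: pose gate via Euclidean distance (a full system would
        # use a probabilistic gate over the landmark's covariance).
        if np.linalg.norm(obj_pos - lm["position"]) > dist_thresh:
            continue
        matches.append(i)
    return matches
```

Surviving matches would then be wrapped into a factor (expectation-maximization, max-mixtures, or max-likelihood, depending on the backend) and added to the factor graph along with the new pose.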

Paper


LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping

Kurran Singh, Tim Magoun, John Leonard

arXiv version
BibTeX
Code

Citation


@inproceedings{singh2024opensetslam,
    title={LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping},
    author={Kurran Singh and Tim Magoun and John Leonard},
    booktitle={arXiv Preprint},
    year={2024}
}
This webpage template was recycled from here.