Enabling robots to understand the world in terms of objects is a critical building block towards higher-level autonomy. The success of foundation models in vision has made it possible to segment and identify nearly all objects in the world. However, using such objects to localize the robot and build an open-set semantic map of the world remains an open research question. In this work, a system for identifying, localizing, and encoding objects is tightly coupled with probabilistic graphical models to perform open-set semantic simultaneous localization and mapping (SLAM). Results are presented demonstrating that the proposed lightweight object encoding can be used to perform more accurate object-based SLAM than existing open-set methods, closed-set methods, and geometric methods, while incurring smaller computational overhead than existing open-set mapping methods.
An overview of the proposed open-set data association system, coupled with a factor graph framework, when a new image and odometry pair is received. The image is fed into the DINO network to obtain patch-level encodings, which are then clustered into objects. Each cluster is classified as either foreground or background based on the attention heads. A connected component analysis yields instance-level segmentations, and for each object a single encoding vector is used as the object representation. This encoding is compared against the existing landmarks' encodings to determine class matches. As the final data association filter (not pictured), the pose of the object is also compared against the existing objects' poses. After building a factor from all matches that pass the filter (an expectation-maximization, max-mixtures, or max-likelihood factor, depending on the backend method), we add a new pose (light blue) to the factor graph with a factor connecting it (light pink) to the previous landmark.
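The segmentation step in the caption (clusters, foreground test, connected components, one vector per object) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the DINO feature extraction, the clustering, and the attention-based foreground test are taken as already done, so the sketch receives a grid of per-patch cluster labels and a boolean foreground mask. The 4-connectivity and the mean-then-normalise pooling used to form the single object vector are assumptions, since the caption does not say how that vector is computed; `instance_segments` and `object_encoding` are hypothetical names.

```python
import math
from collections import deque
from typing import List, Tuple

Patch = Tuple[int, int]


def instance_segments(labels: List[List[int]],
                      foreground: List[List[bool]]) -> List[List[Patch]]:
    """Split foreground patch clusters into instances (4-connected components).

    labels[r][c] is the cluster id of patch (r, c); foreground[r][c] says
    whether that patch was kept as foreground.
    """
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    instances: List[List[Patch]] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or not foreground[r][c]:
                continue
            # Breadth-first flood fill over neighbours sharing this cluster id.
            cluster = labels[r][c]
            component: List[Patch] = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and foreground[ny][nx] and labels[ny][nx] == cluster):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            instances.append(component)
    return instances


def object_encoding(features: List[List[List[float]]],
                    component: List[Patch]) -> List[float]:
    """Hypothetical pooling: mean of the instance's patch features, L2-normalised."""
    dim = len(features[0][0])
    mean = [0.0] * dim
    for y, x in component:
        for k in range(dim):
            mean[k] += features[y][x][k] / len(component)
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0  # guard against a zero vector
    return [v / norm for v in mean]


# Usage: cluster 0 appears twice, but the two regions are not connected.
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 0, 0]]
foreground = [[True, True, False, False],
              [True, True, False, False],
              [False, False, True, True]]
features = [[[1.0, 0.0]] * 4, [[1.0, 0.0]] * 4, [[0.0, 2.0]] * 4]
objects = instance_segments(labels, foreground)
encodings = [object_encoding(features, comp) for comp in objects]
```

The usage example shows why the connected component pass matters: one cluster id covers two separate image regions, and the flood fill turns it into two distinct object instances, each with its own unit-length encoding.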
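The two association filters in the caption (an encoding comparison for class matches, then a pose comparison) might look roughly like the sketch below. It is a hedged illustration only: cosine similarity, the `sim_threshold` of 0.8, the Euclidean `gate_radius` of 1.0, and the `Landmark`, `cosine`, and `associate` names are all assumptions, not details given in the source. The function returns every candidate that passes both filters; a max-likelihood backend would plausibly keep only the best one, while an expectation-maximization or max-mixtures factor would keep all of them as hypotheses.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Landmark:
    id: int
    encoding: List[float]                  # stored object encoding
    position: Tuple[float, float, float]   # current position estimate (world frame)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(encoding: Sequence[float], position: Sequence[float],
              landmarks: List[Landmark], sim_threshold: float = 0.8,
              gate_radius: float = 1.0) -> List[Tuple[int, float]]:
    """Candidate matches (landmark id, similarity) that pass both filters, best first."""
    matches = []
    for lm in landmarks:
        sim = cosine(encoding, lm.encoding)
        if sim < sim_threshold:
            continue  # encodings disagree: treat as a different class
        if math.dist(position, lm.position) > gate_radius:
            continue  # same class, but too far from this landmark's estimate
        matches.append((lm.id, sim))
    matches.sort(key=lambda m: m[1], reverse=True)
    return matches


landmarks = [
    Landmark(0, [1.0, 0.0], (0.0, 0.0, 0.0)),
    Landmark(1, [0.96, 0.28], (0.5, 0.0, 0.0)),
    Landmark(2, [1.0, 0.0], (5.0, 0.0, 0.0)),   # same class, far away
    Landmark(3, [0.0, 1.0], (0.0, 0.0, 0.0)),   # different class, nearby
]
candidates = associate([1.0, 0.0], (0.2, 0.0, 0.0), landmarks)
best = candidates[0] if candidates else None  # max-likelihood: keep only the top match
```

Here landmarks 0 and 1 survive both filters, landmark 2 is removed by the pose gate, and landmark 3 is removed by the encoding test. An empty result would presumably mean the observation is initialized as a new landmark, though the source does not spell out that rule.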
LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping
Kurran Singh, Tim Magoun, John Leonard
@inproceedings{singh2024opensetslam,
title={LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping},
author={Kurran Singh and Tim Magoun and John Leonard},
booktitle={arXiv preprint},
year={2024}
}