CuriousBot: Interactive Mobile Exploration
via Actionable 3D Relational Object Graph

¹Columbia University, ²Boston Dynamics AI Institute


CuriousBot is a mobile robot that can interactively explore its environment, build a 3D relational object graph,
and exploit the built graph for robotic manipulation tasks in diverse scenes.


CuriousBot can understand various object relations and explore diverse scenes interactively using a handful of manipulation skills.

Abstract

Mobile exploration is a longstanding challenge in robotics, yet current methods primarily focus on active perception instead of active interaction, limiting the robot's ability to interact with and fully explore its environment. Existing robotic exploration approaches via active interaction are often restricted to tabletop scenes, neglecting the unique challenges posed by mobile exploration, such as large exploration spaces, complex action spaces, and diverse object relations. In this work, we introduce a 3D relational object graph that encodes diverse object relations and enables exploration through active interaction. We develop a system based on this representation and evaluate it across diverse scenes. Our qualitative and quantitative results demonstrate the system's effectiveness and generalization capabilities, outperforming methods that rely solely on vision-language models (VLMs).



Video

CuriousBot



Method Overview. (a) In the perception pipeline, SLAM processes RGBD observations and odometry estimates from the robot to output camera poses, which are used alongside the RGBD observations to construct an actionable 3D relational object graph. (b) The 3D relational object graph comprises object nodes, which contain both geometric and semantic information, and object edges, which encode complex object relations. (c) The serialized object graph is fed into the task planner, and the generated task plans are executed using low-level skills to interactively explore the environment.
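To make parts (b) and (c) of the overview more concrete, the following is a minimal Python sketch of how such a graph might be stored and serialized to text for a planner. It is an illustration under stated assumptions, not the system's actual implementation: the class names (`ObjectNode`, `RelationEdge`, `ObjectGraph`), the node fields, the relation string, and the text format are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ObjectNode:
    """Hypothetical object node: a semantic label plus coarse geometry."""
    node_id: int
    label: str    # semantic information, e.g. "cabinet"
    center: Vec3  # geometric information: assumed box center in meters
    size: Vec3    # geometric information: assumed box extents in meters


@dataclass
class RelationEdge:
    """Hypothetical directed edge: `source` stands in `relation` to `target`."""
    source: int
    target: int
    relation: str  # assumed free-form relation name, e.g. "inside"


@dataclass
class ObjectGraph:
    nodes: Dict[int, ObjectNode] = field(default_factory=dict)
    edges: List[RelationEdge] = field(default_factory=list)

    def add_node(self, node: ObjectNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, source: int, target: int, relation: str) -> None:
        # Reject edges that refer to objects not present in the graph.
        for node_id in (source, target):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node id: {node_id}")
        self.edges.append(RelationEdge(source, target, relation))

    def serialize(self) -> str:
        """Render the graph as plain text (assumed format) for a task planner."""
        lines = ["Objects:"]
        for n in sorted(self.nodes.values(), key=lambda n: n.node_id):
            cx, cy, cz = n.center
            sx, sy, sz = n.size
            lines.append(
                f"- [{n.node_id}] {n.label}: "
                f"center=({cx:.2f}, {cy:.2f}, {cz:.2f}), "
                f"size=({sx:.2f}, {sy:.2f}, {sz:.2f})"
            )
        lines.append("Relations:")
        for e in self.edges:
            src, tgt = self.nodes[e.source], self.nodes[e.target]
            lines.append(
                f"- {src.label}[{src.node_id}] {e.relation} "
                f"{tgt.label}[{tgt.node_id}]"
            )
        return "\n".join(lines)


if __name__ == "__main__":
    # Toy scene with made-up values, used only to show the serialized form.
    graph = ObjectGraph()
    graph.add_node(ObjectNode(0, "cabinet", (1.0, 0.5, 0.4), (0.6, 0.4, 0.8)))
    graph.add_node(ObjectNode(1, "cup", (1.0, 0.5, 0.3), (0.08, 0.08, 0.1)))
    graph.add_edge(1, 0, "inside")
    print(graph.serialize())
```

In this sketch, keeping relations as directed, labeled edges lets a planner read both what objects exist and which interactions might reveal more of the scene, for example opening a container before looking for what is inside it.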



Qualitative Videos

Select a task to see the third-person view video and the corresponding 3D relational object graph visualization.