QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

Published in arXiv preprint, 2024

Recommended citation: Wang, Ye and Mei, Yuting and Zheng, Sipeng and Jin, Qin. "QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds." arXiv preprint, 2024. arXiv:2406.16578. https://arxiv.org/abs/2406.16578

While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. Achieving this goal poses three primary challenges: i) effectively leveraging multimodal observations for decision-making; ii) mastering agile control of locomotion and path planning; and iii) developing advanced cognition to execute long-term objectives. QuadrupedGPT processes human commands and environmental context using a large multimodal model (LMM). Empowered by its extensive knowledge base, the agent autonomously assigns appropriate parameters to its adaptive locomotion policies and uses semantic-aware terrain analysis to plan a safe yet efficient path towards the goal. Moreover, QuadrupedGPT is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals through high-level reasoning. Extensive experiments across various benchmarks confirm that QuadrupedGPT can adeptly handle multiple tasks with intricate instructions, demonstrating a significant step towards versatile quadruped agents in open-ended worlds.

Figure 1: Based upon emerging large foundation models, QuadrupedGPT aims to develop a versatile quadruped agent with the agility of four-legged pets while being able to comprehend intricate human commands and complete them safely and efficiently in open-world environments.
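
To make the idea of planning a "safe but efficient path" over semantically labeled terrain more concrete, here is a minimal sketch. It is not the paper's planner: it assumes the terrain has already been reduced to a small grid of semantic labels, and it swaps in a plain Dijkstra search over hypothetical per-label traversal costs. The names `TERRAIN_COST` and `plan_path`, the label set, and the cost values are all illustrative assumptions.

```python
import heapq

# Hypothetical traversal cost per semantic terrain label (illustrative values).
# An infinite cost marks terrain the agent should never enter.
TERRAIN_COST = {
    "pavement": 1.0,
    "grass": 2.0,
    "gravel": 3.0,
    "water": float("inf"),
}


def plan_path(grid, start, goal):
    """Dijkstra search over a 4-connected grid of terrain labels.

    Returns (path, cost), where path is a list of (row, col) cells from
    start to goal, or (None, inf) if the goal cannot be reached.
    The cost of a path is the sum of the costs of the cells it enters.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: 0.0}          # cheapest known cost to reach each cell
    parent = {}                  # back-pointers for path reconstruction
    frontier = [(0.0, start)]    # min-heap ordered by accumulated cost

    while frontier:
        cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1], cost
        if cost > best.get(cell, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step = TERRAIN_COST[grid[nr][nc]]
            if step == float("inf"):
                continue  # unsafe terrain is never expanded
            new_cost = cost + step
            if new_cost < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = new_cost
                parent[(nr, nc)] = cell
                heapq.heappush(frontier, (new_cost, (nr, nc)))

    return None, float("inf")


if __name__ == "__main__":
    grid = [
        ["pavement", "water", "pavement"],
        ["pavement", "water", "pavement"],
        ["pavement", "grass", "pavement"],
    ]
    path, cost = plan_path(grid, (0, 0), (0, 2))
    print(path, cost)
```

In this toy grid the only safe route detours around the water cells through the grass cell. In the setting the abstract describes, the labels and costs would presumably come from the agent's own terrain analysis rather than a fixed table.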