📝 Publications

* denotes equal contribution, † denotes project lead, ✉ denotes corresponding author

BeingBeyond Series

arXiv

Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
Hao Luo*, Ye Wang*, Wanpeng Zhang*, Sipeng Zheng*, Ziheng Xi, Chaoyi Xu, Haiweng Xu, Haoqi Yuan, Chi Zhang, Yiqing Wang, Yicheng Feng, Zongqing Lu

Blog | Code | Hugging Face Model

  • Robots do not just look different; they also act through different physical control languages: different kinematics, sensors, action conventions, and timing. Being-H0.5 is our attempt to make one Vision-Language-Action (VLA) model travel across those differences without turning into a brittle collection of per-robot hacks. The model is trained on over 35,000 hours of data, including 16,000 hours of human videos and 14,000 hours of robot manipulation across 30+ embodiments.
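
As a minimal sketch of the cross-embodiment idea, the snippet below maps each robot's native action convention into one shared, normalized action space and back. The adapter structure, bounds, and fixed-width padding here are illustrative assumptions for this page, not the Being-H0.5 implementation.

```python
# Illustrative sketch (not the Being-H0.5 method): each robot registers an
# adapter that maps its native action convention into a shared, normalized
# action space, so the policy never sees per-robot units or joint orders.

from dataclasses import dataclass
import numpy as np

@dataclass
class EmbodimentSpec:
    """Hypothetical per-robot metadata: action width, per-dimension
    physical bounds (native units), and control frequency in Hz."""
    name: str
    action_dim: int
    low: np.ndarray
    high: np.ndarray
    control_hz: float

MAX_ACTION_DIM = 32  # shared action width; smaller robots are zero-padded

def to_shared(action: np.ndarray, spec: EmbodimentSpec) -> np.ndarray:
    """Rescale each dimension to [-1, 1] against the robot's own bounds,
    then pad to a fixed width for the shared action space."""
    scaled = 2.0 * (action - spec.low) / (spec.high - spec.low) - 1.0
    padded = np.zeros(MAX_ACTION_DIM)
    padded[:spec.action_dim] = scaled
    return padded

def from_shared(shared: np.ndarray, spec: EmbodimentSpec) -> np.ndarray:
    """Invert the mapping to execute a model output on a specific robot."""
    scaled = shared[:spec.action_dim]
    return spec.low + (scaled + 1.0) * (spec.high - spec.low) / 2.0

# Two toy embodiments with different dimensionality, units, and rates.
arm7 = EmbodimentSpec("7dof-arm", 7, -np.ones(7), np.ones(7), 30.0)
grip = EmbodimentSpec("2dof-gripper", 2, np.zeros(2), np.array([0.08, 0.08]), 10.0)

a = np.array([0.04, 0.02])
print(from_shared(to_shared(a, grip), grip))  # round-trips to [0.04, 0.02]
```
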
arXiv

Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
Hao Luo*, Yicheng Feng*, Wanpeng Zhang*, Sipeng Zheng*, Ye Wang, Haoqi Yuan, Jiazheng Liu, Chaoyi Xu, Qin Jin, Zongqing Lu

Blog | Code | Hugging Face Model

  • Being-H0 is the first VLA model pretrained from large-scale human videos with hand motion.
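
As a heavily hedged toy sketch of how continuous hand motion from videos could become discrete tokens for VLA pretraining: the snippet below uniformly quantizes each pose dimension into bins. The bin count, pose range, and encoding scheme are assumptions for illustration, not the Being-H0 tokenizer.

```python
# Toy sketch (assumptions, not the Being-H0 method): quantize normalized
# hand-pose trajectories into integer tokens and decode them back.

import numpy as np

NUM_BINS = 256                   # assumed tokens per pose dimension
POSE_LOW, POSE_HIGH = -1.0, 1.0  # assumed normalized pose range

def pose_to_tokens(pose: np.ndarray) -> np.ndarray:
    """Quantize a normalized hand-pose array into integer token ids."""
    clipped = np.clip(pose, POSE_LOW, POSE_HIGH)
    frac = (clipped - POSE_LOW) / (POSE_HIGH - POSE_LOW)
    return np.minimum((frac * NUM_BINS).astype(int), NUM_BINS - 1)

def tokens_to_pose(tokens: np.ndarray) -> np.ndarray:
    """Decode token ids back to bin-center pose values."""
    frac = (tokens + 0.5) / NUM_BINS
    return POSE_LOW + frac * (POSE_HIGH - POSE_LOW)

# A 3-frame trajectory of a 5-dim hand pose round-trips to within one
# bin width ((POSE_HIGH - POSE_LOW) / NUM_BINS), the quantization budget.
traj = np.random.uniform(-1, 1, size=(3, 5))
err = np.max(np.abs(tokens_to_pose(pose_to_tokens(traj)) - traj))
print(err < (POSE_HIGH - POSE_LOW) / NUM_BINS)  # True
```
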
ICCV 2025

Being-M0.5: A Real-Time Controllable Vision-Language-Motion Model
Bin Cao*, Sipeng Zheng*, Ye Wang, Lujie Xia, Qianshan Wei, Qin Jin, Jing Liu, Zongqing Lu

Blog | Page

  • Being-M is the first large motion generation model scaling to million-level motion sequences.

Being-M0: Scaling Large Motion Models with Million-Level Human Motions (ICML 2025)

ICCV 2025 (Highlight)

Being-VL0.5: Unified Multimodal Understanding via Byte-Pair Visual Encoding
Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu

Blog | Code | Page

  • Being-VL is the first large multimodal model built on compressed discrete visual representations produced by 2D byte-pair encoding (2D-BPE); see the sketch after this entry.

Being-VL-0: From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities (ICLR 2025)
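
As a minimal sketch of the 2D-BPE idea referenced above: extend classic byte-pair encoding to a 2D grid of quantized visual tokens by repeatedly merging the most frequent adjacent pair into a new vocabulary symbol. The grid values, merge loop, and horizontal-only relabeling below are simplifying assumptions, not the Being-VL implementation.

```python
# Toy sketch (assumptions, not the Being-VL method) of byte-pair encoding
# over a 2D grid of quantized visual token ids.

from collections import Counter

def count_adjacent_pairs(grid):
    """Count horizontally and vertically adjacent token pairs; rows may be
    ragged after earlier horizontal merges."""
    pairs = Counter()
    for r, row in enumerate(grid):
        for c, tok in enumerate(row):
            if c + 1 < len(row):                              # horizontal
                pairs[(tok, row[c + 1])] += 1
            if r + 1 < len(grid) and c < len(grid[r + 1]):    # vertical
                pairs[(tok, grid[r + 1][c])] += 1
    return pairs

def bpe_2d(grid, num_merges, vocab_size):
    """Learn merge rules; each merged pair gets a fresh token id. A full
    implementation would also merge vertical occurrences and track grid
    geometry; this sketch relabels horizontal runs only, for brevity."""
    merges, next_id = {}, vocab_size
    for _ in range(num_merges):
        pairs = count_adjacent_pairs(grid)
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges[best] = next_id
        for row in grid:
            c = 0
            while c + 1 < len(row):
                if (row[c], row[c + 1]) == best:
                    row[c] = next_id
                    del row[c + 1]
                else:
                    c += 1
        next_id += 1
    return merges

# Toy 4x4 grid of VQ token ids with an initial codebook of 8 entries.
grid = [[1, 2, 1, 2],
        [3, 3, 1, 2],
        [1, 2, 3, 3],
        [1, 2, 1, 2]]
print(bpe_2d(grid, num_merges=3, vocab_size=8))
```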

🎙 Before BeingBeyond

ICLR 2024 (Spotlight, 5.02%)

Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Sipeng Zheng, Jiazheng Liu, Yicheng Feng, Zongqing Lu

Project | Code

CVPR 2022 (Oral, 4.14%)

VRDFormer: End-to-End Video Visual Relation Detection with Transformers
Sipeng Zheng, Shizhe Chen, Qin Jin

Code

📚 Paper List