Welcome

Partner at BeingBeyond | Lead, Embodied Multimodal Pretraining

I’m a partner at BeingBeyond, where I work closely with Prof. Zongqing Lu on foundation models for general-purpose humanoid robots. I lead projects across the Being-H, Being-M, and Being-VL series, with a focus on embodied foundation models, multimodal pretraining, egocentric human video learning, and humanoid robotics.

Previously, I was a researcher at the Beijing Academy of Artificial Intelligence (BAAI). I received my PhD and bachelor’s degree from Renmin University of China (RUC), advised by Prof. Qin Jin.

My research interests include embodied foundation models, large multimodal models, egocentric human video learning, and humanoid robotics.

Join Us!

We are actively recruiting full-time researchers and interns to join our team. If you’re passionate about embodied AI, feel free to reach out.

News

  • 2026.05: Being-H0 is accepted to ICML 2026.
  • 2026.04: We present Being-H0.7, a latent world-action model based on 200K human videos.
  • 2026.04: Being-H0.5 has been open-sourced.
  • 2026.02: Three papers are accepted to CVPR 2026.
  • 2026.01: We release Being-H0.5, our latest cross-embodiment foundation VLA.
Earlier news
  • 2025.09: Two papers are accepted to NeurIPS 2025.
  • 2025.08: We present Being-M0.5, an improved version of Being-M0 with real-time controllability.
  • 2025.07: We release Being-VL-0.5, the next version of our LMM, including code and checkpoints.
  • 2025.07: We release Being-H0, the first VLA pretrained from large-scale human videos with hand motion!
  • 2025.06: Three papers are accepted to ICCV 2025.
  • 2025.05: We present Being-M0, our first million-level motion model, which has been accepted to ICML 2025.

Publications

* denotes equal contribution

BeingBeyond Series


Being-H Series VLA

Vision-language-action pretraining from large-scale human videos.

BeingBeyond Team

The first VLA pretrained with 10K hours of human videos, spanning 30+ robot embodiments.

Previous Being-H0 · ICML 2026 · BeingBeyond Team · first VLA pretrained with 1K hours of human videos · Blog · Paper · Code · Model


Being-M Series

Large motion generation models for controllable vision-language-motion generation.

Bin Cao*, Sipeng Zheng*, Ye Wang, Lujie Xia, Qianshan Wei, Qin Jin, Jing Liu, Zongqing Lu

Previous Being-M0 · ICML 2025 · Ye Wang*, Sipeng Zheng*, et al. · Paper · Page


Being-VL Series

Discrete visual tokenization for unified multimodal understanding.

Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu

Previous Being-VL-0 · ICLR 2025 · Wanpeng Zhang, Sipeng Zheng, et al. · Paper · Page

Early Work

ICLR 2024

Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds, Sipeng Zheng, Jiazheng Liu, Yicheng Feng, Zongqing Lu

ICLR24 (Spotlight 5.02%)

Project | Code

CVPR 2022

VRDFormer: End-to-end video visual relation detection with transformer, Sipeng Zheng, Shizhe Chen, Qin Jin

CVPR22 (Oral 4.14%)

Code

Paper List

Honors and Awards

  • 2025 Ranked 1st in GemBench Challenge at CVPR 2025 Workshop GRAIL.
  • 2022 Ranked 3rd in CVPR 2022 Ego4D Natural Language Query Challenge.
  • 2021 Ranked 3rd in NIST TRECVID 2021 Ad-hoc Video Search (AVS) Challenge.
  • 2021 Ranked 2nd in CVPR 2021 HOMAGE Scene-graph Generation Challenge.
  • 2020 Ranked 2nd in ACM MM 2020 Video Relationship Understanding Grand Challenge.
  • 2019 Ranked 2nd in ACM MM 2019 Video Relationship Understanding Grand Challenge.
  • 2022 National Scholarship for Ph.D. Students.
  • 2019 Best Method Prize in ACM MM 2019 Grand Challenge.
  • 2019 First Class Scholarship for Ph.D. Students from 2018 to 2021.
  • 2015 First Prize in National University Mathematical Modeling Competition of Beijing Area.

Education

  • 2018.09 - 2023.06, PhD, Computer Science and Engineering, Renmin University of China, China.
  • 2014.09 - 2018.06, Undergraduate, Computer Science and Engineering, Renmin University of China, China.

Work Experience

  • 2025.05 - now, Research Scientist, BeingBeyond, Beijing, China.
  • 2023.07 - 2025.05, Researcher, Beijing Academy of Artificial Intelligence, Beijing, China.
  • 2022.04 - 2022.10, Research Intern, Microsoft Research Asia, Beijing, China.
  • 2021.11 - 2022.04, Research Intern, Beijing Academy of Artificial Intelligence, Beijing, China.

Services

  • Conference Reviewer for CVPR, ICCV, ECCV, ACCV, NeurIPS, AAAI, ACM MM.
  • Journal Reviewer for IJCV, TCSVT, TMM, JATS.