Relation understanding in videos
S Zheng, X Chen, S Chen, Q Jin. "Relation understanding in videos." 2019 Proceedings of the 27th ACM International Conference on Multimedia. 2662-2666.
S Zheng, S Chen, Q Jin. "Visual relation detection with multi-level attention." 2019 Proceedings of the 27th ACM International Conference on Multimedia. 121-129.
S Zheng, S Chen, Q Jin. "Skeleton-based interactive graph network for human object interaction detection." 2020 IEEE International Conference on Multimedia and Expo (ICME). 1-6.
S Zheng, Q Zhang, B Liu, Q Jin, J Fu. "Exploring anchor-based detection for Ego4D natural language query." 2022 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop.
S Zheng, S Chen, Q Jin. "VRDFormer: End-to-end video visual relation detection with transformer." 2022 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18836-18846.
S Zheng, S Chen, Q Jin. "Few-shot Action Recognition with Hierarchical Matching and Contrastive Learning." 2022 European Conference on Computer Vision. 297-313.
B Liu, S Zheng, J Fu, WH Cheng. "Anchor-Based Detection for Natural Language Localization in Ego-Centric Videos." 2023 IEEE International Conference on Consumer Electronics (ICCE). 01-04.
L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin. "Accommodating audio modality in CLIP for multimodal processing." 2023 Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. 7 - 14.
Q Zhang, S Zheng, Q Jin. "No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection." 2023 arXiv preprint. arXiv:2307.10567.
S Zheng, B Xu, Q Jin. "Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework." 2023 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 19392-19402.
B Xu, S Zheng, Q Jin. "POV: Prompt-Oriented View-agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World." 2023 Proceedings of the 31st ACM International Conference on Multimedia. 2807-2816.
B Xu, S Zheng, Q Jin. "SPAFormer: Sequential 3D Part Assembly with Transformers." 2023 arXiv preprint. arXiv:2310.08922.
S Zheng, J Liu, Y Feng, Z Lu. "Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds." 2023 arXiv preprint. arXiv:2310.13255.
S Zheng, B Zhou, Y Feng, Y Wang, Z Lu. "UniCode: Learning a Unified Codebook for Multimodal Large Language Models." 2024 arXiv preprint. arXiv:2403.09072.
B Xu, Z Wang, Y Du, Z Song, S Zheng, Q Jin. "EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?" 2024 arXiv preprint. arXiv:2405.17719.
Y Feng, Y Wang, J Liu, S Zheng, Z Lu. "LLaMA Rider: Spurring Large Language Models to Explore the Open World." 2023 arXiv preprint. arXiv:2310.08922.
Y Wang, Y Mei, S Zheng, Q Jin. "QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds." 2024 arXiv preprint. arXiv:2406.16578.
Mandarin (Native), English (Fluent)