Sipeng Zheng
Expertise in Hydro-climatology
Published:
Singapore-MIT Alliance project and World Bank project
Published:
Public Utilities Board (PUB) project
Published:
PUB-TMSI-Monash University project
Published:
ARC linkage grant and Department of Industry, NSW, Australia
Published in Proceedings of the 27th ACM International Conference on Multimedia, 2019
This paper proposes an effective video relation detection model.
Recommended citation: S Zheng, X Chen, S Chen, Q Jin. "Relation understanding in videos." 2019 Proceedings of the 27th ACM International Conference on Multimedia. 2662-2666. https://dl.acm.org/doi/abs/10.1145/3343031.3356080
Published in Proceedings of the 27th ACM International Conference on Multimedia, 2019
This paper proposes a multi-level attention model for visual relation detection.
Recommended citation: S Zheng, S Chen, Q Jin. "Visual relation detection with multi-level attention." 2019 Proceedings of the 27th ACM International Conference on Multimedia. 121-129. https://dl.acm.org/doi/abs/10.1145/3343031.3350962
Published in IEEE International Conference on Multimedia and Expo, 2020
This paper proposes a skeleton-based interactive graph network for human-object interaction.
Recommended citation: S Zheng, S Chen, Q Jin. "Skeleton-based interactive graph network for human object interaction detection." 2020 IEEE International Conference on Multimedia and Expo (ICME). 1-6. https://ieeexplore.ieee.org/document/9102755
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop, 2022
This paper proposes an egocentric framework for natural language query.
Recommended citation: S Zheng, Q Zhang, B Liu, Q Jin, J Fu. "Exploring anchor-based detection for ego4d natural language query." 2022 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop. https://arxiv.org/abs/2208.05375
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
This paper proposes the first end-to-end framework for video visual relation detection.
Recommended citation: S Zheng, S Chen, Q Jin. "VRDFormer: End-to-end video visual relation detection with transformer." 2022 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18836-18846. https://openaccess.thecvf.com/content/CVPR2022/papers/Zheng_VRDFormer_End-to-End_Video_Visual_Relation_Detection_With_Transformers_CVPR_2022_paper.pdf
Published in European Conference on Computer Vision, 2022
This paper proposes a hierarchical matching model for few-shot action recognition.
Recommended citation: S Zheng, S Chen, Q Jin. "Few-shot Action Recognition with Hierarchical Matching and Contrastive Learning." 2022 European Conference on Computer Vision. 297-313. https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136640293.pdf
Published in 2023 IEEE International Conference on Consumer Electronics (ICCE), 2023
This paper proposes an anchor-based detection method for ego-centric natural language localization.
Recommended citation: B Liu, S Zheng, J Fu, WH Cheng. "Anchor-Based Detection for Natural Language Localization in Ego-Centric Videos." 2023 IEEE International Conference on Consumer Electronics (ICCE). 01-04. https://ieeexplore.ieee.org/abstract/document/10043460/
Published in Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
This paper proposes a large pretrained model for vision, text and audio modalities.
Recommended citation: L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin. "Accommodating audio modality in CLIP for multimodal processing." 2023 Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. 7 - 14. https://ojs.aaai.org/index.php/AAAI/article/view/26153/25925
Published in arXiv preprint, 2023
This paper proposes an effective no-frills framework for temporal video grounding.
Recommended citation: Q Zhang, S Zheng, Q Jin. "No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection." 2023 arXiv preprint. arXiv:2307.10567. https://arxiv.org/abs/2307.10567
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
This paper proposes an open-category pre-trained model for human-object interaction understanding.
Recommended citation: S Zheng, B Xu, Q Jin. "Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework." 2023 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 19392-19402. https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf
Published in Proceedings of the 31st ACM International Conference on Multimedia, 2023
This paper proposes a prompt-oriented view-agnostic learning framework for multi-view action understanding.
Recommended citation: B Xu, S Zheng, Q Jin. "POV: Prompt-Oriented View-agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World." 2023 Proceedings of the 31st ACM International Conference on Multimedia. 2807-2816. https://doi.org/10.1007/s00704-018-2617-z
Published in arXiv preprint, 2024
This paper proposes an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task.
Recommended citation: B Xu, S Zheng, Q Jin. "SPAFormer: Sequential 3D Part Assembly with Transformers." 2024 arXiv preprint. arXiv:2403.05874. https://arxiv.org/pdf/2403.05874
Published in Proceedings of the International Conference on Learning Representations (ICLR), 2024
This paper proposes the first large multi-modal model for open-world agents in Minecraft.
Recommended citation: S Zheng, J Liu, Y Feng, Z Lu. "Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds." 2023 arXiv preprint. arXiv:2310.13255. https://arxiv.org/abs/2310.13255
Published in arXiv preprint, 2024
This paper proposes UniCode, a novel approach within the domain of multimodal large language models (MLLMs) that learns a unified codebook to efficiently tokenize visual, text, and potentially other types of signals.
Recommended citation: S Zheng, B Zhou, Y Feng, Y Wang, Z Lu. "UniCode: Learning a Unified Codebook for Multimodal Large Language Models." 2024 arXiv preprint. arXiv:2403.09072. https://arxiv.org/abs/2403.09072
Published in arXiv preprint, 2024
This paper introduces EgoNCE++, a novel asymmetric contrastive objective for EgoHOI, and proposes EgoHOIBench, an open-vocabulary benchmark that reveals the diminished performance of current egocentric video-language models (EgoVLMs) on fine-grained concepts.
Recommended citation: B Xu, Z Wang, Y Du, Z Song, S Zheng, Q Jin. "EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?" 2024 arXiv preprint. arXiv:2405.17719v1. https://arxiv.org/html/2405.17719v1
Published in Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
This paper proposes a self-driven framework for LLM-based agents in open worlds.
Recommended citation: Y Feng, Y Wang, J Liu, S Zheng, Z Lu. "LLaMA Rider: Spurring Large Language Models to Explore the Open World." 2023 arXiv preprint. arXiv:2310.08922. https://arxiv.org/abs/2310.08922
Published in arXiv preprint, 2024
This paper introduces a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet.
Recommended citation: Y Wang, Y Mei, S Zheng, Q Jin. "QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds." 2024 arXiv preprint. arXiv:2406.16578. https://arxiv.org/abs/2406.16578
The open-source R package NPRED identifies meaningful predictors of a response from a large set of potential predictors.
The open-source software WASP is used for system modeling and prediction.
The open-source software WQM is used for post-processing numerical weather prediction.
Generate synthetic time series from commonly used statistical models, including linear, nonlinear and chaotic systems.
Published:
Recommended citation: Jiang, Z., Molkenthin, F., & Sieker, H. (2016). Urban Surface Characteristics Study Using Time-area Function Model: A Case Study in Saudi Arabia. 12th International Conference on Hydroinformatics HIC 2016, Poster, Songdo Convensia Convention Center, 21-26 August 2016, FP-10.
Undergraduate course, School of Civil and Environmental Engineering, UNSW Sydney, 2019
Masters course, School of Civil and Environmental Engineering, UNSW Sydney, 2019
Masters course, School of Civil and Environmental Engineering, UNSW Sydney, 2023