Posts by Collection

people

Sipeng Zheng

Expertise in Hydro-climatology

projects

Climate Change: Impact on agriculture across Southeast Asia

Published: November 01, 2015

Singapore-MIT Alliance project and World Bank project

Climate Change: Impact on urban flooding in Singapore

Published: January 01, 2016

Public Utilities Board (PUB) project

ABC Waters: Low impact development (LID) for Singapore environment

Published: January 01, 2017

PUB-TMSI-Monash University project

Drought Prediction across Australia

Published: February 01, 2018

ARC linkage grant and Department of Industry, NSW, Australia

publications

Relation understanding in videos

Published in Proceedings of the 27th ACM International Conference on Multimedia, 2019

This paper proposes a effective video relation detection model.

Recommended citation: S Zheng, X Chen, S Chen, Q Jin. "Relation understanding in videos." 2019 Proceedings of the 27th ACM International Conference on Multimedia. 2662-2666. https://dl.acm.org/doi/abs/10.1145/3343031.3356080

Visual relation detection with multi-level attention

Published in Proceedings of the 27th ACM International Conference on Multimedia, 2019

This paper proposes a multi-level attention visual relation detection model for visual relation detection.

Recommended citation: S Zheng, S Chen, Q Jin. "Visual relation detection with multi-level attention." 2019 Proceedings of the 27th ACM International Conference on Multimedia. 121-129. https://dl.acm.org/doi/abs/10.1145/3343031.3350962

Skeleton-based interactive graph network for human object interaction detection

Published in IEEE International Conference on Multimedia and Expo, 2020

This paper proposes a skeleton-based interactive graph network for human-object interaction.

Recommended citation: S Zheng, S Chen, Q Jin. "Skeleton-based interactive graph network for human object interaction detection." 2020 IEEE International Conference on Multimedia and Expo (ICME). 1-6. https://ieeexplore.ieee.org/document/9102755

Exploring anchor-based detection for ego4d natural language query

Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop 2022, 2022

This paper proposes an egocentric framework for natural language query.

Recommended citation: S Zheng, Q Zhang, B Liu, Q Jin, J Fu. "Exploring anchor-based detection for ego4d natural language query." 2022 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop. https://arxiv.org/abs/2208.05375

VRDFormer: End-to-end video visual relation detection with transformer

Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

This paper proposes the first end2end framework for visual relation detection.

Recommended citation: S Zheng, S Chen, Q Jin. "VRDFormer: End-to-end video visual relation detection with transformer." 2022 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18836-18846. https://openaccess.thecvf.com/content/CVPR2022/papers/Zheng_VRDFormer_End-to-End_Video_Visual_Relation_Detection_With_Transformers_CVPR_2022_paper.pdf

Few-shot Action Recognition with Hierarchical Matching and Contrastive Learning

Published in European Conference on Computer Vision, 2022

This paper proposes a hierarchical matching model for few-shot action recognition.

Recommended citation: S Zheng, S Chen, Q Jin. "Few-shot Action Recognition with Hierarchical Matching and Contrastive Learning." 2022 European Conference on Computer Vision. 297-313. https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136640293.pdf

Anchor-Based Detection for Natural Language Localization in Ego-Centric Videos

Published in 2023 IEEE International Conference on Consumer Electronics (ICCE), 2023

This paper proposes a anchor-based detection method for ego-centric natural language localization.

Recommended citation: B Liu, S Zheng, J Fu, WH Cheng. "Anchor-Based Detection for Natural Language Localization in Ego-Centric Videos." 2023 IEEE International Conference on Consumer Electronics (ICCE). 01-04. https://ieeexplore.ieee.org/abstract/document/10043460/

Accommodating audio modality in CLIP for multimodal processing

Published in Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

This paper proposes a large pretrained model for vision, text and audio modalities.

Recommended citation: L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin. "Accommodating audio modality in CLIP for multimodal processing." 2023 Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. 7 - 14. https://ojs.aaai.org/index.php/AAAI/article/view/26153/25925

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

Published in arXiv preprint, 2023

This paper proposes an effective no-frill framework for temporal video grounding.

Recommended citation: Q Zhang, S Zheng, Q Jin. "No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection." 2023 arXiv preprint. arXiv:2307.10567. https://arxiv.org/abs/2307.10567

Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework

Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

This paper proposes an open-category pre-trained model for human-object interaction understanding.

Recommended citation: S Zheng, B Xu, Q Jin. "Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework." 2023 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 19392-19402. https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf

POV: Prompt-Oriented View-agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World

Published in Proceedings of the 31st ACM International Conference on Multimedia, 2023

This paper propose a prompt-oriented view-agnostic learning framework for multi-view action understanding.

Recommended citation: B Xu, S Zheng, Q Jin. "POV: Prompt-Oriented View-agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World." 2023 Proceedings of the 31st ACM International Conference on Multimedia. 2807-2816. https://doi.org/10.1007/s00704-018-2617-z

SPAFormer: Sequential 3D Part Assembly with Transformers

Published in arXiv preprint, 2024

This paper proposes an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task.

Recommended citation: B Xu, S Zheng, J Qin. "SPAFormer: Sequential 3D Part Assembly with Transformers" 2023 arXiv preprint. arXiv:2310.08922. https://arxiv.org/pdf/2403.05874

Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds

Published in Proceedings of the International Conference on Learning Representations (ICLR), 2024

This paper proposes the first large multi-modal model for open-world agents in Minecraft.

Recommended citation: S Zheng, J Liu, Y Feng, Z Lu. "Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds." 2023 arXiv preprint. arXiv:2310.13255. https://arxiv.org/abs/2310.13255

UniCode: Learning a Unified Codebook for Multimodal Large Language Models

Published in arXiv preprint, 2024

This paper proposes UniCode, novel approach within the domain of multimodal large language models (MLLMs) that learns a unified codebook to efficiently tokenize visual, text, and potentially other types of signals.

Recommended citation: S Zheng, B Zhou, Y Feng, Y Wang, Z Lu. "UniCode: Learning a Unified Codebook for Multimodal Large Language Models" 2024 arXiv preprint. arXiv:2403.09072. https://arxiv.org/abs/2403.09072

EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

Published in arXiv preprint, 2024

This paper introduces a novel asymmetric contrastive objective for EgoHOI named EgoNCE++, while proposing an open-vocabulary benchmark named EgoHOIBench to reveal the diminished performance of current egocentric video-language models (EgoVLM) on fined-grained concepts.

Recommended citation: B Xu, Z Wang, Y Du, Z Song, S Zheng, Q Jin. "EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?" 2024 arXiv preprint. arXiv:2405.17719v1. https://arxiv.org/html/2405.17719v1

LLaMA Rider: Spurring Large Language Models to Explore the Open World

Published in Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024

This paper proposes a self-driven framework for LLM-based agents in open worlds.

Recommended citation: Y Feng, Y Wang, J Liu, S Zheng, Z Lu. "LLaMA Rider: Spurring Large Language Models to Explore the Open World." 2023 arXiv preprint. arXiv:2310.08922. https://arxiv.org/abs/2310.08922

QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

Published in arXiv preprint, 2024

This paper introduces a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet.

Recommended citation: Wang, Ye and Mei, Yuting and Zheng, Sipeng and Jin, Qin. "QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds" 2024 arXiv preprint. arXiv:2406.16578. https://arxiv.org/abs/2406.16578

software

Predictor identifier: Nonparametric PREDiction (NPRED)

The open-source R package NPRED is used to identify the meaningful predictors to the response from a large set of potential predictors.

WASP: Wavelet System Prediction

The open-source software WASP is used for system modeling and prediction.

WQM: Wavelet-based Quantile Mapping

The open-source software WQM is used for post-processing numerical weather prediction.

synthesis: Generate Synthetic Data from Statistical Models

Generate synthetic time series from commonly used statistical models, including linear, nonlinear and chaotic systems.

talks

Urban Surface Characteristics Study Using Time-area Function Model: A Case Study in Saudi Arabia

Published: August 24, 2016

Recommended citation: Jiang, Z., Molkenthin, F., & Sieker, H. (2016). Urban Surface Characteristics Study Using Time-area Function Model: A Case Study in Saudi Arabia. 12th International Conference on Hydroinformatics HIC 2016, Poster, Songdo Convensia Convention Center, 21-26 August 2016, FP-10.

teaching

CVEN3501 Water Resources Engineering

Undergraduate course, School of Civil and Environmental Engineering, UNSW Sydney, 2019

Teaching period: T1, 2019
Position: Teaching Assistant
Role: Demonstrator and Tutor
Number of students: 610
Course Profile: Download

CVEN9625 Fundamentals of Water Engineering

Masters course, School of Civil and Environmental Engineering, UNSW Sydney, 2019

Teaching period: S2, 2018; T1&3, 2019.
Position: Teaching Assistant
Role: Demonstrator and Tutor
Number of students: ~200
Course Profile: Download

CVEN9612 Catchment and Water Resources Modelling

Masters course, School of Civil and Environmental Engineering, UNSW Sydney, 2023

Teaching period: T3, 2022 and 2023.
Position: Teaching Assistant
Role: Demonstrator and Instructor
Number of students: ~20
Course Profile: Download