Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.
About me
This is a page not in the main menu.
The wavelet-based variance transformation method is used for system modelling and prediction. It refines the spectral representation of predictors using wavelet theory, which leads to improved model specification and prediction accuracy. The supporting open-source software, Wavelet System Prediction (WASP), can be found on the Software page.
GitHub Pages is a static site hosting service that takes HTML, CSS, and JavaScript files straight from a repository on GitHub, optionally runs the files through a build process, and publishes a website.
A word cloud is a collection or cluster of words displayed in different sizes. The larger a word appears, the more often it is mentioned in the given text and the more important it is taken to be.
LaTeX is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents, but it can be used for almost any form of publishing.
Expertise in Hydro-climatology
Singapore-MIT Alliance project and World Bank project
Public Utilities Board (PUB) project
PUB-TMSI-Monash University project
ARC linkage grant and Department of Industry, NSW, Australia
Published in Proceedings of the 27th ACM International Conference on Multimedia, 2019
This paper proposes an effective video relation detection model.
Recommended citation: S Zheng, X Chen, S Chen, Q Jin. "Relation understanding in videos." 2019 Proceedings of the 27th ACM International Conference on Multimedia. 2662-2666. https://dl.acm.org/doi/abs/10.1145/3343031.3356080
Published in Proceedings of the 27th ACM International Conference on Multimedia, 2019
This paper proposes a multi-level attention model for visual relation detection.
Recommended citation: S Zheng, S Chen, Q Jin. "Visual relation detection with multi-level attention." 2019 Proceedings of the 27th ACM International Conference on Multimedia. 121-129. https://dl.acm.org/doi/abs/10.1145/3343031.3350962
Published in IEEE International Conference on Multimedia and Expo, 2020
This paper proposes a skeleton-based interactive graph network for human-object interaction.
Recommended citation: S Zheng, S Chen, Q Jin. "Skeleton-based interactive graph network for human object interaction detection." 2020 IEEE International Conference on Multimedia and Expo (ICME). 1-6. https://ieeexplore.ieee.org/document/9102755
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop, 2022
This paper proposes an egocentric framework for natural language query.
Recommended citation: S Zheng, Q Zhang, B Liu, Q Jin, J Fu. "Exploring anchor-based detection for ego4d natural language query." 2022 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop. https://arxiv.org/abs/2208.05375
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
This paper proposes the first end-to-end framework for video visual relation detection.
Recommended citation: S Zheng, S Chen, Q Jin. "VRDFormer: End-to-end video visual relation detection with transformer." 2022 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18836-18846. https://openaccess.thecvf.com/content/CVPR2022/papers/Zheng_VRDFormer_End-to-End_Video_Visual_Relation_Detection_With_Transformers_CVPR_2022_paper.pdf
Published in European Conference on Computer Vision, 2022
This paper proposes a hierarchical matching model for few-shot action recognition.
Recommended citation: S Zheng, S Chen, Q Jin. "Few-shot Action Recognition with Hierarchical Matching and Contrastive Learning." 2022 European Conference on Computer Vision. 297-313. https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136640293.pdf
Published in 2023 IEEE International Conference on Consumer Electronics (ICCE), 2023
This paper proposes an anchor-based detection method for ego-centric natural language localization.
Recommended citation: B Liu, S Zheng, J Fu, WH Cheng. "Anchor-Based Detection for Natural Language Localization in Ego-Centric Videos." 2023 IEEE International Conference on Consumer Electronics (ICCE). 01-04. https://ieeexplore.ieee.org/abstract/document/10043460/
Published in Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
This paper proposes a large pretrained model for vision, text and audio modalities.
Recommended citation: L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin. "Accommodating audio modality in CLIP for multimodal processing." 2023 Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence. 7 - 14. https://ojs.aaai.org/index.php/AAAI/article/view/26153/25925
Published in arXiv preprint, 2023
This paper proposes an effective no-frills framework for temporal video grounding.
Recommended citation: Q Zhang, S Zheng, Q Jin. "No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection." 2023 arXiv preprint. arXiv:2307.10567. https://arxiv.org/abs/2307.10567
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
This paper proposes an open-category pre-trained model for human-object interaction understanding.
Recommended citation: S Zheng, B Xu, Q Jin. "Open-Category Human-Object Interaction Pre-Training via Language Modeling Framework." 2023 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 19392-19402. https://openaccess.thecvf.com/content/CVPR2023/papers/Zheng_Open-Category_Human-Object_Interaction_Pre-Training_via_Language_Modeling_Framework_CVPR_2023_paper.pdf
Published in Proceedings of the 31st ACM International Conference on Multimedia, 2023
This paper proposes a prompt-oriented view-agnostic learning framework for multi-view action understanding.
Recommended citation: B Xu, S Zheng, Q Jin. "POV: Prompt-Oriented View-agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World." 2023 Proceedings of the 31st ACM International Conference on Multimedia. 2807-2816. https://doi.org/10.1007/s00704-018-2617-z
Published in arXiv preprint, 2024
This paper proposes an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task.
Recommended citation: B Xu, S Zheng, Q Jin. "SPAFormer: Sequential 3D Part Assembly with Transformers." 2024 arXiv preprint. arXiv:2403.05874. https://arxiv.org/pdf/2403.05874
Published in Proceedings of the International Conference on Learning Representations (ICLR), 2024
This paper proposes the first large multi-modal model for open-world agents in Minecraft.
Recommended citation: S Zheng, J Liu, Y Feng, Z Lu. "Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds." 2023 arXiv preprint. arXiv:2310.13255. https://arxiv.org/abs/2310.13255
Published in arXiv preprint, 2024
This paper proposes UniCode, a novel approach within the domain of multimodal large language models (MLLMs) that learns a unified codebook to efficiently tokenize visual, text, and potentially other types of signals.
Recommended citation: S Zheng, B Zhou, Y Feng, Y Wang, Z Lu. "UniCode: Learning a Unified Codebook for Multimodal Large Language Models." 2024 arXiv preprint. arXiv:2403.09072. https://arxiv.org/abs/2403.09072
Published in arXiv preprint, 2024
This paper introduces EgoNCE++, a novel asymmetric contrastive objective for EgoHOI, and proposes an open-vocabulary benchmark, EgoHOIBench, that reveals the diminished performance of current egocentric video-language models (EgoVLMs) on fine-grained concepts.
Recommended citation: B Xu, Z Wang, Y Du, Z Song, S Zheng, Q Jin. "EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?" 2024 arXiv preprint. arXiv:2405.17719v1. https://arxiv.org/html/2405.17719v1
Published in Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
This paper proposes a self-driven framework for LLM-based agents in open worlds.
Recommended citation: Y Feng, Y Wang, J Liu, S Zheng, Z Lu. "LLaMA Rider: Spurring Large Language Models to Explore the Open World." 2023 arXiv preprint. arXiv:2310.08922. https://arxiv.org/abs/2310.08922
Published in arXiv preprint, 2024
This paper introduces a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet.
Recommended citation: Y Wang, Y Mei, S Zheng, Q Jin. "QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds." 2024 arXiv preprint. arXiv:2406.16578. https://arxiv.org/abs/2406.16578
The open-source R package NPRED is used to identify meaningful predictors of a response from a large set of potential predictors.
The open-source software WASP is used for system modeling and prediction.
The open-source software WQM is used for post-processing numerical weather prediction.
Generate synthetic time series from commonly used statistical models, including linear, nonlinear and chaotic systems.
Recommended citation: Jiang, Z., Molkenthin, F., & Sieker, H. (2016). Urban Surface Characteristics Study Using Time-area Function Model: A Case Study in Saudi Arabia. 12th International Conference on Hydroinformatics HIC 2016, Poster, Songdo Convensia Convention Center, 21-26 August 2016, FP-10.
Undergraduate course, School of Civil and Environmental Engineering, UNSW Sydney, 2019
Masters course, School of Civil and Environmental Engineering, UNSW Sydney, 2019
Masters course, School of Civil and Environmental Engineering, UNSW Sydney, 2023