Zongyang Ma
2025
iMOVE: Instance-Motion-Aware Video Understanding
Jiaze Li | Yaya Shi | Zongyang Ma | Haoran Xu | Yandong.bai Yandong.bai | Huihui Xiao | Ruiwen Kang | Fan Yang | Tingting Gao | Di Zhang
Findings of the Association for Computational Linguistics: ACL 2025
Enhancing the fine-grained instance spatiotemporal motion perception capabilities of Video Large Language Models is crucial for improving their temporal and general video understanding. However, current models struggle to perceive detailed and complex instance motions. To address these challenges, we have made improvements from both data and model perspectives. In terms of data, we have meticulously curated iMOVE-IT, the first large-scale instance-motion-aware video instruction-tuning dataset. This dataset is enriched with comprehensive instance motion annotations and spatiotemporal mutual-supervision tasks, providing extensive training for the model’s instance-motion-awareness. Building on this foundation, we introduce iMOVE, an instance-motion-aware video foundation model that utilizes Event-aware Spatiotemporal Efficient Modeling to retain informative instance spatiotemporal motion details while maintaining computational efficiency. It also incorporates Relative Spatiotemporal Position Tokens to ensure awareness of instance spatiotemporal positions. Evaluations indicate that iMOVE excels not only in video temporal understanding and general video understanding but also demonstrates significant advantages in long-term video understanding. We will release the data, code, and model weights after acceptance.
2022
Unsupervised Knowledge Graph Generation Using Semantic Similarity Matching
Lixian Liu | Amin Omidvar | Zongyang Ma | Ameeta Agrawal | Aijun An
Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing
Knowledge Graphs (KGs) are directed labeled graphs representing entities and the relationships between them. Most prior work focuses on supervised or semi-supervised approaches, which require large amounts of annotated data. While unsupervised approaches do not need labeled training data, most existing methods either generate too many redundant relations or require manual mapping of the extracted relations to a known schema. To address these limitations, we propose an unsupervised method for KG generation that requires neither labeled data nor manual mapping to a predefined relation schema. Instead, our method leverages sentence-level semantic similarity to automatically generate relations between pairs of entities. Our proposed method outperforms two baseline systems when evaluated over four datasets.