Wang Jian (王健) - ACL Anthology

Wang Jian

Also published as: 健王

2025

dutir914 at SemEval-2025 Task 1: An integrated approach for Multimodal Idiomaticity Representations
Yanan Wang | Dailin Li | Yicen Tian | Bo Zhang | Wang Jian | Liang Yang
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

SemEval-2025 Task 1 introduces multimodal datasets for idiomatic expression representation. Subtask A focuses on ranking images based on potentially idiomatic noun compounds in given sentences. Idiom comprehension demands the fusion of visual and auditory elements with contextual semantics, yet existing datasets exhibit phrase-image discordance and culture-specific opacity, impeding cross-modal semantic alignment. To address these challenges, we propose an integrated approach that combines data augmentation and model fine-tuning in Subtask A. First, we construct two idiom datasets by generating visual metaphors for idiomatic expressions to fine-tune the CLIP model. Next, we propose a three-stage multimodal chain-of-thought method, fine-tuning Qwen2.5-VL-7B-Instruct to generate rationales and perform inference, alongside zero-shot experiments with Qwen2.5-VL-72B-Instruct. Finally, we integrate the outputs of the different models through a voting mechanism to enhance the accuracy of multimodal semantic matching. This approach achieves 0.92 accuracy on the Portuguese test set and 0.93 on the English test set, ranking 3rd and 4th, respectively. The implementation code is publicly available at https://github.com/wyn1015/semeval.

2024

基于本体信息增强的人类表型概念识别(Ontology Information-augmented Human Phenotype Concept Recognition)
Qi Jiewei (祁杰蔚) | Luo Ling (罗凌) | Yang Zhihao (杨志豪) | Wang Jian (王健) | Lin Hongfei (林鸿飞)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

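Automatically recognizing human phenotype concepts from text is of great significance for disease analysis. Existing ontology-driven phenotype concept recognition methods rely mainly on the concept names and synonyms in the ontology and do not fully exploit the rich information it contains. To address this problem, this paper proposes an ontology information-augmented method for human phenotype concept recognition, which uses advanced large language models for data augmentation and designs a deep learning model enhanced with ontology vectors to improve concept recognition performance. Experiments on two datasets, GSC+ and ID-68, show that the proposed method uses the ontology's rich information to effectively improve on the baseline model's performance and achieves state-of-the-art results.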
