Nan Li (李楠) - ACL Anthology

Nan Li

Also published as: 楠李

2025

DASA-Trans-STM: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic Awareness
Jiguo Liu | Chao Liu | Meimei Li | Nan Li | Shihao Gao | Dali Zhu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent advancements in large language models (LLMs) have shown impressive versatility across various tasks. Short text matching is one of the fundamental technologies in natural language processing. In previous studies, the common approach to applying them to Chinese is segmenting each sentence into words, and then taking these words as input. However, existing approaches have three limitations: 1) Some Chinese words are polysemous, and semantic information is not fully utilized. 2) Some models suffer potential issues caused by word segmentation, and incorrect recognition of negative words affects the semantic understanding of the whole sentence. 3) Fuzzy negation words in ancient Chinese are difficult to recognize and match. In this work, we propose a novel adaptive Transformer for Chinese short text matching using Data Augmentation and Semantic Awareness (DASA), which can fully mine the information expressed in Chinese text to deal with word ambiguity. DASA is based on a Graph Attention Transformer Encoder that takes two word lattice graphs as input and integrates sense information from N-HowNet to moderate word ambiguity. Specifically, we use an LLM to generate similar sentences for the optimal text representation. Experimental results show that the augmentation done using DASA can considerably boost the performance of our system and achieve significantly better results than previous state-of-the-art methods on four available datasets, namely MNS, LCQMC, AFQMC, and BQ.

Building Data-Driven Occupation Taxonomies: A Bottom-Up Multi-Stage Approach via Semantic Clustering and Multi-Agent Collaboration
Nan Li | Bo Kang | Tijl De Bie
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Creating robust occupation taxonomies, vital for applications ranging from job recommendation to labor market intelligence, is challenging. Manual curation is slow, while existing automated methods are either not adaptive to dynamic regional markets (top-down) or struggle to build coherent hierarchies from noisy data (bottom-up). We introduce CLIMB (CLusterIng-based Multi-agent taxonomy Builder), a framework that fully automates the creation of high-quality, data-driven taxonomies from raw job postings. CLIMB uses global semantic clustering to distill core occupations, then employs a reflection-based multi-agent system to iteratively build a coherent hierarchy. On three diverse, real-world datasets, we show that CLIMB produces taxonomies that are more coherent and scalable than existing methods and successfully capture unique regional characteristics. We release our code and datasets at https://github.com/aida-ugent/CLIMB.

DeMeVa at LeWiDi-2025: Modeling Perspectives with In-Context Learning and Label Distribution Learning
Daniil Ignatev | Nan Li | Hugh Mee Wong | Anh Dang | Shane Kaszefski Yaschuk
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP

This system paper presents the DeMeVa team’s approaches to the third edition of the Learning with Disagreements shared task (LeWiDi 2025; Leonardelli et al., 2025). We explore two directions: in-context learning (ICL) with large language models, where we compare example sampling strategies; and label distribution learning (LDL) methods with RoBERTa (Liu et al., 2019b), where we evaluate several fine-tuning methods. Our contributions are twofold: (1) we show that ICL can effectively predict annotator-specific annotations (perspectivist annotations), and that aggregating these predictions into soft labels yields competitive performance; and (2) we argue that LDL methods are promising for soft label predictions and merit further exploration by the perspectivist community.

Human-AI Moral Judgment Congruence on Real-World Scenarios: A Cross-Lingual Analysis
Nan Li | Bo Kang | Tijl De Bie
Proceedings of the 9th Widening NLP Workshop

As Large Language Models (LLMs) are deployed in every aspect of our lives, understanding how they reason about moral issues becomes critical for AI safety. We investigate this using a dataset we curated from Reddit’s r/AmItheAsshole, comprising real-world moral dilemmas with crowd-sourced verdicts. Through experiments on five state-of-the-art LLMs across 847 posts, we find a significant and systematic divergence where LLMs are more lenient than humans. Moreover, we find that translating the posts into another language changes LLMs’ verdicts, indicating their judgments lack cross-lingual stability.

2024

“The Fourth Chinese Spatial Cognition Evaluation Task (SpaCE 2024) presents the first comprehensive Chinese benchmark to assess spatial semantic understanding and reasoning capabilities of Large Language Models (LLMs). It comprises five subtasks in the form of multiple-choice questions: (1) identifying spatial semantic roles; (2) retrieving spatial referents; (3) detecting spatial semantic anomalies; (4) recognizing synonymous spatial expression with different forms; (5) conducting spatial position reasoning. In addition to proposing new tasks, SpaCE 2024 applied a rule-based method to generate high-quality synthetic data with difficulty levels for the reasoning task. 12 teams submitted their models and results, and the top-performing team attained an accuracy of 60.24%, suggesting that there is still significant room for current LLMs to improve, especially in tasks requiring high spatial cognitive processing.”

2023

CCL23-Eval任务4总结报告:第三届中文空间语义理解评测 (Overview of CCL23-Eval Task 4: The 3rd Chinese Spatial Cognition Evaluation)
Liming Xiao (肖力铭) | Weidong Zhan (詹卫东) | Zhifang Sui (穗志方) | Yuhang Qin (秦宇航) | Chunhui Sun (孙春晖) | Dan Xing (邢丹) | Nan Li (李楠) | Fangwei Zhu (祝方韦) | Peiyi Wang (王培懿)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

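“The Third Chinese Spatial Cognition Evaluation Task (SpaCE2023) aims to test machines' spatial semantic understanding and comprises three subtasks: (1) identifying anomalous spatial information; (2) labeling spatial semantic roles; (3) judging whether spatial scenes are the same or different. Building on SpaCE2022, this edition refines the task design of subtasks 1 and 2 and introduces subtask 3 as an entirely new evaluation task. In the end, 1 team submitted results, and its score on subtask 1 exceeded the baseline model. This paper also reports the performance of the large language model ChatGPT on the three SpaCE2023 subtasks and, based on the problems observed, suggests directions in which instruction design could be improved.”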
