Boxiang Ma (马博翔) - ACL Anthology

Boxiang Ma

Also published as: 博翔马

2025

Memorization ≠ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
Boxiang Ma | Ru Li | Wang Yuanlong | Hongye Tan | Xiaoli Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Driven by vast and diverse textual data, large language models (LLMs) have demonstrated impressive performance across numerous natural language processing (NLP) tasks. Yet, a critical question persists: does their generalization arise from mere memorization of training data or from deep semantic understanding? To investigate this, we propose a bi-perspective evaluation framework to assess LLMs’ scenario cognition—the ability to link semantic scenario elements with their arguments in context. Specifically, we introduce a novel scenario-based dataset comprising diverse textual descriptions of fictional facts, annotated with scenario elements. LLMs are evaluated through their capacity to answer scenario-related questions (model output perspective) and via probing their internal representations for encoded scenario elements-argument associations (internal representation perspective). Our experiments reveal that current LLMs predominantly rely on superficial memorization, failing to achieve robust semantic scenario cognition, even in simple cases. These findings expose critical limitations in LLMs’ semantic understanding and offer cognitive insights for advancing their capabilities.

2023

CCL23-Eval 任务3总结报告:汉语框架语义解析评测(Overview of CCL23-Eval Task 1:Chinese FrameNet Semantic Parsing)
Juncai Li (李俊材) | Zhichao Yan (闫智超) | Xuefeng Su (苏雪峰) | Boxiang Ma (马博翔) | Peiyuan Yang (杨沛渊) | Ru Li (李茹)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

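The Chinese frame semantic parsing evaluation task aims to improve the ability of machine models to understand fine-grained semantic information. The evaluation dataset contains 20,000 annotated example sentences for frame semantic parsing and information on nearly 700 frames. The task is divided into three subtasks: frame identification, argument span identification, and argument role identification. The final score is computed by combining the scores of these three subtasks. The evaluation attracted broad attention from industry and academia: 55 teams registered, and 12 of them submitted results. We selected the models of 5 teams for result reproduction, and Li Zuoheng (李作恒) from Sichuan ultimately ranked first with a score of 71.49. More information about the task, including system submissions, evaluation results, and data resources, is available on the website of the CCL-2023 Chinese frame semantic parsing evaluation task.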

Co-authors

Zhichao Yan (闫智超) 1

Peiyuan Yang (杨沛渊) 1

Wang Yuanlong 1

Venues

ccl 1
emnlp 1