Chunhui Sun

Also published as: 春晖


2024

“The Fourth Chinese Spatial Cognition Evaluation Task (SpaCE 2024) presents the first comprehensive Chinese benchmark to assess spatial semantic understanding and reasoning capabilities of Large Language Models (LLMs). It comprises five subtasks in the form of multiple-choice questions: (1) identifying spatial semantic roles; (2) retrieving spatial referents; (3) detecting spatial semantic anomalies; (4) recognizing synonymous spatial expression with different forms; (5) conducting spatial position reasoning. In addition to proposing new tasks, SpaCE 2024 applied a rule-based method to generate high-quality synthetic data with difficulty levels for the reasoning task. 12 teams submitted their models and results, and the top-performing team attained an accuracy of 60.24%, suggesting that there is still significant room for current LLMs to improve, especially in tasks requiring high spatial cognitive processing.”

2023

“第三届中文空间语义理解评测任务(SpaCE2023)旨在测试机器的空间语义理解能力,包括三个子任务:(1)空间信息异常识别任务;(2)空间语义角色标注任务;(3)空间场景异同判断任务。本届评测在SpaCE2022的基础上,优化了子任务一和子任务二的任务设计,并提出了子任务三这一全新的评测任务。最终有1支队伍提交参赛结果,并且在子任务一上的成绩超过了基线模型。本文还报告了大语言模型ChatGPT在SpaCE2023三个子任务上的表现,结合问题提出指令设计可改进的方向。”