Automatic Evaluation of Linguistic Validity in Japanese CCG Treebanks

Asa Tomita, Hitomi Yanaka, Daisuke Bekki


Abstract
In natural language inference, the accuracy of systems based on compositional semantics depends on the quality of syntactic analysis, which in turn relies on linguistically valid training and evaluation data, typically provided by treebanks. However, conventional treebank evaluation metrics focus on data coverage and fail to assess the linguistic validity of syntactic structures. This paper proposes novel evaluation methods to enable automatic and multifaceted assessment of linguistic validity. We apply these methods to a Japanese treebank based on combinatory categorial grammar and report the evaluation results.
Anthology ID:
2025.tlt-1.9
Volume:
Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Sarah Jablotschkin, Sandra Kübler, Heike Zinsmeister
Venues:
TLT | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
74–80
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.tlt-1.9/
Cite (ACL):
Asa Tomita, Hitomi Yanaka, and Daisuke Bekki. 2025. Automatic Evaluation of Linguistic Validity in Japanese CCG Treebanks. In Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025), pages 74–80, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
Automatic Evaluation of Linguistic Validity in Japanese CCG Treebanks (Tomita et al., TLT-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.tlt-1.9.pdf