ExpLay: A new Corpus Resource for the Research on Expertise as an Influential Factor on Language Production

Carmen Schacht, Renate Delucchi Danhier


Abstract
This paper introduces the ExpLay-Pipeline, a novel semi-automated processing tool designed to compare language production data from experts with that of a control group of laypeople. The pipeline combines manual annotation and curation with state-of-the-art machine learning and rule-based methods, following a silver-standard approach. It integrates several analysis modules for the syntactic and lexical evaluation of parsed linguistic data. Although initially implemented for the creation of the ExpLay-Corpus, it is designed for the processing of linguistic data in general. The paper details the design and implementation of this pipeline.
Anthology ID: 2025.law-1.17
Volume: Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Siyao Peng, Ines Rehbein
Venues: LAW | WS
Publisher: Association for Computational Linguistics
Pages: 216–227
URL: https://preview.aclanthology.org/display_plenaries/2025.law-1.17/
Cite (ACL): Carmen Schacht and Renate Delucchi Danhier. 2025. ExpLay: A new Corpus Resource for the Research on Expertise as an Influential Factor on Language Production. In Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025), pages 216–227, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): ExpLay: A new Corpus Resource for the Research on Expertise as an Influential Factor on Language Production (Schacht & Delucchi Danhier, LAW 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.law-1.17.pdf