Data-driven Parsing Evaluation for Child-Parent Interactions

Zoey Liu, Emily Prud’hommeaux


Abstract
We present a syntactic dependency treebank for naturalistic child and child-directed spoken English. Our annotations largely follow the guidelines of the Universal Dependencies project (UD [Zeman et al., 2022]), with detailed extensions to lexical and syntactic structures unique to spontaneous spoken language, as opposed to written texts or prepared speech. Compared to existing UD-style spoken treebanks and other dependency corpora of child-parent interactions specifically, our dataset is much larger (44,744 utterances; 233,907 words) and contains data from 10 children covering a wide age range (18–66 months). We conduct thorough dependency parser evaluations using both graph-based and transition-based parsers, trained on three different types of out-of-domain written texts: news, tweets, and learner data. Out-of-domain parsers demonstrate reasonable performance for both child and parent data. In addition, parser performance for child data increases along children’s developmental paths, especially between 18 and 48 months, and gradually approaches the performance for parent data. These results are further validated with in-domain training.
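The abstract does not say which metrics the parser evaluations report. As an illustration only, the sketch below computes the two scores most commonly used for dependency parsing, unlabeled and labeled attachment score (UAS and LAS), over annotations in the 10-column CoNLL-U format used by UD. It assumes that gold and predicted files share the same tokenization. The function names and the three-word toy utterance are hypothetical; the utterance is not taken from the treebank.

```python
def read_conllu(text):
    """Parse CoNLL-U text into sentences; each sentence is a list of (head, deprel)."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment line
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((int(cols[6]), cols[7]))  # HEAD and DEPREL columns
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold_text, pred_text):
    """Return (UAS, LAS) as fractions of tokens, assuming identical tokenization."""
    gold, pred = read_conllu(gold_text), read_conllu(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data differ in sentence count")
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1          # correct head
                if g_rel == p_rel:
                    las_hits += 1      # correct head and correct relation label
    if total == 0:
        return 0.0, 0.0
    return uas_hits / total, las_hits / total


def make_row(idx, form, head, deprel):
    """Build one 10-column CoNLL-U token line with unused columns left as '_'."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


# Invented toy utterance, for illustration only.
gold_text = "\n".join([
    make_row(1, "you", 2, "nsubj"),
    make_row(2, "want", 0, "root"),
    make_row(3, "juice", 2, "obj"),
]) + "\n\n"

# Hypothetical parser output: every head is right, but one label is wrong.
pred_text = "\n".join([
    make_row(1, "you", 2, "nsubj"),
    make_row(2, "want", 0, "root"),
    make_row(3, "juice", 2, "nsubj"),
]) + "\n\n"

uas, las = attachment_scores(gold_text, pred_text)
print(f"UAS={uas:.3f} LAS={las:.3f}")  # prints "UAS=1.000 LAS=0.667"
```

Scoring child and parent utterances separately, or grouping child utterances by age, would amount to calling `attachment_scores` once per subset.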
Anthology ID:
2023.tacl-1.97
Volume:
Transactions of the Association for Computational Linguistics, Volume 11
Year:
2023
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
1734–1753
URL:
https://aclanthology.org/2023.tacl-1.97
DOI:
10.1162/tacl_a_00624
Cite (ACL):
Zoey Liu and Emily Prud’hommeaux. 2023. Data-driven Parsing Evaluation for Child-Parent Interactions. Transactions of the Association for Computational Linguistics, 11:1734–1753.
Cite (Informal):
Data-driven Parsing Evaluation for Child-Parent Interactions (Liu & Prud’hommeaux, TACL 2023)
PDF:
https://preview.aclanthology.org/corrections-2024-07/2023.tacl-1.97.pdf