Investigating the content and form of referring expressions in Mandarin: introducing the Mtuna corpus

Kees van Deemter, Le Sun, Rint Sybesma, Xiao Li, Bo Chen, Muyun Yang


Abstract
East Asian languages are thought to handle reference differently from languages such as English, particularly in terms of the marking of definiteness and number. We present the first Data-Text corpus for Referring Expressions in Mandarin, and we use this corpus to test some initial hypotheses inspired by the theoretical linguistics literature. Our findings suggest that function words deserve more attention in Referring Expressions Generation than they have so far received, and they have a bearing on the debate about whether different languages make different trade-offs between clarity and brevity.
Anthology ID:
W17-3532
Volume:
Proceedings of the 10th International Conference on Natural Language Generation
Month:
September
Year:
2017
Address:
Santiago de Compostela, Spain
Editors:
Jose M. Alonso, Alberto Bugarín, Ehud Reiter
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
213–217
URL:
https://aclanthology.org/W17-3532
DOI:
10.18653/v1/W17-3532
Cite (ACL):
Kees van Deemter, Le Sun, Rint Sybesma, Xiao Li, Bo Chen, and Muyun Yang. 2017. Investigating the content and form of referring expressions in Mandarin: introducing the Mtuna corpus. In Proceedings of the 10th International Conference on Natural Language Generation, pages 213–217, Santiago de Compostela, Spain. Association for Computational Linguistics.
Cite (Informal):
Investigating the content and form of referring expressions in Mandarin: introducing the Mtuna corpus (van Deemter et al., INLG 2017)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/W17-3532.pdf