Variation between Different Discourse Types: Literate vs. Oral

Katrin Ortmann, Stefanie Dipper


Abstract
This paper deals with the automatic identification of literate and oral discourse in German texts. A range of linguistic features is selected, and their role in distinguishing between literate- and oral-oriented registers is investigated using a decision-tree classifier. It turns out that all of the investigated features are related in some way to oral conceptuality. Simple measures of complexity (average sentence and word length) are especially prominent indicators of oral and literate discourse. In addition, features of reference and deixis (realized by different types of pronouns) prove to be very useful in determining the degree of orality of different registers.
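The two complexity measures named in the abstract, average sentence length and average word length, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `complexity_features`, the regex-based sentence splitting, and the definition of a word as a run of letters are all assumptions made here for illustration. The paper's actual tokenization and feature definitions may differ.

```python
import re


def complexity_features(text: str) -> dict:
    """Return mean sentence length (in words) and mean word length (in characters).

    Hypothetical helper for illustration only; the splitting rules are assumptions.
    """
    # Naive sentence split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # A word is a run of Unicode letters, so German umlauts and 'ß' are included.
    words = [w for s in sentences for w in re.findall(r"[^\W\d_]+", s)]
    if not words:
        return {"mean_sent_len": 0.0, "mean_word_len": 0.0}
    return {
        "mean_sent_len": len(words) / len(sentences),
        "mean_word_len": sum(len(w) for w in words) / len(words),
    }


if __name__ == "__main__":
    print(complexity_features("Ich bin da. Du auch?"))
```

Values like these could serve as input columns for a decision-tree classifier. Which thresholds separate oral from literate registers is an empirical question that this sketch does not address.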
Anthology ID:
W19-1407
Volume:
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
June
Year:
2019
Address:
Ann Arbor, Michigan
Editors:
Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
64–79
URL:
https://aclanthology.org/W19-1407
DOI:
10.18653/v1/W19-1407
Cite (ACL):
Katrin Ortmann and Stefanie Dipper. 2019. Variation between Different Discourse Types: Literate vs. Oral. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 64–79, Ann Arbor, Michigan. Association for Computational Linguistics.
Cite (Informal):
Variation between Different Discourse Types: Literate vs. Oral (Ortmann & Dipper, VarDial 2019)
PDF:
https://preview.aclanthology.org/autopr/W19-1407.pdf