Abstract
This paper deals with the automatic identification of literate and oral discourse in German texts. A range of linguistic features is selected and their role in distinguishing between literate- and oral-oriented registers is investigated, using a decision-tree classifier. It turns out that all of the investigated features are related in some way to oral conceptuality. In particular, simple measures of complexity (average sentence and word length) are prominent indicators of oral and literate discourse. In addition, features of reference and deixis (realized by different types of pronouns) prove very useful in determining the degree of orality of different registers.
- Anthology ID:
- W19-1407
- Volume:
- Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- June
- Year:
- 2019
- Address:
- Ann Arbor, Michigan
- Editors:
- Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
- Venue:
- VarDial
- Publisher:
- Association for Computational Linguistics
- Pages:
- 64–79
- URL:
- https://aclanthology.org/W19-1407
- DOI:
- 10.18653/v1/W19-1407
- Cite (ACL):
- Katrin Ortmann and Stefanie Dipper. 2019. Variation between Different Discourse Types: Literate vs. Oral. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 64–79, Ann Arbor, Michigan. Association for Computational Linguistics.
- Cite (Informal):
- Variation between Different Discourse Types: Literate vs. Oral (Ortmann & Dipper, VarDial 2019)
- PDF:
- https://preview.aclanthology.org/autopr/W19-1407.pdf