Natalie D. Snoeren
2010
The Study of Writing Variants in an Under-resourced Language: Some Evidence from Mobile N-Deletion in Luxembourgish
Natalie D. Snoeren
|
Martine Adda-Decker
|
Gilles Adda
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
The national language of the Grand-Duchy of Luxembourg, Luxembourgish, has often been characterized as one of Europe's under-described and under-resourced languages. Because of a limited written production of Luxembourgish, poorly observed writing standardization (as compared to other languages such as English and French) and a large diversity of spoken varieties, the study of Luxembourgish poses many interesting challenges to automatic speech processing studies as well as to linguistic enquiries. In the present paper, we make use of large corpora to focus on typical writing and derived pronunciation variants in Luxembourgish, elicited by mobile -n deletion (hereafter shortened to MND). Using transcriptions from the House of Parliament debates and 10k words from news reports, we examine the reality of MND variants in written transcripts of speech. The goal of this study is manyfold: quantify the potential of variation due to MND in written Luxembourgish, check the mandatory status of the MND rule and discuss the arising problems for automatic spoken Luxembourgish processing.