Stefan Schaden
This paper presents a corpus of non-native speech that contains pronunciation variants of European city names from five countries spoken by speakers of four native languages. It was originally designed as a research tool for the study of pronunciation errors made by non-native speakers when pronouncing foreign city names. The corpus has now been released. Following a brief sketch of the research context in which this data collection was established, the first part of this paper describes the contents and technical specifications of the corpus (design, speakers, language material, recording conditions). Compared to corpora of native speech, non-native speech compilations raise a number of additional difficulties that require specific attention and methodology. Therefore, the second part of the paper aims to point out some of these general issues from the perspective of the experience gained in our research. Strategies to deal with these difficulties are explored along with their specific benefits and shortfalls, leading to the conclusion that non-native speech corpora require a number of specific design guidelines which are often difficult to put into practice.
The paper reports on the evaluation of a rule-based technique to model prototypical non-native pronunciation variants on the symbolic transcription level. This technique was developed to explore the possibility of automatically generating adapted pronunciation lexicons for different non-native speaker groups. The rule sets, which are currently available for nine language directions, are based on non-native speech data compiled specifically for this purpose. Since manual phonetic annotations are available for the speech data, the evaluation was performed on the transcription level by measuring the phonetic distance between the automatically generated pronunciation variants and the actual pronunciations of non-native speakers. One of the central questions to be addressed by the evaluation is whether the rules have any predictive value: it has to be determined if and to what degree the rules are capable of generating realistic pronunciation variants for previously unseen speakers. Secondly, the rules should not only represent the pronunciations of individual speakers adequately; they should also be representative of speaker groups (cross-speaker representation). The paper outlines the evaluation methodology and presents results for selected language directions.
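The evaluation idea described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the rules are reduced to context-free phoneme substitutions, the SAMPA-like symbols and the single rule are hypothetical, and plain Levenshtein distance over phoneme symbols stands in for whatever phonetic distance measure the authors actually used.

```python
from typing import Dict, List, Sequence


def apply_rules(phonemes: Sequence[str], rules: Dict[str, str]) -> List[str]:
    """Rewrite each phoneme with a context-free substitution rule (hypothetical rule format)."""
    return [rules.get(p, p) for p in phonemes]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phoneme symbols (assumed stand-in for a phonetic distance)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical example: canonical form of "Bath" and one invented substitution rule.
canonical = ["b", "A:", "T"]
rules = {"T": "s"}               # assumed rule: dental fricative realised as /s/
observed = ["b", "a", "s"]       # invented transcription of a non-native speaker

generated = apply_rules(canonical, rules)
dist_generated = edit_distance(generated, observed)
dist_canonical = edit_distance(canonical, observed)
```

Under these assumptions, a rule set has predictive value for a speaker when `dist_generated` is smaller than `dist_canonical` on utterances that were not used to write the rules; averaging that comparison over several speakers of the same native language would approximate the cross-speaker question.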