A Corpus of Native, Non-native and Translated Texts
Sergiu Nisioi, Ella Rabinovich, Liviu P. Dinu, Shuly Wintner
Abstract
We describe a monolingual English corpus of original and (human) translated texts, with an accurate annotation of speaker properties, including the original language of the utterances and the speaker’s country of origin. We thus obtain three sub-corpora of texts reflecting native English, non-native English, and English translated from a variety of European languages. This dataset will facilitate the investigation of similarities and differences between these kinds of sub-languages. Moreover, it will facilitate a unified comparative study of translations and language produced by (highly fluent) non-native speakers, two closely-related phenomena that have only been studied in isolation so far.
- Anthology ID:
- L16-1664
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 4197–4201
- URL:
- https://aclanthology.org/L16-1664
- Cite (ACL):
- Sergiu Nisioi, Ella Rabinovich, Liviu P. Dinu, and Shuly Wintner. 2016. A Corpus of Native, Non-native and Translated Texts. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4197–4201, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- A Corpus of Native, Non-native and Translated Texts (Nisioi et al., LREC 2016)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/L16-1664.pdf