Abstract
In this paper we describe a language recognition algorithm for multilingual documents that is based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We present the results of an experimental study that showed that the performance of this algorithm has practical value.

- Anthology ID:
- 1999.mtsummit-1.46
- Volume:
- Proceedings of Machine Translation Summit VII
- Month:
- September 13-17
- Year:
- 1999
- Address:
- Singapore, Singapore
- Venue:
- MTSummit
- Pages:
- 317–323
- URL:
- https://aclanthology.org/1999.mtsummit-1.46
- Cite (ACL):
- Yevgeny Ludovik and Ron Zacharski. 1999. Multilingual document language recognition for creating corpora. In Proceedings of Machine Translation Summit VII, pages 317–323, Singapore, Singapore.
- Cite (Informal):
- Multilingual document language recognition for creating corpora (Ludovik & Zacharski, MTSummit 1999)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/1999.mtsummit-1.46.pdf