Multilingual document language recognition for creating corpora

Yevgeny Ludovik, Ron Zacharski


Abstract
In this paper we describe a language recognition algorithm for multilingual documents that is based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We present the results of an experimental study showing that the algorithm performs well enough to be of practical value.
Anthology ID:
1999.mtsummit-1.46
Volume:
Proceedings of Machine Translation Summit VII
Month:
September 13-17
Year:
1999
Address:
Singapore, Singapore
Venue:
MTSummit
Pages:
317–323
URL:
https://aclanthology.org/1999.mtsummit-1.46
Cite (ACL):
Yevgeny Ludovik and Ron Zacharski. 1999. Multilingual document language recognition for creating corpora. In Proceedings of Machine Translation Summit VII, pages 317–323, Singapore, Singapore.
Cite (Informal):
Multilingual document language recognition for creating corpora (Ludovik & Zacharski, MTSummit 1999)
PDF:
https://preview.aclanthology.org/update-css-js/1999.mtsummit-1.46.pdf