A set of open source tools for Turkish natural language processing

Çağrı Çöltekin


Abstract
This paper introduces a set of freely available, open-source tools for Turkish built around TRmorph, a morphological analyzer introduced earlier in Çöltekin (2010). The article first provides an update on the analyzer. The update includes a complete rewrite using a different finite-state description language and toolset, as well as major tagset changes. These changes bring the analyzer in line with the state of the art in computational processing of Turkish and address the user requests received so far. Besides these major changes to the analyzer, this paper introduces tools for morphological segmentation, stemming and lemmatization, guessing unknown words, grapheme-to-phoneme conversion, hyphenation, and morphological disambiguation.
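To give a flavor of one of these tasks, the snippet below is a minimal, rule-based sketch of Turkish syllabification, which underlies hyphenation. The abstract does not describe how TRmorph's hyphenation tool works, so this is not its implementation. The sketch assumes the standard Turkish rule that every syllable contains exactly one vowel, and that between two vowels the last consonant (if any) starts the next syllable. The names `syllabify` and `hyphenate` are hypothetical, chosen for illustration. The rule is not guaranteed to match conventional hyphenation of loanwords with consonant clusters.

```python
# Hypothetical sketch: rule-based Turkish syllabification, not TRmorph's code.
# Vowel set includes uppercase forms (İ, I, Ö, Ü, ...) and the circumflexed
# vowels found in loanwords. No lowercasing is done, because "İ".lower()
# yields two code points in Python and would shift string indices.
VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def syllabify(word: str) -> list[str]:
    """Split a Turkish word into syllables, one vowel per syllable."""
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_positions) < 2:
        # Zero or one vowel: the whole word is a single syllable.
        return [word]

    boundaries = []
    for prev, nxt in zip(vowel_positions, vowel_positions[1:]):
        # If at least one consonant separates the vowels, the last one
        # (at nxt - 1) starts the next syllable. Adjacent vowels (hiatus,
        # as in "saat") are split directly between the two vowels.
        boundaries.append(nxt - 1 if nxt - 1 > prev else nxt)

    syllables = []
    start = 0
    for b in boundaries:
        syllables.append(word[start:b])
        start = b
    syllables.append(word[start:])  # final syllable keeps trailing consonants
    return syllables


def hyphenate(word: str, sep: str = "-") -> str:
    """Join the syllables of a word with a separator."""
    return sep.join(syllabify(word))
```

For example, `hyphenate("kitaplar")` returns `"ki-tap-lar"`, `hyphenate("Türkçe")` returns `"Türk-çe"`, and `syllabify("saat")` returns `["sa", "at"]`. Working on vowel positions keeps the rule independent of any particular consonant inventory.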
Anthology ID: L14-1375
Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month: May
Year: 2014
Address: Reykjavik, Iceland
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1079–1086
URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/437_Paper.pdf
Cite (ACL): Çağrı Çöltekin. 2014. A set of open source tools for Turkish natural language processing. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1079–1086, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal): A set of open source tools for Turkish natural language processing (Çöltekin, LREC 2014)