Abstract
This paper introduces a set of freely available, open-source tools for Turkish that are built around TRmorph, a morphological analyzer introduced earlier in Çöltekin (2010). The article first provides an update on the analyzer, which includes a complete rewrite using a different finite-state description language and tool set, as well as major tagset changes to comply better with state-of-the-art computational processing of Turkish and with the user requests received so far. Besides these major changes to the analyzer, this paper introduces tools for morphological segmentation, stemming and lemmatization, guessing unknown words, grapheme-to-phoneme conversion, hyphenation, and morphological disambiguation.
- Anthology ID: L14-1375
- Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month: May
- Year: 2014
- Address: Reykjavik, Iceland
- Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue: LREC
- Publisher: European Language Resources Association (ELRA)
- Pages: 1079–1086
- URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/437_Paper.pdf
- Cite (ACL): Çağrı Çöltekin. 2014. A set of open source tools for Turkish natural language processing. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1079–1086, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal): A set of open source tools for Turkish natural language processing (Çöltekin, LREC 2014)
- PDF: http://www.lrec-conf.org/proceedings/lrec2014/pdf/437_Paper.pdf