Turkish Treebank as a Gold Standard for Morphological Disambiguation and Its Influence on Parsing

Özlem Çetinoğlu


Abstract
So far, predicted scenarios for Turkish dependency parsing have used a morphological disambiguator trained on the data distributed with the tool (Sak et al., 2008). Although models trained on this data achieve high accuracy on the test and development data of the same set, accuracy drops drastically when the model is used to preprocess the Turkish Treebank for parsing experiments. To overcome this problem, we propose using the Turkish Treebank (Oflazer et al., 2003) as a morphological resource, and we convert the treebank to the morphological disambiguator’s format. The experimental results show improvements in disambiguating the Turkish Treebank, and these improvements carry over to parsing. With the help of better morphological analysis, we present the best labelled dependency parsing scores to date on Turkish.
Anthology ID:
L14-1056
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3360–3365
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1073_Paper.pdf
Cite (ACL):
Özlem Çetinoğlu. 2014. Turkish Treebank as a Gold Standard for Morphological Disambiguation and Its Influence on Parsing. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3360–3365, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Turkish Treebank as a Gold Standard for Morphological Disambiguation and Its Influence on Parsing (Çetinoğlu, LREC 2014)