Word Sketches for Turkish

Bharat Ram Ambati, Siva Reddy, Adam Kilgarriff


Abstract
Word sketches are one-page, automatic, corpus-based summaries of a word's grammatical and collocational behaviour. In this paper we present word sketches for Turkish. Until now, word sketches have been generated using purpose-built finite-state grammars. Here, we use an existing dependency parser. We describe the process of collecting a 42-million-word corpus, parsing it, and generating word sketches from it. We evaluate these word sketches against word sketches from a language-independent sketch grammar on an external evaluation task called topic coherence, using Turkish WordNet to derive an evaluation set of coherent topics.
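As a rough illustration of the idea in the abstract, the sketch below shows one way a word sketch could be assembled from dependency-parser output: collocates of a headword are grouped by grammatical relation and ranked by an association score. It is not the authors' pipeline. The triple format, the relation labels (OBJECT, SUBJECT), the function name word_sketch, and the toy Turkish data are all assumptions made for this example, and the score is a simplified logDice-style measure computed from relation-specific counts.

```python
import math
from collections import Counter, defaultdict


def word_sketch(triples, headword, top_n=5):
    """Group collocates of `headword` by relation, ranked by a logDice-style score.

    `triples` is an iterable of (head, relation, dependent) tuples, a
    hypothetical simplification of what a dependency parser emits.
    """
    pair_freq = Counter()  # (head, relation, dependent) -> count
    head_freq = Counter()  # (head, relation) -> count
    dep_freq = Counter()   # (relation, dependent) -> count
    for head, rel, dep in triples:
        pair_freq[(head, rel, dep)] += 1
        head_freq[(head, rel)] += 1
        dep_freq[(rel, dep)] += 1

    sketch = defaultdict(list)
    for (head, rel, dep), f_xy in pair_freq.items():
        if head != headword:
            continue
        # Assumed scoring: 14 + log2 of the Dice coefficient over
        # relation-specific counts (a simplification for illustration).
        dice = 2 * f_xy / (head_freq[(head, rel)] + dep_freq[(rel, dep)])
        sketch[rel].append((dep, f_xy, round(14 + math.log2(dice), 2)))

    # Keep the highest-scoring collocates for each relation.
    return {
        rel: sorted(items, key=lambda item: item[2], reverse=True)[:top_n]
        for rel, items in sketch.items()
    }


# Toy data (invented for this example): oku = read, yaz = write,
# kitap = book, gazete = newspaper, öğrenci = student.
example_triples = [
    ("oku", "OBJECT", "kitap"),
    ("oku", "OBJECT", "kitap"),
    ("oku", "OBJECT", "gazete"),
    ("oku", "SUBJECT", "öğrenci"),
    ("yaz", "OBJECT", "kitap"),
]
sketch = word_sketch(example_triples, "oku")
for relation, collocates in sketch.items():
    print(relation, collocates)
```

Each relation in the result corresponds to one column of a word sketch: a list of collocates with their co-occurrence frequency and score. In a setup like this, a different parser or relation inventory would only change how the triples are produced, not how they are summarised.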
Anthology ID:
L12-1332
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2945–2950
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/585_Paper.pdf
Cite (ACL):
Bharat Ram Ambati, Siva Reddy, and Adam Kilgarriff. 2012. Word Sketches for Turkish. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2945–2950, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Word Sketches for Turkish (Ambati et al., LREC 2012)