ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English

Tim vor der Brück; Alexander Mehler; Zahurul Islam

Abstract

The paper describes a procedure for the automatic generation of a large full-form lexicon of English. We put emphasis on two statistical methods for lexicon extension and adjustment: a letter-based HMM and a detector of spelling variants and misspellings. The resulting resource, ColLex.en, is evaluated on two tasks: text categorization and lexical coverage, using the SUSANNE corpus and the Open ANC as examples.

Anthology ID: L14-1075
Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month: May
Year: 2014
Address: Reykjavik, Iceland
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 3756–3760
URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/1099_Paper.pdf
Cite (ACL): Tim vor der Brück, Alexander Mehler, and Zahurul Islam. 2014. ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3756–3760, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal): ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English (vor der Brück et al., LREC 2014)
PDF: http://www.lrec-conf.org/proceedings/lrec2014/pdf/1099_Paper.pdf