ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English

Tim vor der Brück, Alexander Mehler, Zahurul Islam


Abstract
The paper describes a procedure for the automatic generation of a large full-form lexicon of English. We put emphasis on two statistical methods for lexicon extension and adjustment: a letter-based HMM and a detector of spelling variants and misspellings. The resulting resource, ColLex.en, is evaluated with respect to two tasks: text categorization and lexical coverage, using the SUSANNE corpus and the Open ANC as examples.
Anthology ID:
L14-1075
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3756–3760
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1099_Paper.pdf
Cite (ACL):
Tim vor der Brück, Alexander Mehler, and Zahurul Islam. 2014. ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3756–3760, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English (vor der Brück et al., LREC 2014)