Bootstrapping Term Extractors for Multiple Languages

Ahmet Aker, Monica Paramita, Emma Barker, Robert Gaizauskas


Abstract
Terminology extraction resources are needed for a wide range of human language technology applications, including knowledge management, information extraction, semantic search, cross-language information retrieval, and automatic and assisted translation. We present a low-cost method for creating terminology extraction resources for 21 non-English EU languages. Using parallel corpora and a projection method, we create a General POS Tagger for these languages. We also investigate the use of EuroVoc terms and a Wikipedia corpus to automatically create a term grammar for each language. Our results show that these automatically generated resources can support the term extraction process with performance similar to that of manually generated resources. All resources resulting from this experiment are freely available for download.
Anthology ID:
L14-1364
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
483–489
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/425_Paper.pdf
Cite (ACL):
Ahmet Aker, Monica Paramita, Emma Barker, and Robert Gaizauskas. 2014. Bootstrapping Term Extractors for Multiple Languages. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 483–489, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Bootstrapping Term Extractors for Multiple Languages (Aker et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/425_Paper.pdf