TermoPL - a Flexible Tool for Terminology Extraction

Malgorzata Marciniak, Agnieszka Mykowiecka, Piotr Rychlik


Abstract
This paper introduces TermoPL, a tool for extracting terminology from domain corpora in Polish. The program extracts noun phrases as term candidates with the help of a simple grammar that can be adapted to the user's needs. It applies the C-value method to rank term candidates, which are either the longest identified nominal phrases or their nested subphrases. The method operates on simplified base forms in order to unify morphological variants of terms and to recognize their contexts. Recognition of nested terms is supported by word connection strength, which allows truncated phrases to be eliminated from the top part of the term list. The program has an option to convert simplified forms of phrases into correct phrases in the nominative case. TermoPL accepts morphologically annotated and disambiguated domain texts as input and creates a list of terms, the top part of which comprises domain terminology. It can also compare two candidate term lists using three different coefficients that show the asymmetry of term occurrences in the two data sets.
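As a rough illustration of the ranking step mentioned in the abstract, the sketch below computes a textbook form of the C-value score over phrases given as tuples of base forms. It is not TermoPL's implementation: the function names `is_nested` and `c_values`, the `single_word_weight` parameter (used here so that one-word phrases do not all score zero), and the toy frequencies are assumptions made for this example.

```python
import math


def is_nested(short, long):
    """Return True if `short` occurs as a contiguous, strictly shorter part of `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def c_values(freq, single_word_weight=0.1):
    """Score phrases with a textbook C-value formula.

    `freq` maps a phrase (tuple of base forms) to its total frequency,
    counting occurrences nested inside longer phrases as well.
    `single_word_weight` is an assumed stand-in for log2(1) = 0.
    """
    scores = {}
    for phrase, f in freq.items():
        # Length weight: log2 of the phrase length for multi-word phrases.
        weight = math.log2(len(phrase)) if len(phrase) > 1 else single_word_weight
        # Frequencies of longer candidate phrases that contain this phrase.
        containers = [f_b for other, f_b in freq.items() if is_nested(phrase, other)]
        if containers:
            # Discount by the average frequency of the containing phrases.
            f = f - sum(containers) / len(containers)
        scores[phrase] = weight * f
    return scores


# Hypothetical frequencies, invented for illustration only.
toy_freq = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 8,
    ("lens",): 12,
}
scores = c_values(toy_freq)
for phrase, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(phrase), round(score, 3))
```

In this toy run the longest phrase ranks first because it never occurs nested, while the shorter phrases are discounted by the average frequency of the phrases that contain them.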
Anthology ID: L16-1361
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 2278–2284
URL: https://aclanthology.org/L16-1361
Cite (ACL): Malgorzata Marciniak, Agnieszka Mykowiecka, and Piotr Rychlik. 2016. TermoPL - a Flexible Tool for Terminology Extraction. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2278–2284, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): TermoPL - a Flexible Tool for Terminology Extraction (Marciniak et al., LREC 2016)
PDF: https://preview.aclanthology.org/update-css-js/L16-1361.pdf