Abstract
We propose a method for automatic term extraction based on a statistical measure that ranks term candidates according to their semantic relevance to a specialised domain. As a measure of relevance we use term co-occurrence, defined as the repeated instantiation of two terms in the same sentences, in any order and at variable distances. In this way, term candidates that tend to co-occur with a selected group of other units are ranked higher than candidates whose co-occurrences are more uniformly distributed. The method requires no external resources, but its performance improves when a pre-existing term list is provided. We present results of applying this method to a Spanish-English Linguistics corpus, and the evaluation compares favourably with a standard method based on reference corpora.
- Anthology ID:
- 2022.term-1.5
- Volume:
- Proceedings of the Workshop on Terminology in the 21st century: many faces, many places
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Rute Costa, Sara Carvalho, Ana Ostroški Anić, Anas Fahad Khan
- Venue:
- TERM
- Publisher:
- European Language Resources Association
- Pages:
- 26–29
- URL:
- https://aclanthology.org/2022.term-1.5
- Cite (ACL):
- Rogelio Nazar and David Lindemann. 2022. Terminology extraction using co-occurrence patterns as predictors of semantic relevance. In Proceedings of the Workshop on Terminology in the 21st century: many faces, many places, pages 26–29, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Terminology extraction using co-occurrence patterns as predictors of semantic relevance (Nazar & Lindemann, TERM 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2022.term-1.5.pdf
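The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. Co-occurrence is counted at sentence level, ignoring order and distance. A candidate's score is the share of its co-occurrences that go to partners it meets at least `min_repeats` times, which stands in for "repeated instantiation". The optional seed list (`seed_terms`, `seed_weight`) is a hypothetical way of using a pre-existing term list. All function names, parameters and the toy sentences are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Alphabetic word tokens; Unicode-aware, so accented Spanish letters are kept.
TOKEN = re.compile(r"[^\W\d_]+")


def cooccurrence_profiles(sentences, stopwords=frozenset()):
    """Count, for every word type, how many sentences it shares with each other type.

    Order and distance inside the sentence are ignored: the sentence is the window.
    """
    profiles = defaultdict(Counter)
    for sentence in sentences:
        types = {t for t in TOKEN.findall(sentence.lower()) if t not in stopwords}
        for word in types:
            for other in types:
                if other != word:
                    profiles[word][other] += 1
    return profiles


def relevance_scores(profiles, candidates, min_repeats=2,
                     seed_terms=frozenset(), seed_weight=2.0):
    """Hypothetical score: share of a candidate's co-occurrences that go to partners
    it meets at least `min_repeats` times. Candidates spread over many one-off
    partners (a uniform distribution) score near 0. Repeated co-occurrences with a
    seed term are multiplied by `seed_weight`, so seeded scores can exceed 1."""
    scores = {}
    for cand in candidates:
        profile = profiles.get(cand, Counter())
        total = sum(profile.values())
        if total == 0:
            scores[cand] = 0.0
            continue
        concentrated = sum(
            n * (seed_weight if other in seed_terms else 1.0)
            for other, n in profile.items()
            if n >= min_repeats
        )
        scores[cand] = concentrated / total
    return scores


def rank(scores):
    """Candidates by descending score; ties broken alphabetically."""
    return sorted(scores, key=lambda c: (-scores[c], c))


# Toy data, invented for illustration (not taken from the paper's corpus).
SENTENCES = [
    "The phoneme is realised as an allophone.",
    "Each phoneme has at least one allophone.",
    "An allophone of the phoneme appears in context.",
    "This is an important result.",
    "The book is important for students.",
    "An important question remains open.",
]
STOPWORDS = frozenset({"the", "is", "as", "an", "each", "has", "at", "least",
                       "one", "of", "in", "this", "for"})
CANDIDATES = ("phoneme", "allophone", "important")

if __name__ == "__main__":
    profiles = cooccurrence_profiles(SENTENCES, STOPWORDS)
    print(rank(relevance_scores(profiles, CANDIDATES)))
    # ['allophone', 'phoneme', 'important']
```

In this toy run, "phoneme" and "allophone" keep meeting each other, so half of their co-occurrence mass is repeated. "important" meets six different words once each and scores 0. No reference corpus is consulted, which matches the abstract's claim that the method needs no external resources. A stopword filter is still required here, because function words would otherwise co-occur repeatedly with everything.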