Evaluating the Reliability and Interaction of Recursively Used Feature Classes for Terminology Extraction
Abstract
Feature design and selection is a crucial aspect when treating terminology extraction as a machine learning classification problem. We designed feature classes which characterize different properties of terms based on distributions, and propose a new feature class for components of term candidates. By using random forests, we infer optimal features which are later used to build decision tree classifiers. We evaluate our method using the ACL RD-TEC dataset. We demonstrate the importance of the novel feature class for downgrading termhood which exploits properties of term components. Furthermore, our classification suggests that the identification of reliable term candidates should be performed successively, rather than just once.
- Anthology ID: E17-4012
- Volume: Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics
- Month: April
- Year: 2017
- Address: Valencia, Spain
- Editors: Florian Kunneman, Uxoa Iñurrieta, John J. Camilleri, Mariona Coll Ardanuy
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 113–121
- URL: https://aclanthology.org/E17-4012
- Cite (ACL): Anna Hätty, Michael Dorna, and Sabine Schulte im Walde. 2017. Evaluating the Reliability and Interaction of Recursively Used Feature Classes for Terminology Extraction. In Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 113–121, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal): Evaluating the Reliability and Interaction of Recursively Used Feature Classes for Terminology Extraction (Hätty et al., EACL 2017)
- PDF: https://preview.aclanthology.org/teach-a-man-to-fish/E17-4012.pdf