A Multi-Word Term Extraction Program for Arabic Language

Siham Boulaknadel, Beatrice Daille, Driss Aboutajdine


Abstract
Terminology extraction commonly involves two steps: identifying term-like units in the texts, mostly multi-word phrases, and ranking the extracted term-like units according to their domain representativity. In this paper, we design a multi-word term extraction program for the Arabic language. The linguistic filtering performs a morphosyntactic analysis and takes several types of variation into account. Domain representativity is measured by means of statistical scores. We evaluate several association measures and show that the results we obtained are consistent with those obtained for Romance languages.
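The abstract does not name the association measures that were evaluated. As an illustration only, the sketch below ranks two-word candidates with Dunning's log-likelihood ratio (G2), one association measure often used for term ranking. The choice of measure, the function names `log_likelihood` and `rank_candidates`, and the way the 2x2 contingency table is built from candidate-pair counts are assumptions, not the authors' program. The sketch also assumes the linguistic filtering step has already produced a list of (head, modifier) pairs, one entry per occurrence in the corpus.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's G2 for a 2x2 contingency table (illustrative sketch).

    a: occurrences of the pair (w1, w2)
    b: pairs with w1 but not w2
    c: pairs with w2 but not w1
    d: pairs with neither w1 nor w2
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    g2 = 0.0
    for o, e in zip(observed, expected):
        # Cells with o == 0 contribute 0 (limit of o*log(o/e));
        # when o > 0 its row and column sums are > 0, so e > 0.
        if o > 0:
            g2 += o * math.log(o / e)
    return 2.0 * g2


def rank_candidates(pairs):
    """Rank (w1, w2) candidates by G2, highest first.

    Assumption: the contingency table is computed over the set of
    candidate pairs, so n is the total number of pair occurrences.
    """
    pair_counts = Counter(pairs)
    first = Counter(w1 for w1, _ in pairs)
    second = Counter(w2 for _, w2 in pairs)
    n = len(pairs)
    scores = {}
    for (w1, w2), a in pair_counts.items():
        b = first[w1] - a
        c = second[w2] - a
        d = n - a - b - c
        scores[(w1, w2)] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical transliterated candidates, for illustration only.
    candidates = ([("nizam", "tashghil")] * 4
                  + [("nizam", "jadid"), ("barnamaj", "jadid")]
                  + [("barnamaj", "hasub")] * 3)
    for pair, score in rank_candidates(candidates):
        print(pair, round(score, 3))
```

Note that G2 does not indicate the direction of association: a pair that co-occurs less often than expected can also receive a high score, which is one reason comparing several measures, as the abstract describes, is useful.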
Anthology ID: L08-1155
Volume: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month: May
Year: 2008
Address: Marrakech, Morocco
Editors: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/378_paper.pdf
Cite (ACL): Siham Boulaknadel, Beatrice Daille, and Driss Aboutajdine. 2008. A Multi-Word Term Extraction Program for Arabic Language. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal): A Multi-Word Term Extraction Program for Arabic Language (Boulaknadel et al., LREC 2008)