Abstract
Machine translation engines draw on various types of databases. This paper is concerned with Arabic as a source or target language, and focuses on lexical databases. The non-concatenative nature of Arabic morphology, the complex structure of Arabic word-forms, and the general use of vowel-free writing present a real challenge to NLP developers. We show here how and why a stem-grounded lexical database, whose items are associated with grammar-lexis specifications (as opposed to a root-&-pattern database), is motivated both linguistically and with regard to efficiency, economy and modularity. The arguments in favour of databases relying on stems associated with grammar-lexis specifications (such as DIINAR.1 or the Arabic dB under development at SYSTRAN), rather than on roots and patterns, are the following: (a) Root-and-pattern databases include huge numbers of rule-generated word-forms that do not actually appear in the language. (b) Rule-generated lemmas, as opposed to existing ones, are widely under-specified with regard to grammar-lexis relations. (c) In a Semitic language such as Arabic, the mapping of grammar-lexis specifications that need to be associated with every lexical entry of the database is decisive. (d) These specifications can only be included in a stem-based dB. Points (a) to (d) are crucial in the context of machine translation involving Arabic.
- Anthology ID:
- 2003.mtsummit-semit.5
- Volume:
- Workshop on Machine Translation for Semitic languages: issues and approaches
- Month:
- September 23-27
- Year:
- 2003
- Address:
- New Orleans, USA
- Venue:
- MTSummit
- URL:
- https://preview.aclanthology.org/add_missing_videos/2003.mtsummit-semit.5/
- Cite (ACL):
- Joseph Dichy and Ali Farghaly. 2003. Roots & patterns vs. stems plus grammar-lexis specifications: on what basis should a multilingual database centred on Arabic be built? In Workshop on Machine Translation for Semitic languages: issues and approaches, New Orleans, USA.
- Cite (Informal):
- Roots & patterns vs. stems plus grammar-lexis specifications: on what basis should a multilingual database centred on Arabic be built? (Dichy & Farghaly, MTSummit 2003)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2003.mtsummit-semit.5.pdf