Automatic Extraction and Evaluation of Arabic LFG Resources
Mohammed Attia, Khaled Shaalan, Lamia Tounsi, Josef van Genabith
Abstract
This paper presents the results of an approach to automatically acquire large-scale, probabilistic Lexical-Functional Grammar (LFG) resources for Arabic from the Penn Arabic Treebank (ATB). Our starting point is the earlier work of Tounsi et al. (2009) on automatic LFG f(eature)-structure annotation for Arabic using the ATB. They exploit tree configuration, POS categories, functional tags, local heads and trace information to annotate nodes with LFG feature-structure equations. We use this annotation to automatically acquire subcategorization frames based on grammatical functions (dependencies), as well as paths linking long-distance dependencies (LDDs). Many state-of-the-art treebank-based probabilistic parsing approaches are scalable and robust but often shallow: they do not capture LDDs and represent only local information. Subcategorization frames and LDD paths can be used to recover LDDs from such parser output and thereby capture deep linguistic information. Automatically acquiring language resources from existing treebanks saves the time and effort involved in creating them by hand. Moreover, data-driven acquisition naturally associates probabilistic information with subcategorization frames and LDD paths. Finally, based on the statistical distribution of LDD path types, we propose empirical bounds on the traditional regular-expression-based functional uncertainty equations used to handle LDDs in LFG.
- Anthology ID:
- L12-1349
- Volume:
- Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
- Month:
- May
- Year:
- 2012
- Address:
- Istanbul, Turkey
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1947–1954
- URL:
- http://www.lrec-conf.org/proceedings/lrec2012/pdf/609_Paper.pdf
- Cite (ACL):
- Mohammed Attia, Khaled Shaalan, Lamia Tounsi, and Josef van Genabith. 2012. Automatic Extraction and Evaluation of Arabic LFG Resources. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1947–1954, Istanbul, Turkey. European Language Resources Association (ELRA).
- Cite (Informal):
- Automatic Extraction and Evaluation of Arabic LFG Resources (Attia et al., LREC 2012)
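The abstract states that data-driven acquisition attaches probabilistic information to subcategorization frames and LDD paths, and that the distribution of LDD path types suggests empirical bounds on functional uncertainty equations. The sketch below illustrates both ideas under stated assumptions: it uses plain relative-frequency estimation of P(frame | lemma) and a hypothetical frequency-coverage cutoff over path types. The input formats (frames as tuples of grammatical functions, paths as space-separated function strings), the function names and the `coverage` threshold are all illustrative assumptions, not the estimator or bounding procedure reported in the paper.

```python
from collections import Counter, defaultdict


def frame_probabilities(pairs):
    """Relative-frequency estimate of P(frame | lemma).

    `pairs` is an iterable of (lemma, frame) tuples, where a frame is a
    tuple of grammatical functions such as ("SUBJ", "OBJ").
    This input format is a hypothetical assumption for illustration.
    """
    counts = defaultdict(Counter)
    for lemma, frame in pairs:
        counts[lemma][frame] += 1
    probs = {}
    for lemma, frame_counts in counts.items():
        total = sum(frame_counts.values())
        probs[lemma] = {frame: c / total for frame, c in frame_counts.items()}
    return probs


def bounded_paths(paths, coverage=0.95):
    """Keep the most frequent LDD path types until they jointly cover at
    least `coverage` of all path tokens.

    `paths` holds one string per LDD token, e.g. "COMP OBJ".
    The coverage cutoff is a hypothetical bounding criterion.
    """
    counts = Counter(paths)
    total = sum(counts.values())
    kept, covered = [], 0
    for path, c in counts.most_common():
        if covered / total >= coverage:
            break
        kept.append(path)
        covered += c
    return kept


if __name__ == "__main__":
    # Toy data (hypothetical): Buckwalter lemma "qAl" ('said') with two frames.
    pairs = [("qAl", ("SUBJ", "COMP")), ("qAl", ("SUBJ", "COMP")), ("qAl", ("SUBJ",))]
    print(frame_probabilities(pairs)["qAl"])
    paths = ["OBJ"] * 6 + ["COMP OBJ"] * 3 + ["COMP COMP OBJ"]
    print(bounded_paths(paths, coverage=0.85))  # → ['OBJ', 'COMP OBJ']
```

Under this cutoff, the retained path types could then replace an unbounded expression such as `COMP* OBJ` with a finite disjunction of the attested paths, trading full generality for a grammar that reflects the observed treebank distribution.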