Automatic Extraction and Evaluation of Arabic LFG Resources

Mohammed Attia, Khaled Shaalan, Lamia Tounsi, Josef van Genabith


Abstract
This paper presents the results of an approach to automatically acquire large-scale, probabilistic Lexical-Functional Grammar (LFG) resources for Arabic from the Penn Arabic Treebank (ATB). Our starting point is the earlier work of Tounsi et al. (2009) on automatic LFG f(eature)-structure annotation for Arabic using the ATB. They exploit tree configuration, POS categories, functional tags, local heads and trace information to annotate nodes with LFG feature-structure equations. We utilize this annotation to automatically acquire grammatical function (dependency) based subcategorization frames and paths linking long-distance dependencies (LDDs). Many state-of-the-art treebank-based probabilistic parsing approaches are scalable and robust but often also shallow: they do not capture LDDs and represent only local information. Subcategorization frames and LDD paths can be used to recover LDDs from such parser output to capture deep linguistic information. Automatic acquisition of language resources from existing treebanks saves the time and effort involved in creating such resources by hand. Moreover, data-driven automatic acquisition naturally associates probabilistic information with subcategorization frames and LDD paths. Finally, based on the statistical distribution of LDD path types, we propose empirical bounds on traditional regular-expression-based functional uncertainty equations used to handle LDDs in LFG.
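As a rough illustration of how data-driven acquisition can attach probabilities to subcategorization frames, the sketch below estimates P(frame | lemma) by relative frequency. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `frame_probabilities`, the tuple encoding of frames, and the toy observations are all hypothetical and do not come from the ATB or from the paper.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency.

    `observations` is an iterable of (lemma, frame) pairs, where a frame is
    a tuple of grammatical-function labels. This encoding is an assumption
    made for illustration only.
    """
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1

    probs = {}
    for lemma, frame_counts in counts.items():
        total = sum(frame_counts.values())
        probs[lemma] = {frame: n / total for frame, n in frame_counts.items()}
    return probs


# Toy, made-up observations (hypothetical; not extracted from any treebank).
observations = [
    ("kataba", ("subj", "obj")),
    ("kataba", ("subj", "obj")),
    ("kataba", ("subj",)),
    ("qaAla", ("subj", "comp")),
]

probs = frame_probabilities(observations)
for lemma, frames in probs.items():
    for frame, p in frames.items():
        print(lemma, frame, round(p, 3))
```

The same counting scheme could in principle be applied to LDD path types by replacing the (lemma, frame) pairs with (trace type, path) pairs, though whether the paper conditions its estimates this way is not stated in the abstract.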
Anthology ID:
L12-1349
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1947–1954
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/609_Paper.pdf
Cite (ACL):
Mohammed Attia, Khaled Shaalan, Lamia Tounsi, and Josef van Genabith. 2012. Automatic Extraction and Evaluation of Arabic LFG Resources. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1947–1954, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Automatic Extraction and Evaluation of Arabic LFG Resources (Attia et al., LREC 2012)