The Persian Dependency Treebank Made Universal

Pegah Safari, Mohammad Sadegh Rasooli, Amirsaeid Moloodi, Alireza Nourian


Abstract
We describe an automatic method for converting the Persian Dependency Treebank (Rasooli et al., 2013) to Universal Dependencies. This treebank contains 29,107 sentences. Our experiments, along with manual linguistic analysis, show that our data is more compatible with Universal Dependencies than the Uppsala Persian Universal Dependency Treebank (Seraji et al., 2016), larger in size, and more diverse in vocabulary. In supervised parsing, our data yields a labeled attachment F-score of 85.2. In addition, our delexicalized Persian-to-English parser transfer experiments show that a parsing model trained on our data is ≈2% (absolute) more accurate in labeled attachment score than one trained on the data of Seraji et al. (2016).
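For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a labeled attachment score can be computed. It is not the authors' evaluation script: the function name `labeled_attachment_score` and the toy data are hypothetical, and the sketch assumes gold and predicted trees share the same tokenization, in which case precision, recall, and F-score all reduce to the same token-level accuracy.

```python
def labeled_attachment_score(gold, predicted):
    """Percentage of tokens whose head AND relation label both match gold.

    gold, predicted: lists of sentences; each sentence is a list of
    (head_index, relation_label) pairs, one pair per token.
    Assumption: both sides use identical tokenization.
    """
    total = 0
    correct = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must have the same number of tokens")
        for (g_head, g_rel), (p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            # A token counts only if both the head and the label are right.
            if g_head == p_head and g_rel == p_rel:
                correct += 1
    return 100.0 * correct / total if total else 0.0


# Toy three-token sentence: the last token has the right head, wrong label.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "obl")]]
las = labeled_attachment_score(gold, pred)
print(f"LAS = {las:.1f}")
```

In the toy example two of three tokens are fully correct, so a mislabeled relation lowers the score even though every head attachment is right.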
Anthology ID:
2022.lrec-1.766
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
7078–7087
URL:
https://aclanthology.org/2022.lrec-1.766
Cite (ACL):
Pegah Safari, Mohammad Sadegh Rasooli, Amirsaeid Moloodi, and Alireza Nourian. 2022. The Persian Dependency Treebank Made Universal. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 7078–7087, Marseille, France. European Language Resources Association.
Cite (Informal):
The Persian Dependency Treebank Made Universal (Safari et al., LREC 2022)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/2022.lrec-1.766.pdf
Code:
UniversalDependencies/UD_Persian-PerDT