Accounting for Language Effect in the Evaluation of Cross-lingual AMR Parsers

Shira Wein, Nathan Schneider


Abstract
Cross-lingual Abstract Meaning Representation (AMR) parsers are currently evaluated in comparison to gold English AMRs, despite parsing a language other than English, due to the lack of multilingual AMR evaluation metrics. This evaluation practice is problematic because of the established effect of source language on AMR structure. In this work, we present three multilingual adaptations of monolingual AMR evaluation metrics and compare the performance of these metrics to sentence-level human judgments. We then use our most highly correlated metric to evaluate the output of state-of-the-art cross-lingual AMR parsers, finding that Smatch may still be a useful metric in comparison to gold English AMRs, while our multilingual adaptation of S2match (XS2match) is best for comparison with gold in-language AMRs.
Anthology ID:
2022.coling-1.336
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3824–3834
URL:
https://aclanthology.org/2022.coling-1.336
Cite (ACL):
Shira Wein and Nathan Schneider. 2022. Accounting for Language Effect in the Evaluation of Cross-lingual AMR Parsers. In Proceedings of the 29th International Conference on Computational Linguistics, pages 3824–3834, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Accounting for Language Effect in the Evaluation of Cross-lingual AMR Parsers (Wein & Schneider, COLING 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.coling-1.336.pdf
Code:
shirawein/crossling-amr-eval