Creating a High Quality Abstract Meaning Representation Dataset Automatically

Johannes Heinecke, Asadullah Munshi, Frédéric Herledan, Geraldine Damnati


Abstract
As only a few gold training datasets are available today, Abstract Meaning Representation (AMR) parsers are mainly trained on AMR 3.0, the largest dataset (Knight et al., 2020), which contains 55k sentences for training. Although great progress has been made, with parsers now reaching Smatch scores higher than 83% on the AMR 3.0 test dataset, this is not accurate enough for use in real-world application pipelines. More data could help improve performance, but manually annotating sentences is costly. We therefore investigated an approach to automatically create synthetic data using different existing tools and models trained on AMR 3.0. Models trained on the augmented data achieve better parsing performance, with Smatch scores 1 to 2 points higher, depending on which of the three gold test datasets is used.
Anthology ID:
2026.lrec-main.932
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
11907–11915
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.932/
Bibkey:
Cite (ACL):
Johannes Heinecke, Asadullah Munshi, Frédéric Herledan, and Geraldine Damnati. 2026. Creating a High Quality Abstract Meaning Representation Dataset Automatically. International Conference on Language Resources and Evaluation, main:11907–11915.
Cite (Informal):
Creating a High Quality Abstract Meaning Representation Dataset Automatically (Heinecke et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.932.pdf