Creating a High Quality Abstract Meaning Representation Dataset Automatically
Johannes Heinecke, Asadullah Munshi, Frédéric Herledan, Geraldine Damnati
Abstract
Because only a few gold training datasets are available today, Abstract Meaning Representation (AMR) parsers are mainly trained on AMR 3.0 (Knight et al., 2020), the largest dataset, which contains 55k training sentences. Although great progress has been made, with parsers now reaching Smatch scores above 83% on the AMR 3.0 test set, this is not accurate enough for use in real-world application pipelines. More data could help improve performance, but manually annotating sentences is costly. We therefore investigated an approach that automatically creates synthetic data using different existing tools and models trained on AMR 3.0. Models trained on the augmented data achieve better parsing performance, with Smatch scores increased by 1 to 2 points depending on which of the three gold test datasets is used.
- Anthology ID:
- 2026.lrec-main.932
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resources Association
- Pages:
- 11907–11915
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.932/
- Cite (ACL):
- Johannes Heinecke, Asadullah Munshi, Frédéric Herledan, and Geraldine Damnati. 2026. Creating a High Quality Abstract Meaning Representation Dataset Automatically. International Conference on Language Resources and Evaluation, main:11907–11915.
- Cite (Informal):
- Creating a High Quality Abstract Meaning Representation Dataset Automatically (Heinecke et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.932.pdf