Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions
Luisa Geiger, Mareike Hartmann, Michael Sullivan, Alexander Koller
Abstract
In this paper, we propose a novel, automatic, tree-based evaluation metric for LLM-generated step-by-step assembly instructions. It reflects the spatiotemporal aspects of construction more accurately than traditional metrics such as BLEU and BERT similarity scores. We apply our metric to the domain of sewing instructions and show that it correlates better with manually annotated error counts, demonstrating its superiority for evaluating the spatiotemporal soundness of sewing instructions. Further experiments show that our metric is more robust than traditional approaches to artificial counterfactual examples that are specifically constructed to confound metrics relying on textual similarity.
- Anthology ID: 2025.emnlp-main.934
- Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 18519–18536
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.934/
- Cite (ACL): Luisa Geiger, Mareike Hartmann, Michael Sullivan, and Alexander Koller. 2025. Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18519–18536, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions (Geiger et al., EMNLP 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.934.pdf