ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly
Kimihiro Hasegawa, Wiradee Imrattanatrai, Masaki Asada, Susan E. Holm, Yuran Wang, Xuanang Zhou, Ken Fukuda, Teruko Mitamura
Abstract
Assistants for assembly tasks show great potential to benefit humans, from helping with everyday tasks to interacting in industrial settings. However, evaluation resources for assembly activities remain underexplored. To foster system development, we propose a new multimodal QA evaluation dataset on assembly activities. Our dataset, ProMQA-Assembly, consists of 646 QA pairs that require multimodal understanding of human activity videos and their instruction manuals in an online-style manner. For cost effectiveness in data creation, we adopt a semi-automated QA annotation approach, where LLMs generate candidate QA pairs and humans verify them. We further improve QA generation by integrating fine-grained action labels to diversify question types. Additionally, we create 81 instruction task graphs for our target assembly tasks. These newly created task graphs are used in our benchmarking experiment and also facilitate the human verification process. With our dataset, we benchmark models, including competitive proprietary multimodal models. We find that ProMQA-Assembly contains challenging multimodal questions, on which reasoning models show promising results. We believe our new evaluation dataset contributes to the further development of procedural-activity assistants.
- Anthology ID:
- 2026.lrec-main.714
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 9082–9104
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.714/
- Cite (ACL):
- Kimihiro Hasegawa, Wiradee Imrattanatrai, Masaki Asada, Susan E. Holm, Yuran Wang, Xuanang Zhou, Ken Fukuda, and Teruko Mitamura. 2026. ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly. International Conference on Language Resources and Evaluation, main:9082–9104.
- Cite (Informal):
- ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly (Hasegawa et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.714.pdf