Don’t Just Listen, Try Planning: Graph-based Retrieval-Generation Agent for Long-form Audio Meeting Understanding

Quanwei Tang, Dong Zhang, Shoushan Li, Guodong Zhou


Abstract
Long-form audio meeting understanding (LAMU) is gaining attention, but dedicated question answering (QA) datasets are lacking. Previous tailored speech QA methods and existing Speech LLMs suffer from acoustic information loss and poor capture of long-term dependencies. We construct the LongAudioQA dataset and propose the GRGA model, which organizes heterogeneous audio features into a multi-dimensional graph and uses agent planning for retrieval and answer generation, addressing both limitations.
Anthology ID:
2026.findings-acl.1038
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
20715–20742
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1038/
Cite (ACL):
Quanwei Tang, Dong Zhang, Shoushan Li, and Guodong Zhou. 2026. Don’t Just Listen, Try Planning: Graph-based Retrieval-Generation Agent for Long-form Audio Meeting Understanding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20715–20742, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Don’t Just Listen, Try Planning: Graph-based Retrieval-Generation Agent for Long-form Audio Meeting Understanding (Tang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1038.pdf
Checklist:
 2026.findings-acl.1038.checklist.pdf