SpeechEE@XLLM25: End-to-End Structured Event Extraction from Speech
Soham Chaudhuri, Diganta Biswas, Dipanjan Saha, Dipankar Das, Sivaji Bandyopadhyay
Abstract
Event extraction from text is a complex task that involves the identification of event triggers and their supporting arguments. When applied to speech, this task becomes even more challenging due to the continuous nature of audio signals and the need for robust Automatic Speech Recognition (ASR). This paper proposes an approach that integrates ASR with event extraction by utilizing the Whisper model for speech recognition and a Text2Event2 Transformer for extracting events from English audio samples. The Whisper model is used to generate transcripts from audio, which are then fed into the Text2Event2 Transformer to identify event triggers and their arguments. This approach combines two difficult tasks into one, streamlining the process of extracting structured event information directly from audio. Our approach leverages a robust ASR system (Whisper) followed by a parameter-efficient transformer (Text2Event2 fine-tuned via LoRA) to extract structured events from raw speech. Unlike prior work trained on gold textual input, our pipeline is trained end-to-end on noisy ASR outputs. Despite significant resource constraints and data noise, our system ranked first in the ACL 2025 XLLM Shared Task II.
- Anthology ID:
- 2025.xllm-1.24
- Volume:
- Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)
- Month:
- August
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Hao Fei, Kewei Tu, Yuhui Zhang, Xiang Hu, Wenjuan Han, Zixia Jia, Zilong Zheng, Yixin Cao, Meishan Zhang, Wei Lu, N. Siddharth, Lilja Øvrelid, Nianwen Xue, Yue Zhang
- Venues:
- XLLM | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 283–287
- URL:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.xllm-1.24/
- Cite (ACL):
- Soham Chaudhuri, Diganta Biswas, Dipanjan Saha, Dipankar Das, and Sivaji Bandyopadhyay. 2025. SpeechEE@XLLM25: End-to-End Structured Event Extraction from Speech. In Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025), pages 283–287, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- SpeechEE@XLLM25: End-to-End Structured Event Extraction from Speech (Chaudhuri et al., XLLM 2025)
- PDF:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.xllm-1.24.pdf
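
The abstract describes a two-stage cascade: an ASR model produces a transcript, and a text-to-structure model then extracts event triggers and arguments from that transcript. The following is a minimal sketch of that data flow only, not the authors' implementation. Every name in it (`Event`, `speech_to_events`, `fake_transcribe`, `fake_extract`), along with the example sentence, event type, and role labels, is hypothetical and chosen for illustration; in a real system the two callables would wrap Whisper and the fine-tuned extraction model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Event:
    """One extracted event: a trigger word, its type, and (role, text) arguments."""
    trigger: str
    event_type: str
    arguments: List[Tuple[str, str]] = field(default_factory=list)


def speech_to_events(
    audio_path: str,
    transcribe: Callable[[str], str],
    extract: Callable[[str], List[Event]],
) -> Tuple[str, List[Event]]:
    """Run the cascade: ASR first, then event extraction on the ASR output."""
    transcript = transcribe(audio_path)  # stage 1: speech -> (possibly noisy) text
    events = extract(transcript)         # stage 2: text -> structured events
    return transcript, events


# Hypothetical stand-ins so the sketch runs without any model weights.
def fake_transcribe(audio_path: str) -> str:
    # A real implementation would call an ASR model here.
    return "the man returned to los angeles"


def fake_extract(transcript: str) -> List[Event]:
    # A real implementation would call a text-to-structure model here.
    if "returned" not in transcript:
        return []
    return [
        Event(
            trigger="returned",
            event_type="Transport",  # illustrative label, not from the paper
            arguments=[("Artifact", "man"), ("Destination", "los angeles")],
        )
    ]


transcript, events = speech_to_events("sample.wav", fake_transcribe, fake_extract)
```

Passing the two stages in as callables keeps the sketch independent of any particular model library. It also reflects the point made in the abstract that the extraction stage consumes ASR output rather than gold text, so recognition errors flow directly into the extractor's input.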