Can LLMs Help Encoder Models Maintain Both High Accuracy and Consistency in Temporal Relation Classification?

Adiel Meir, Kfir Bar


Abstract
Temporal relation classification (TRC) demands both accuracy and temporal consistency in event timeline extraction. Encoder-based models achieve high accuracy but introduce inconsistencies because they rely on pairwise classification, while LLMs leverage global context to generate temporal graphs, improving consistency at the cost of accuracy. We assess LLM prompting strategies for TRC and their effectiveness in assisting encoder models with cycle resolution. Results show that while LLMs improve consistency, they struggle with accuracy and do not outperform a simple confidence-based cycle resolution approach. Our code is publicly available at: https://github.com/MatufA/timeline-extraction.
Anthology ID: 2025.inlg-main.41
Volume: Proceedings of the 18th International Natural Language Generation Conference
Month: October
Year: 2025
Address: Hanoi, Vietnam
Editors: Lucie Flek, Shashi Narayan, Lê Hồng Phương, Jiahuan Pei
Venue: INLG
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 716–733
URL: https://preview.aclanthology.org/author-page-lei-gao-usc/2025.inlg-main.41/
Cite (ACL): Adiel Meir and Kfir Bar. 2025. Can LLMs Help Encoder Models Maintain Both High Accuracy and Consistency in Temporal Relation Classification?. In Proceedings of the 18th International Natural Language Generation Conference, pages 716–733, Hanoi, Vietnam. Association for Computational Linguistics.
Cite (Informal): Can LLMs Help Encoder Models Maintain Both High Accuracy and Consistency in Temporal Relation Classification? (Meir & Bar, INLG 2025)
PDF: https://preview.aclanthology.org/author-page-lei-gao-usc/2025.inlg-main.41.pdf