Evaluating Design Choices in Verifiable Generation with Open-source Models

Shuyang Cao; Lu Wang

Abstract

Verifiable generation is introduced to improve the transparency and trustworthiness of outputs produced by large language models (LLMs). Recent studies observe that, with in-context learning, open-source models struggle to include accurate citations to supporting documents in their generations, in contrast to the strong performance demonstrated by proprietary models. Our work aims to reveal the critical design choices that can benefit open-source models, including generation pipelines, fine-tuning methods, and inference-time compute techniques. We consider three generation pipelines, which either produce the outputs directly or decompose the generation into subtasks. These generation pipelines are fine-tuned using supervised fine-tuning and preference-based optimization, including further fine-tuning with rejection sampling data and direct preference optimization (DPO). The construction of preference data with varying content and citation diversity is also investigated. Additionally, we examine the benefit of an additional reranking step. With four open-source models, our experiments show that directly generating the outputs achieves the best performance. Compared to other fine-tuning methods, DPO, which computes training signals from contrastive pairs, consistently yields better performance, and it reaches peak performance when the contrastive pairs are constructed with sufficient content diversity. We also find that reranking can further boost the performance of verifiable generation systems, but the marginal improvement might not justify the additional cost.
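
As a rough illustration of the contrastive-pair training signal mentioned in the abstract, the standard DPO objective for a single preference pair can be sketched as below. This is a minimal sketch and not the authors' implementation: the function name `dpo_loss`, its argument names, and the default `beta` value are assumptions made for illustration, and the inputs are taken to be sequence-level log-probabilities of the preferred and dispreferred outputs under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of outputs.

    All arguments are sequence log-probabilities (hypothetical inputs);
    beta scales the implicit reward and is an assumed default.
    """
    # Implicit rewards: how much more the policy likes each output
    # than the reference model does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # Loss is -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss shrinks as the policy assigns relatively more probability to the preferred output than the reference model does, and grows when the preference is reversed.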

Anthology ID: 2025.trustnlp-main.27
Volume: Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
Month: May
Year: 2025
Address: Albuquerque, New Mexico
Editors: Trista Cao, Anubrata Das, Tharindu Kumarage, Yixin Wan, Satyapriya Krishna, Ninareh Mehrabi, Jwala Dhamala, Anil Ramakrishna, Aram Galystan, Anoop Kumar, Rahul Gupta, Kai-Wei Chang
Venues: TrustNLP | WS
Publisher: Association for Computational Linguistics
Pages: 412–431
URL: https://preview.aclanthology.org/fix-sig-urls/2025.trustnlp-main.27/
Cite (ACL): Shuyang Cao and Lu Wang. 2025. Evaluating Design Choices in Verifiable Generation with Open-source Models. In Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025), pages 412–431, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Evaluating Design Choices in Verifiable Generation with Open-source Models (Cao & Wang, TrustNLP 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.trustnlp-main.27.pdf
