Quintin Fettes
2025
Improving Model Factuality with Fine-grained Critique-based Evaluator
Yiqing Xie | Wenxuan Zhou | Pradyot Prakash | Di Jin | Yuning Mao | Quintin Fettes | Arya Talebzadeh | Sinong Wang | Han Fang | Carolyn Rose | Daniel Fried | Hejia Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Factuality evaluation aims to detect factual errors produced by language models (LMs) and hence guide the development of more factual models. Towards this goal, we train a factuality evaluator, FenCE, that provides LM generators with claim-level factuality feedback. In particular, we train FenCE to (1) generate textual critiques along with scores and (2) make claim-level judgments based on diverse source documents obtained via various tools, using data augmentation on a combination of public judgment datasets. We then present a framework that leverages FenCE to improve the factuality of LM generators by constructing training data. Specifically, we generate a set of candidate responses, ask FenCE to revise and score each response without introducing lesser-known facts, and train the generator by preferring highly scored revised responses. Experiments show that our data augmentation methods improve the evaluator’s accuracy by 2.9% on LLM-AggreFact. With FenCE, we improve Llama2-7B-chat/Llama3-8B-chat’s factuality rate by 16.86%/14.45% on FActScore, outperforming state-of-the-art factuality finetuning methods by 8.83%/6.96%.
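To make the data-construction loop in the abstract concrete, here is a minimal Python sketch of how FenCE-style feedback could be turned into preference pairs for finetuning the generator. All names here (`generate_candidates`, `fence_judge`, `Judgment`) are hypothetical stubs for illustration, not the paper's actual interfaces; the paper only specifies the overall flow: sample candidates, have the evaluator revise and score each, and prefer highly scored revisions.

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    revised: str   # candidate with unsupported claims fixed or removed
    score: float   # factuality score aggregated over claim-level judgments

def generate_candidates(generator, prompt: str, n: int = 4) -> list[str]:
    """Sample n candidate responses from the LM generator (hypothetical stub)."""
    return [generator(prompt) for _ in range(n)]

def fence_judge(candidate: str) -> Judgment:
    """Ask the FenCE evaluator to critique, revise (without introducing
    lesser-known facts), and score a candidate (hypothetical stub)."""
    raise NotImplementedError

def build_preference_pair(generator, prompt: str) -> dict:
    """Construct one (chosen, rejected) pair for preference finetuning."""
    candidates = generate_candidates(generator, prompt)
    judged = [(c, fence_judge(c)) for c in candidates]
    # Prefer the revision of the highest-scoring candidate...
    chosen = max(judged, key=lambda cj: cj[1].score)[1].revised
    # ...over the lowest-scoring original candidate.
    rejected = min(judged, key=lambda cj: cj[1].score)[0]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

Pairs of this form could then feed a standard preference-optimization objective (e.g., DPO); the choice of pairing and objective here is an assumption, shown only to clarify the described pipeline.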