Jasmina Gajcin
2026
FactCorrector: A Graph-Inspired Approach to Long-Form Factuality Correction of Large Language Models
Javier Carnerero-Cano | Massimiliano Pronesti | Radu Marinescu | Tigran T. Tchrakian | James Barry | Jasmina Gajcin | Yufang Hou | Alessandra Pascale | Elizabeth M. Daly
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) are widely used in knowledge-intensive applications but often generate factually incorrect responses. A promising approach to rectifying these flaws is to correct LLMs using feedback. In this paper, we introduce FactCorrector, a new post-hoc correction method that adapts across domains without retraining and leverages structured feedback about the factuality of the original response to generate a correction. To support rigorous evaluation of factuality correction methods, we also develop the VELI5 benchmark, a novel dataset containing systematically injected factual errors and ground-truth corrections. Experiments on VELI5 and several popular long-form factuality datasets show that the FactCorrector approach significantly improves factual precision while preserving relevance, outperforming strong baselines. We release our code at https://ibm.biz/factcorrector.
2025
Synthetic Data for Evaluation: Supporting LLM-as-a-Judge Workflows with EvalAssist
Martín Santillán Cooper | Zahra Ashktorab | Hyo Jin Do | Erik Miehling | Werner Geyer | Jasmina Gajcin | Elizabeth M. Daly | Qian Pan | Michael Desmond
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We present a synthetic data generation tool integrated into EvalAssist, a web-based application designed to assist human-centered evaluation of language model outputs by allowing users to refine LLM-as-a-Judge evaluation criteria. The synthetic data generation tool in EvalAssist is tailored for evaluation contexts and informed by findings from user studies with AI practitioners. Participants identified key pain points in current workflows, including circularity risks (where models are judged by criteria that they themselves derived), compounded bias (amplification of biases across multiple stages of a pipeline), and poor support for edge cases, and expressed a strong preference for real-world grounding and fine-grained control. In response, our tool supports flexible prompting, RAG-based grounding, persona diversity, and iterative generation workflows. We also incorporate features for quality assurance and edge case discovery.