Pablo Duboue
Also published as: Pablo A. Duboue, Pablo Ariel Duboue
2026
FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in Banking
Denys Katerenchuk | Pablo Duboue | Keelan Evanini
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Large language models (LLMs) are rapidly being adopted across various domains. However, their adoption in the banking industry faces resistance due to demands for high accuracy, regulatory compliance, and the need for verifiable and grounded responses. We present a unified, data-efficient framework for training grounded domain-specific LLMs that optimizes answer quality, citation grounding, and calibrated refusal under real-world deployment constraints. First, we describe a data generation pipeline that combines LLM-as-a-Judge filtering, citation annotation, and curriculum learning with only 143M tokens. The resulting 12B model achieves high answer quality, outperforming GPT-4.1 on citation grounding, with a modest citation tradeoff versus the untuned base. Second, we propose a calibrated refusal mechanism: training on 22% unanswerable examples yields a 12% “I don’t know” rate, substantially improving over the base model’s unsafe 4.3% rate while avoiding GPT-4.1’s over-refusal (20.2%). Third, we present an end-to-end methodology spanning from data curation to quantized serving. The system is deployed at 40+ financial institutions, achieving a 7.1 percentage point improvement in query resolution (p < 0.001). Additionally, the model delivers 3–5x faster responses at 20–50x lower cost compared to GPT-4.1.
2019
Rationale Classification for Educational Trading Platforms
Annie Ying | Pablo Duboue
Proceedings of the First Workshop on Financial Technology and Natural Language Processing
2016
Automatic Reports from Spreadsheets: Data Analysis for the Rest of Us
Pablo Duboue
Proceedings of the 9th International Natural Language Generation conference
On the Robustness of Standalone Referring Expression Generation Algorithms Using RDF Data
Pablo Duboue | Martin Ariel Domínguez | Paula Estrella
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)
2013
Thoughtland: Natural Language Descriptions for Machine Learning n-dimensional Error Functions
Pablo Duboue
Proceedings of the 14th European Workshop on Natural Language Generation
On the Feasibility of Automatically Describing n-dimensional Objects
Pablo Duboue
Proceedings of the 14th European Workshop on Natural Language Generation
2012
Extractive email thread summarization: Can we do better than He Said She Said?
Pablo Duboue
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference
On The Feasibility of Open Domain Referring Expression Generation Using Large Scale Folksonomies
Fabián Pacheco | Pablo Duboue | Martín Domínguez
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Bridging the Gap between Intrinsic and Perceived Relevance in Snippet Generation
Jing He | Pablo Duboue | Jian-Yun Nie
Proceedings of COLING 2012
2011
The GIVE-2.5 C Generation System
David Nicolás Racca | Luciana Benotti | Pablo Duboue
Proceedings of the 13th European Workshop on Natural Language Generation
2006
Improving QA Accuracy by Question Inversion
John Prager | Pablo Duboue | Jennifer Chu-Carroll
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
Answering the question you wish they had asked: The impact of paraphrasing for Question Answering
Pablo Duboue | Jennifer Chu-Carroll
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
2003
Statistical Acquisition of Content Selection Rules for Natural Language Generation
Pablo Ariel Duboue | Kathleen R. McKeown
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing
2002
Content Planner Construction via Evolutionary Algorithms and a Corpus-based Fitness Function
Pablo Duboue | Kathleen McKeown
Proceedings of the International Natural Language Generation Conference