Pablo Duboue - ACL Anthology


Pablo Duboue

Also published as: Pablo A. Duboue, Pablo Ariel Duboue

2026

FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in Banking
Denys Katerenchuk | Pablo Duboue | Keelan Evanini
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

Large language models (LLMs) are rapidly being adopted across various domains. However, their adoption in the banking industry faces resistance due to demands for high accuracy, regulatory compliance, and the need for verifiable and grounded responses. We present a unified, data-efficient framework for training grounded domain-specific LLMs that optimizes answer quality, citation grounding, and calibrated refusal under real-world deployment constraints. First, we describe a data generation pipeline that combines LLM-as-a-Judge filtering, citation annotation, and curriculum learning with only 143M tokens. The resulting 12B model achieves high answer quality, outperforming GPT-4.1 on citation grounding, with a modest citation tradeoff versus the untuned base. Second, we propose a calibrated refusal mechanism: training on 22% unanswerable examples yields a 12% “I don’t know” rate, substantially improving over the base model’s unsafe 4.3% rate while avoiding GPT-4.1’s over-refusal (20.2%). Third, we present an end-to-end methodology spanning from data curation to quantized serving. The system is deployed at 40+ financial institutions, achieving a 7.1 percentage point improvement in query resolution (p < 0.001). Additionally, the model delivers 3–5x faster responses at 20–50x lower cost compared to GPT-4.1.

2019

Rationale Classification for Educational Trading Platforms
Annie Ying | Pablo Duboue
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

2016

Automatic Reports from Spreadsheets: Data Analysis for the Rest of Us
Pablo Duboue
Proceedings of the 9th International Natural Language Generation conference

On the Robustness of Standalone Referring Expression Generation Algorithms Using RDF Data
Pablo Duboue | Martin Ariel Domínguez | Paula Estrella
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

2013

Thoughtland: Natural Language Descriptions for Machine Learning n-dimensional Error Functions
Pablo Duboue
Proceedings of the 14th European Workshop on Natural Language Generation

On the Feasibility of Automatically Describing n-dimensional Objects
Pablo Duboue
Proceedings of the 14th European Workshop on Natural Language Generation

2012

Extractive email thread summarization: Can we do better than He Said She Said?
Pablo Duboue
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

On The Feasibility of Open Domain Referring Expression Generation Using Large Scale Folksonomies
Fabián Pacheco | Pablo Duboue | Martín Domínguez
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Bridging the Gap between Intrinsic and Perceived Relevance in Snippet Generation
Jing He | Pablo Duboue | Jian-Yun Nie
Proceedings of COLING 2012

2011

The GIVE-2.5 C Generation System
David Nicolás Racca | Luciana Benotti | Pablo Duboue
Proceedings of the 13th European Workshop on Natural Language Generation

2006

Improving QA Accuracy by Question Inversion
John Prager | Pablo Duboue | Jennifer Chu-Carroll
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Answering the question you wish they had asked: The impact of paraphrasing for Question Answering
Pablo Duboue | Jennifer Chu-Carroll
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2003

Statistical Acquisition of Content Selection Rules for Natural Language Generation
Pablo Ariel Duboue | Kathleen R. McKeown
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

2002

Content Planner Construction via Evolutionary Algorithms and a Corpus-based Fitness Function
Pablo Duboue | Kathleen McKeown
Proceedings of the International Natural Language Generation Conference

2001

Empirically Estimating Order Constraints for Content Planning in Generation
Pablo A. Duboue | Kathleen R. McKeown
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

Co-authors

Keelan Evanini 1

Denys Katerenchuk 1

Fabián Pacheco 1

David Nicolas Racca 1
