Bangla Key2Text: Text Generation from Keywords for a Low Resource Language

Tonmoy Talukder, G M Shahariar


Abstract
This paper introduces Bangla Key2Text, a large-scale dataset of 2.6 million Bangla keyword-text pairs designed for keyword-driven text generation in a low-resource language. The dataset is constructed by applying a BERT-based keyword extraction pipeline to millions of Bangla news articles, transforming raw articles into structured keyword-text pairs suitable for supervised learning. To establish baseline performance on this new benchmark, we fine-tune two sequence-to-sequence models, mT5 and BanglaT5, and evaluate them using multiple automatic metrics and human judgments. Experimental results show that task-specific fine-tuning substantially improves keyword-conditioned text generation in Bangla compared with zero-shot large language models. The dataset, trained models, and code are publicly released to support future research in Bangla natural language generation and keyword-to-text generation.
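The abstract states only that the pairs were built with a BERT-based keyword extraction pipeline. As a rough illustration, and not the authors' released code, the sketch below assumes a KeyBERT-style approach: candidate n-grams are ranked by cosine similarity between their embedding and the whole article's embedding, and the top-ranked phrases become the input side of a keyword-text pair. Everything here is hypothetical: `toy_encode` is a hashed bag-of-words stand-in for a BERT encoder, and the stopword list, `max_n=2`, and `top_k=3` are placeholder choices.

```python
import math
import zlib
from typing import Callable

# Hypothetical, deliberately tiny stopword list; a real pipeline needs a fuller Bangla list.
STOPWORDS = {"ও", "এবং", "আজ", "এই"}
# Characters stripped from token edges, including the Bangla danda "।".
PUNCT = "।,.!?;:\"'()"


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization with punctuation stripped from each token."""
    tokens = (t.strip(PUNCT) for t in text.split())
    return [t for t in tokens if t]


def toy_encode(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for a BERT encoder: hashed bag-of-words counts.

    CRC32 keeps the buckets stable across runs (unlike the built-in hash()).
    """
    vec = [0.0] * dim
    for token in tokenize(text):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidates(text: str, max_n: int = 2) -> list[str]:
    """Unique 1..max_n-grams in text order, skipping any that start or end with a stopword."""
    tokens = tokenize(text)
    phrases: list[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i : i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            if phrase not in phrases:
                phrases.append(phrase)
    return phrases


def extract_keywords(
    text: str,
    encode: Callable[[str], list[float]] = toy_encode,
    top_k: int = 3,
) -> list[str]:
    """Rank candidate phrases by cosine similarity to the whole-document embedding."""
    doc_vec = encode(text)
    scored = [(cosine(encode(c), doc_vec), c) for c in candidates(text)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep text order
    return [c for _, c in scored[:top_k]]


def make_pair(text: str, top_k: int = 3) -> dict:
    """One training example: the keywords are the model input, the article is the target."""
    return {"keywords": extract_keywords(text, top_k=top_k), "text": text}


article = "বাংলাদেশ ক্রিকেট দল আজ জয় পেয়েছে। ক্রিকেট ভক্তরা খুশি।"
print(make_pair(article))
```

Because `extract_keywords` takes the encoder as a parameter, a contextual encoder (for example, one backed by a multilingual BERT model) could replace `toy_encode` without changing the ranking logic. The toy encoder only rewards word overlap, so on its own it mostly surfaces frequent words.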
Anthology ID: 2026.lrec-main.303
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 3805–3822
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.303/
Cite (ACL): Tonmoy Talukder and G M Shahariar. 2026. Bangla Key2Text: Text Generation from Keywords for a Low Resource Language. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 3805–3822, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal): Bangla Key2Text: Text Generation from Keywords for a Low Resource Language (Talukder & Shahariar, LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.303.pdf