Bangla Key2Text: Text Generation from Keywords for a Low Resource Language

Tonmoy Talukder; G. M. Shahariar

Abstract

This paper introduces Bangla Key2Text, a large-scale dataset of 2.6 million Bangla keyword-text pairs designed for keyword-driven text generation in a low-resource language. The dataset is constructed using a BERT-based keyword extraction pipeline applied to millions of Bangla news texts, transforming raw articles into structured keyword-text pairs suitable for supervised learning. To establish baseline performance on this new benchmark, we fine-tune two sequence-to-sequence models, mT5 and BanglaT5, and evaluate them using multiple automatic metrics and human judgments. Experimental results show that task-specific fine-tuning substantially improves keyword-conditioned text generation in Bangla compared to zero-shot large language models. The dataset, trained models, and code are publicly released to support future research in Bangla natural language generation and keyword-to-text generation tasks.

Anthology ID: 2026.lrec-main.303
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 3805–3822
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.303/
Cite (ACL): Tonmoy Talukder and G M Shahariar. 2026. Bangla Key2Text: Text Generation from Keywords for a Low Resource Language. International Conference on Language Resources and Evaluation, main:3805–3822.
Cite (Informal): Bangla Key2Text: Text Generation from Keywords for a Low Resource Language (Talukder & Shahariar, LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.303.pdf