MaritimEmails: A Synthetic Dataset for Maritime Chartering Correspondence

Kevin Bruendler, Simon Clematide


Abstract
We introduce MaritimEmails, a large-scale synthetic corpus of 19,817 English-language email threads simulating maritime chartering negotiations between brokers and charterers. Email remains a dominant medium for business communication, yet no public corpora exist for this highly specialized domain due to confidentiality constraints. To address this gap, we generate domain-plausible negotiation exchanges using five contemporary language models under multiple prompting strategies, including Attribute Prompting and Base–Refine (BARE) approaches. Each thread includes structured annotations for vessels, ports, commodities, and Incoterms, enabling supervised training for information extraction and related tasks. Our comparative evaluation covering lexical and semantic diversity, sentiment balance, and verbosity shows that BARE generation increases linguistic variation while maintaining coherence. However, all models exhibit a systematic positivity bias, yielding less negative sentiment than is observed in the Enron reference corpus and likely also in many real negotiation settings. Baseline information extraction experiments with GLiNER and generative Qwen models yield up to 0.86 macro F1 on entity extraction, supporting the dataset’s usefulness. MaritimEmails, together with prompts, scripts, and documentation, is released for research use.
Anthology ID:
2026.lrec-main.599
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
7556–7567
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.599/
Cite (ACL):
Kevin Bruendler and Simon Clematide. 2026. MaritimEmails: A Synthetic Dataset for Maritime Chartering Correspondence. International Conference on Language Resources and Evaluation, main:7556–7567.
Cite (Informal):
MaritimEmails: A Synthetic Dataset for Maritime Chartering Correspondence (Bruendler & Clematide, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.599.pdf