MultiCoS: A Multilingual Dataset of Connective Semantics with Context–Sentence Compatibility

Anne Mucha, Ciyang Qing, Wataru Uegaki


Abstract
We present a multilingual dataset of connective semantics. The dataset contains semantic annotations of clausal connectives (e.g. "and" and "or" in English) from 24 languages, based on our original native-speaker elicitation data. Unlike existing lexica of connectives, the dataset includes systematic evidence for the annotations in the form of context-sentence compatibility judgments, including negative evidence. The paper describes the methodology of data collection and the format of the dataset. We also discuss its potential use cases: validating cross-linguistic generalizations, examining potential counterexamples to them, and benchmarking felicity judgments by NLU systems.
Anthology ID:
2026.lrec-main.381
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
4861–4871
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.381/
Cite (ACL):
Anne Mucha, Ciyang Qing, and Wataru Uegaki. 2026. MultiCoS: A Multilingual Dataset of Connective Semantics with Context–Sentence Compatibility. International Conference on Language Resources and Evaluation, main:4861–4871.
Cite (Informal):
MultiCoS: A Multilingual Dataset of Connective Semantics with Context–Sentence Compatibility (Mucha et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.381.pdf
Optional supplementary material:
2026.lrec-main.381.OptionalSupplementaryMaterial.zip