Abstract
Coherence in discourse is fundamental to comprehension and perception. Much research on coherence modeling has focused on better model architectures and training setups optimized for the permuted-document task, in which random permutations of a coherent document are treated as incoherent. However, there is very limited work on creating “informed” synthetic incoherent samples that better represent or mimic incoherence. We source a diverse positive corpus for local coherence and propose six rule-based methods that leverage information from constituency trees, part-of-speech tags, semantic overlap, and more, to synthesize “informed” negative samples that better represent incoherence. We keep a straightforward training setup for local coherence modeling by fine-tuning popular transformer models, and we aggregate local scores for global coherence. We evaluate on a battery of independent downstream tasks to assess the impact of improved negative-sample quality. We assert that a step towards optimality in coherence modeling requires better negative sample synthesis in tandem with model improvements.
- Anthology ID: 2024.findings-eacl.128
- Volume: Findings of the Association for Computational Linguistics: EACL 2024
- Month: March
- Year: 2024
- Address: St. Julian’s, Malta
- Editors: Yvette Graham, Matthew Purver
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 1895–1908
- URL: https://aclanthology.org/2024.findings-eacl.128
- Cite (ACL): Shubhankar Singh. 2024. Jigsaw Pieces of Meaning: Modeling Discourse Coherence with Informed Negative Sample Synthesis. In Findings of the Association for Computational Linguistics: EACL 2024, pages 1895–1908, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal): Jigsaw Pieces of Meaning: Modeling Discourse Coherence with Informed Negative Sample Synthesis (Singh, Findings 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-1/2024.findings-eacl.128.pdf
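The permuted-document task mentioned in the abstract can be sketched as follows. This is a minimal illustrative baseline only: each random shuffle of a coherent document's sentence order is treated as an incoherent negative sample. The function name and parameters here are hypothetical; the paper's six “informed” rule-based synthesis methods are not reproduced.

```python
import random

def permuted_negatives(sentences, k=3, seed=0, max_tries=1000):
    """Generate up to k incoherent negatives by shuffling a coherent
    document's sentence order (the standard permuted-document baseline).
    Illustrative sketch only; not the paper's informed methods."""
    rng = random.Random(seed)
    negatives = []
    tries = 0
    while len(negatives) < k and tries < max_tries:
        tries += 1
        perm = sentences[:]
        rng.shuffle(perm)
        # A shuffle identical to the original order (or a duplicate of an
        # earlier shuffle) is not a valid negative sample.
        if perm != sentences and perm not in negatives:
            negatives.append(perm)
    return negatives

doc = [
    "Alice unlocked the door.",
    "She stepped inside.",
    "The room was completely dark.",
]
for neg in permuted_negatives(doc):
    print(" | ".join(neg))
```

Note that every negative produced this way is "uninformed": it ignores linguistic structure entirely, which is precisely the limitation the paper's constituency-tree, part-of-speech, and semantic-overlap methods aim to address.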