Multilingual and Code-Switched Sentence Ordering

Alexandre Salle, Shervin Malmasi


Abstract
Sentence Ordering (SO) is a linguistic task that requires reordering shuffled sentences into a coherent paragraph. SO has downstream applications, and it also serves as a semantic probe for computational models, since this capability is essential for understanding narrative structure and the causal and temporal relations within texts. Despite its importance, prior research has been limited to predictable English language structures and has not thoroughly addressed the complexities of multilingual and varied narrative contexts. To fill this gap, we introduce a novel and comprehensive Multilingual Sentence Ordering task that extends SO to diverse narratives across 12 languages, including challenging code-switched texts. We have developed MultiSO, a new benchmark dataset that represents these challenges. Our findings reveal that both specialized sentence ordering models and advanced Large Language Models such as GPT-4 face significant challenges with this task.
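To make the task definition concrete, the following is a minimal sketch of how a single SO instance could be built and scored. It is not the MultiSO pipeline. The function names (`make_instance`, `restore`, `kendall_tau`), the toy paragraph, and the choice of Kendall's tau as the score are all assumptions made for illustration; the abstract does not state which metrics the paper uses.

```python
import random
from itertools import combinations


def make_instance(sentences, seed=0):
    """Shuffle a paragraph (hypothetical helper, not from the paper).

    Returns (shuffled, positions), where positions[j] is the index that
    shuffled[j] had in the original paragraph.
    """
    rng = random.Random(seed)
    positions = list(range(len(sentences)))
    rng.shuffle(positions)
    shuffled = [sentences[i] for i in positions]
    return shuffled, positions


def restore(shuffled, positions):
    """Rebuild the paragraph from shuffled sentences and their positions."""
    return [s for _, s in sorted(zip(positions, shuffled))]


def kendall_tau(predicted_positions):
    """Kendall's tau between a predicted order and the gold order.

    predicted_positions lists, in predicted reading order, the gold index
    of each sentence. 1.0 means a perfect order, -1.0 a fully reversed one.
    """
    n = len(predicted_positions)
    if n < 2:
        return 1.0
    pairs = list(combinations(range(n), 2))
    concordant = sum(
        1 for i, j in pairs if predicted_positions[i] < predicted_positions[j]
    )
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)


# Toy paragraph invented for this sketch; it is not taken from MultiSO.
paragraph = [
    "Ana missed the last bus home.",
    "She decided to walk instead.",
    "Halfway there, it started to rain.",
    "She arrived soaked but laughing.",
]
shuffled, positions = make_instance(paragraph, seed=1)
assert restore(shuffled, positions) == paragraph
```

A model under evaluation would receive `shuffled` and return a predicted order; passing the gold indices of that order to `kendall_tau` gives one pairwise-agreement score per paragraph under the assumptions above.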
Anthology ID:
2024.starsem-1.24
Volume:
Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Danushka Bollegala, Vered Shwartz
Venue:
*SEM
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
308–313
URL:
https://aclanthology.org/2024.starsem-1.24
Cite (ACL):
Alexandre Salle and Shervin Malmasi. 2024. Multilingual and Code-Switched Sentence Ordering. In Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024), pages 308–313, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Multilingual and Code-Switched Sentence Ordering (Salle & Malmasi, *SEM 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.starsem-1.24.pdf