Abbreviation Across the World’s Languages and Scripts

Kyle Gorman, Brian Roark


Abstract
Detailed taxonomies for non-standard words, including abbreviations, have been developed for speech and language processing, though mostly with reference to English. In this paper, we examine abbreviation formation strategies in a diverse sample of more than 50 languages, dialects and scripts. The resulting taxonomy, along with data about which strategies are attested in which languages, provides key information needed to create multilingual systems for abbreviation expansion, an essential component of speech processing and text understanding.
Anthology ID:
2024.cawl-1.5
Volume:
Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Kyle Gorman, Emily Prud'hommeaux, Brian Roark, Richard Sproat
Venues:
CAWL | WS
SIG:
SIGWrit
Publisher:
ELRA and ICCL
Pages:
36–42
URL:
https://aclanthology.org/2024.cawl-1.5
Cite (ACL):
Kyle Gorman and Brian Roark. 2024. Abbreviation Across the World’s Languages and Scripts. In Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024, pages 36–42, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Abbreviation Across the World’s Languages and Scripts (Gorman & Roark, CAWL-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2024.cawl-1.5.pdf