Abstract
Detailed taxonomies for non-standard words, including abbreviations, have been developed for speech and language processing, though mostly with reference to English. In this paper, we examine abbreviation formation strategies in a diverse sample of more than 50 languages, dialects and scripts. The resulting taxonomy, along with data about which strategies are attested in which languages, provides key information needed to create multilingual systems for abbreviation expansion, an essential component for speech processing and text understanding.
- Anthology ID:
- 2024.cawl-1.5
- Volume:
- Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Kyle Gorman, Emily Prud'hommeaux, Brian Roark, Richard Sproat
- Venues:
- CAWL | WS
- SIG:
- SIGWrit
- Publisher:
- ELRA and ICCL
- Pages:
- 36–42
- URL:
- https://aclanthology.org/2024.cawl-1.5
- Cite (ACL):
- Kyle Gorman and Brian Roark. 2024. Abbreviation Across the World’s Languages and Scripts. In Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024, pages 36–42, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Abbreviation Across the World’s Languages and Scripts (Gorman & Roark, CAWL-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2024.cawl-1.5.pdf