Abstract
Between 80% and 90% of all Chinese words have both a long and a short form, such as 老虎/虎 (lao-hu/hu, tiger) (Duanmu, 2013). Consequently, the choice between long and short forms is a key problem for lexical choice across NLP and NLG. Following earlier work on abbreviations in English (Mahowald et al., 2013), we bring a probabilistic perspective to these questions, using both a behavioral and a corpus-based approach. We hypothesized that, in Mandarin, the short form is more likely to be chosen in a supportive context than in a neutral context. Consistent with our prediction, our findings revealed that the predictability of the context affects speakers’ choice between long and short forms.
- Anthology ID:
- W19-8605
- Volume:
- Proceedings of the 12th International Conference on Natural Language Generation
- Month:
- October–November
- Year:
- 2019
- Address:
- Tokyo, Japan
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 34–39
- URL:
- https://aclanthology.org/W19-8605
- DOI:
- 10.18653/v1/W19-8605
- Cite (ACL):
- Lin Li, Kees van Deemter, Denis Paperno, and Jingyu Fan. 2019. Choosing between Long and Short Word Forms in Mandarin. In Proceedings of the 12th International Conference on Natural Language Generation, pages 34–39, Tokyo, Japan. Association for Computational Linguistics.
- Cite (Informal):
- Choosing between Long and Short Word Forms in Mandarin (Li et al., INLG 2019)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/W19-8605.pdf