COAS2W: A Chinese Older-Adults Spoken-to-Written Transformation Corpus with Context Awareness

Chun Kang, Zhigu Qian, Zhen Fu, Jiaojiao Fu, Yangfan Zhou


Abstract
Spoken language from older adults often deviates from written norms due to omission, disordered syntax, constituent errors, and redundancy, limiting the usefulness of automatic transcripts in downstream tasks. We present COAS2W, a Chinese spoken-to-written corpus of 10,004 utterances from older adults, each paired with a written version, fine-grained error labels, and a four-sentence context. Lightweight open-source models fine-tuned on COAS2W outperform larger closed-source models. Context ablation shows the value of multi-sentence input, and normalization improves performance on downstream translation tasks. COAS2W supports the development of inclusive, context-aware language technologies for older speakers. Our annotation convention, data, and code are publicly available at https://github.com/Springrx/COAS2W.
Anthology ID:
2025.emnlp-main.903
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
17887–17906
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.903/
DOI:
10.18653/v1/2025.emnlp-main.903
Cite (ACL):
Chun Kang, Zhigu Qian, Zhen Fu, Jiaojiao Fu, and Yangfan Zhou. 2025. COAS2W: A Chinese Older-Adults Spoken-to-Written Transformation Corpus with Context Awareness. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 17887–17906, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
COAS2W: A Chinese Older-Adults Spoken-to-Written Transformation Corpus with Context Awareness (Kang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.903.pdf
Checklist:
2025.emnlp-main.903.checklist.pdf