Enhancing Cross-lingual Transfer via Phonemic Transcription Integration
Hoang Nguyen, Chenwei Zhang, Tao Zhang, Eugene Rohrbaugh, Philip Yu
Abstract
Previous cross-lingual transfer methods are restricted to orthographic representation learning via textual scripts. This limitation hampers cross-lingual transfer and is biased towards languages sharing similar well-known scripts. To alleviate the gap between languages from different writing scripts, we propose PhoneXL, a framework incorporating phonemic transcriptions as an additional linguistic modality beyond the traditional orthographic transcriptions for cross-lingual transfer. Particularly, we propose unsupervised alignment objectives to capture (1) local one-to-one alignment between the two different modalities, (2) alignment via multi-modality contexts to leverage information from additional modalities, and (3) alignment via multilingual contexts where additional bilingual dictionaries are incorporated. We also release the first phonemic-orthographic alignment dataset on two token-level tasks (Named Entity Recognition and Part-of-Speech Tagging) among the understudied but interconnected Chinese-Japanese-Korean-Vietnamese (CJKV) languages. Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer and bridge the gap among CJKV languages, leading to consistent improvements on cross-lingual token-level tasks over orthographic-based multilingual PLMs.
- Anthology ID:
- 2023.findings-acl.583
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2023
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9163–9175
- URL:
- https://aclanthology.org/2023.findings-acl.583
- DOI:
- 10.18653/v1/2023.findings-acl.583
- Cite (ACL):
- Hoang Nguyen, Chenwei Zhang, Tao Zhang, Eugene Rohrbaugh, and Philip Yu. 2023. Enhancing Cross-lingual Transfer via Phonemic Transcription Integration. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9163–9175, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Enhancing Cross-lingual Transfer via Phonemic Transcription Integration (Nguyen et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2023.findings-acl.583.pdf