Enhancing Cross-lingual Transfer via Phonemic Transcription Integration

Hoang Nguyen, Chenwei Zhang, Tao Zhang, Eugene Rohrbaugh, Philip Yu


Abstract
Previous cross-lingual transfer methods are restricted to orthographic representation learning via textual scripts. This limitation hampers cross-lingual transfer and biases it towards languages that share similar, well-known scripts. To bridge the gap between languages with different writing scripts, we propose PhoneXL, a framework that incorporates phonemic transcriptions as an additional linguistic modality beyond traditional orthographic transcriptions for cross-lingual transfer. Specifically, we propose unsupervised alignment objectives to capture (1) local one-to-one alignment between the two modalities, (2) alignment via multi-modality contexts to leverage information from additional modalities, and (3) alignment via multilingual contexts where additional bilingual dictionaries are incorporated. We also release the first phonemic-orthographic alignment dataset on two token-level tasks (Named Entity Recognition and Part-of-Speech Tagging) among the understudied but interconnected Chinese-Japanese-Korean-Vietnamese (CJKV) languages. Our pilot study reveals that phonemic transcription provides essential information beyond orthography to enhance cross-lingual transfer and bridge the gap among CJKV languages, leading to consistent improvements on cross-lingual token-level tasks over orthography-based multilingual PLMs.
Anthology ID:
2023.findings-acl.583
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9163–9175
URL:
https://aclanthology.org/2023.findings-acl.583
DOI:
10.18653/v1/2023.findings-acl.583
Cite (ACL):
Hoang Nguyen, Chenwei Zhang, Tao Zhang, Eugene Rohrbaugh, and Philip Yu. 2023. Enhancing Cross-lingual Transfer via Phonemic Transcription Integration. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9163–9175, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Enhancing Cross-lingual Transfer via Phonemic Transcription Integration (Nguyen et al., Findings 2023)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2023.findings-acl.583.pdf