Abstract
Interlingual homographs are words that are spelled the same but have different meanings across languages. Recognizing interlingual homographs among form-identical words generally requires linguistic knowledge and extensive annotation work. In this paper, we propose an automatic interlingual homograph recognition method based on cross-lingual word embedding similarity and the co-occurrence of form-identical words in parallel sentences. We conduct experiments with various off-the-shelf language models combined with cross-lingual alignment operations and co-occurrence metrics on the Chinese-Japanese and English-Dutch language pairs. Experimental results demonstrate that our proposed method makes accurate and consistent predictions across languages.
- Anthology ID:
- 2022.findings-aacl.20
- Volume:
- Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
- Month:
- November
- Year:
- 2022
- Address:
- Online only
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 211–216
- URL:
- https://aclanthology.org/2022.findings-aacl.20
- Cite (ACL):
- Yi Han, Ryohei Sasano, and Koichi Takeda. 2022. Automating Interlingual Homograph Recognition with Parallel Sentences. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 211–216, Online only. Association for Computational Linguistics.
- Cite (Informal):
- Automating Interlingual Homograph Recognition with Parallel Sentences (Han et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/2022.findings-aacl.20.pdf
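
The two signals named in the abstract, co-occurrence of a form-identical word in parallel sentences and cross-lingual embedding similarity, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the specific co-occurrence ratio, and the toy English-Dutch sentence pairs are all hypothetical, and the paper's actual metrics and models may differ.

```python
import math

def cooccurrence_ratio(word, parallel_pairs):
    """Hypothetical metric: among sentence pairs that contain `word` on
    either side, the fraction that contain it on both sides."""
    both = either = 0
    for src_tokens, tgt_tokens in parallel_pairs:
        in_src = word in src_tokens
        in_tgt = word in tgt_tokens
        if in_src or in_tgt:
            either += 1
            if in_src and in_tgt:
                both += 1
    return both / either if either else 0.0

def cosine_similarity(u, v):
    """Cosine similarity between two vectors, e.g. aligned word embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy tokenized English-Dutch pairs (illustrative only, not from the paper).
# "film" means the same in both languages; Dutch "room" means "cream".
toy_pairs = [
    (["the", "film", "was", "long"], ["de", "film", "was", "lang"]),
    (["a", "new", "film"], ["een", "nieuwe", "film"]),
    (["the", "room", "is", "small"], ["de", "kamer", "is", "klein"]),
    (["coffee", "with", "cream"], ["koffie", "met", "room"]),
]

film_score = cooccurrence_ratio("film", toy_pairs)
room_score = cooccurrence_ratio("room", toy_pairs)
```

In this toy data, "film" appears on both sides of every pair that contains it, while "room" never appears on both sides of the same pair. A low co-occurrence ratio is the kind of evidence that would point toward an interlingual homograph. How such a score is combined with embedding similarity is left open here.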