Cross-lingual Contextualized Phrase Retrieval

Huayang Li, Deng Cai, Zhi Qu, Qu Cui, Hidetaka Kamigaito, Lemao Liu, Taro Watanabe


Abstract
Phrase-level dense retrieval has shown many appealing characteristics in downstream NLP tasks by leveraging the fine-grained information that phrases offer. In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information. However, the lack of specific training data and models are the primary challenges to achieve our goal. As a result, we extract pairs of cross-lingual phrases using word alignment information automatically induced from parallel sentences. Subsequently, we train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning, which encourages the hidden representations of phrases with similar contexts and semantics to align closely. Comprehensive experiments on both the cross-lingual phrase retrieval task and a downstream task, i.e, machine translation, demonstrate the effectiveness of CCPR. On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher. When utilizing CCPR to augment the large-language-model-based translator, it achieves average gains of 0.7 and 1.5 in BERTScore for translations from X=>En and vice versa, respectively, on WMT16 dataset. We will release our code and data.
Anthology ID:
2024.findings-emnlp.383
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
6562–6576
Language:
URL:
https://preview.aclanthology.org/add-emnlp-2024-awards/2024.findings-emnlp.383/
DOI:
10.18653/v1/2024.findings-emnlp.383
Bibkey:
Cite (ACL):
Huayang Li, Deng Cai, Zhi Qu, Qu Cui, Hidetaka Kamigaito, Lemao Liu, and Taro Watanabe. 2024. Cross-lingual Contextualized Phrase Retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6562–6576, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Cross-lingual Contextualized Phrase Retrieval (Li et al., Findings 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/add-emnlp-2024-awards/2024.findings-emnlp.383.pdf