BERT for Monolingual and Cross-Lingual Reverse Dictionary

Hang Yan, Xiaonan Li, Xipeng Qiu, Bocao Deng


Abstract
Reverse dictionary is the task of finding the proper target word given its description. In this paper, we incorporate BERT into this task. However, since BERT is based on byte-pair-encoding (BPE) subword tokenization, it is nontrivial to make BERT generate a whole word given a description. We propose a simple but effective method to make BERT generate the target word for this specific task. In addition, the cross-lingual reverse dictionary is the task of finding the proper target word given a description in another language. Previous models have to maintain two different word embeddings and learn to align them. By using multilingual BERT (mBERT), however, we can conduct the cross-lingual reverse dictionary efficiently with a single subword embedding, and no alignment between languages is necessary. More importantly, mBERT achieves remarkable cross-lingual reverse dictionary performance even without a parallel corpus, which means it can conduct the cross-lingual reverse dictionary with only the corresponding monolingual data. Code is publicly available at https://github.com/yhcc/BertForRD.git.
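To make the subword issue concrete, the following is a minimal, hypothetical sketch (not the authors' exact method) of one way a masked language model can score whole target words despite subword tokenization: the definition is paired with one [MASK] slot per subword of each candidate word, and candidates are ranked by the mean log-probability of their subwords. It assumes the HuggingFace transformers library; the checkpoint name, the scoring scheme, and the score_candidates helper are all illustrative assumptions.

    # Hypothetical sketch: rank candidate target words for a definition by
    # filling [MASK] slots with each candidate's subwords and averaging
    # their log-probabilities. Not the paper's exact method.
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def score_candidates(definition, candidates):
        scores = {}
        for word in candidates:
            # Subword ids of the candidate word (may be more than one).
            subword_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
            # Build "[CLS] definition [SEP] [MASK] ... [MASK] [SEP]",
            # one mask slot per subword of the candidate.
            masks = " ".join([tokenizer.mask_token] * len(subword_ids))
            enc = tokenizer(definition, masks, return_tensors="pt")
            with torch.no_grad():
                log_probs = model(**enc).logits[0].log_softmax(dim=-1)
            # Positions of the mask slots in the input sequence.
            positions = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
            # Mean subword log-probability, normalized by subword count so
            # multi-subword candidates are comparable to single-token ones.
            scores[word] = sum(
                log_probs[pos, sid].item()
                for pos, sid in zip(positions, subword_ids)
            ) / len(subword_ids)
        return sorted(scores.items(), key=lambda kv: -kv[1])

    print(score_candidates(
        "a domesticated animal that barks and is kept as a pet",
        ["dog", "cat", "wolf"],
    ))

Since mBERT shares one subword vocabulary across languages, the same sketch would extend to the cross-lingual setting by loading "bert-base-multilingual-cased" and passing a definition in one language with candidate words in another; no separate embedding-alignment step is needed.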
Anthology ID:
2020.findings-emnlp.388
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4329–4338
URL:
https://aclanthology.org/2020.findings-emnlp.388
DOI:
10.18653/v1/2020.findings-emnlp.388
Cite (ACL):
Hang Yan, Xiaonan Li, Xipeng Qiu, and Bocao Deng. 2020. BERT for Monolingual and Cross-Lingual Reverse Dictionary. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4329–4338, Online. Association for Computational Linguistics.
Cite (Informal):
BERT for Monolingual and Cross-Lingual Reverse Dictionary (Yan et al., Findings 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.findings-emnlp.388.pdf
Code:
yhcc/BertForRD