Domain Transfer based Data Augmentation for Neural Query Translation
Liang Yao, Baosong Yang, Haibo Zhang, Boxing Chen, Weihua Luo
Abstract
Query translation (QT) serves as a critical factor in successful cross-lingual information retrieval (CLIR). Due to the lack of parallel query samples, neural-based QT models are usually optimized with synthetic data derived from large-scale monolingual queries. Nevertheless, this kind of pseudo corpus is mostly produced by a general-domain translation model, making it insufficient to guide the learning of the QT model. In this paper, we extend the data augmentation with a domain transfer procedure that revises synthetic candidates into search-aware examples. Specifically, the domain transfer model is built upon an advanced Transformer, in which layer coordination and mixed attention are exploited to speed up the refining process and leverage parameters from a pre-trained cross-lingual language model. In order to examine the effectiveness of the proposed method, we collected French-to-English and Spanish-to-English QT test sets, each of which consists of 10,000 parallel query pairs with careful manual checking. Qualitative and quantitative analyses reveal that our model significantly outperforms strong baselines and the related domain transfer methods on both translation quality and retrieval accuracy.
- Anthology ID:
- 2020.coling-main.399
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 4521–4533
- URL:
- https://aclanthology.org/2020.coling-main.399
- DOI:
- 10.18653/v1/2020.coling-main.399
- Cite (ACL):
- Liang Yao, Baosong Yang, Haibo Zhang, Boxing Chen, and Weihua Luo. 2020. Domain Transfer based Data Augmentation for Neural Query Translation. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4521–4533, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Domain Transfer based Data Augmentation for Neural Query Translation (Yao et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2020.coling-main.399.pdf