Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval
Lingjun Zhao, Rabih Zbib, Zhuolin Jiang, Damianos Karakos, Zhongqiang Huang
Abstract
We propose a weakly supervised neural model for ad-hoc Cross-lingual Information Retrieval (CLIR) from low-resource languages. Low-resource languages often lack relevance annotations for CLIR, and when such annotations are available, the training data usually covers only a limited range of possible queries. In this paper, we design a model that does not require relevance annotations; instead, it is trained on samples extracted from translation corpora as weak supervision. The model relies on an attention mechanism to learn spans in the foreign sentence that are relevant to the query. We report experiments on two low-resource languages, Swahili and Tagalog, each trained on fewer than 100k parallel sentences. The proposed model achieves a 19-point MAP improvement over using CNNs for feature extraction, a 12-point improvement over machine translation-based CLIR, and up to a 6-point improvement over probabilistic CLIR models.
- Anthology ID:
- D19-6129
- Volume:
- Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong, China
- Editors:
- Colin Cherry, Greg Durrett, George Foster, Reza Haffari, Shahram Khadivi, Nanyun Peng, Xiang Ren, Swabha Swayamdipta
- Venue:
- WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 259–264
- URL:
- https://aclanthology.org/D19-6129
- DOI:
- 10.18653/v1/D19-6129
- Cite (ACL):
- Lingjun Zhao, Rabih Zbib, Zhuolin Jiang, Damianos Karakos, and Zhongqiang Huang. 2019. Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 259–264, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal):
- Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval (Zhao et al., 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/D19-6129.pdf