Textual Representations for Crosslingual Information Retrieval

Hang Zhang, Liling Tan


Abstract
In this paper, we explore different levels of textual representation for cross-lingual information retrieval. Beyond the traditional token-level representation, we adopt subword- and character-level representations, which have been shown to improve neural machine translation by reducing out-of-vocabulary issues. We find that cross-lingual information retrieval performance can be improved by combining search results from the subword- and token-level representations. Additionally, we improve search performance by combining and re-ranking the result sets from the different text representations for German, French and Japanese.
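As an illustration of what combining and re-ranking result sets from different text representations can look like, here is a minimal sketch. It uses reciprocal rank fusion, which is an assumption made for this example and is not confirmed as the method used in the paper. The function name, the constant k=60, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    documents are returned by descending total score, with ties broken
    by document ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the same query, one per representation.
token_hits = ["d1", "d2", "d3"]
subword_hits = ["d2", "d4", "d1"]

fused = reciprocal_rank_fusion([token_hits, subword_hits])
print(fused)  # ['d2', 'd1', 'd4', 'd3']
```

Documents retrieved under both representations (d1 and d2 here) rise above documents found by only one, which is the intuition behind merging result sets from more than one representation.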
Anthology ID:
2021.ecnlp-1.14
Volume:
Proceedings of the 4th Workshop on e-Commerce and NLP
Month:
August
Year:
2021
Address:
Online
Editors:
Shervin Malmasi, Surya Kallumadi, Nicola Ueffing, Oleg Rokhlenko, Eugene Agichtein, Ido Guy
Venue:
ECNLP
Publisher:
Association for Computational Linguistics
Pages:
116–122
URL:
https://aclanthology.org/2021.ecnlp-1.14
DOI:
10.18653/v1/2021.ecnlp-1.14
Cite (ACL):
Hang Zhang and Liling Tan. 2021. Textual Representations for Crosslingual Information Retrieval. In Proceedings of the 4th Workshop on e-Commerce and NLP, pages 116–122, Online. Association for Computational Linguistics.
Cite (Informal):
Textual Representations for Crosslingual Information Retrieval (Zhang & Tan, ECNLP 2021)
PDF:
https://preview.aclanthology.org/landing_page/2021.ecnlp-1.14.pdf