Embedding Meta-Textual Information for Improved Learning to Rank

Toshitaka Kuwa, Shigehiko Schamoni, Stefan Riezler

Abstract
Neural approaches to learning term embeddings have led to improved computation of similarity and ranking in information retrieval (IR). So far, neural representation learning has not been extended to meta-textual information that is readily available for many IR tasks, for example, patent classes in prior-art retrieval, topical information in Wikipedia articles, or product categories in e-commerce data. We present a framework that learns embeddings for meta-textual categories and optimizes a pairwise ranking objective for improved matching based on combined embeddings of textual and meta-textual information. We show considerable gains in an experimental evaluation on cross-lingual retrieval in the Wikipedia domain for three language pairs, and in the patent domain for one language pair. Our results emphasize that the mode of combining different types of information is crucial for model improvement.
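
The abstract names two ingredients: learned embeddings for meta-textual categories, and a pairwise ranking objective over combined text-plus-category representations. The PyTorch sketch below is a minimal illustration of that idea, not the authors' implementation; the names (MetaTextRanker, pairwise_ranking_loss), the concatenation-based combination, and all hyperparameters are assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaTextRanker(nn.Module):
    """Scores query-document pairs by combining averaged token
    embeddings with averaged meta-category embeddings (hypothetical
    sketch; the paper compares several combination modes)."""

    def __init__(self, vocab_size: int, num_categories: int, dim: int = 128):
        super().__init__()
        # Separate embedding tables for textual and meta-textual input.
        self.text_emb = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        self.meta_emb = nn.EmbeddingBag(num_categories, dim, mode="mean")

    def encode(self, token_ids: torch.Tensor, category_ids: torch.Tensor) -> torch.Tensor:
        # Combine by concatenation; one of several possible modes.
        return torch.cat([self.text_emb(token_ids), self.meta_emb(category_ids)], dim=-1)

    def score(self, q_tok, q_cat, d_tok, d_cat) -> torch.Tensor:
        # Cosine similarity of the combined representations.
        return F.cosine_similarity(self.encode(q_tok, q_cat),
                                   self.encode(d_tok, d_cat), dim=-1)

def pairwise_ranking_loss(pos_scores: torch.Tensor,
                          neg_scores: torch.Tensor,
                          margin: float = 1.0) -> torch.Tensor:
    # Margin-based hinge loss: the relevant document should outscore
    # the irrelevant one by at least `margin`.
    return F.relu(margin - pos_scores + neg_scores).mean()

# Toy usage: 4 queries, each with one relevant and one irrelevant document.
model = MetaTextRanker(vocab_size=10000, num_categories=500)
q_tok, q_cat = torch.randint(0, 10000, (4, 20)), torch.randint(0, 500, (4, 3))
pos_tok, pos_cat = torch.randint(0, 10000, (4, 20)), torch.randint(0, 500, (4, 3))
neg_tok, neg_cat = torch.randint(0, 10000, (4, 20)), torch.randint(0, 500, (4, 3))
loss = pairwise_ranking_loss(model.score(q_tok, q_cat, pos_tok, pos_cat),
                             model.score(q_tok, q_cat, neg_tok, neg_cat))
```

Concatenation is just one way to combine the two embedding types; the abstract's finding that the combination mode is crucial suggests alternatives such as summation or learned gating would behave differently.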
Anthology ID:
2020.coling-main.487
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5558–5568
URL:
https://aclanthology.org/2020.coling-main.487
DOI:
10.18653/v1/2020.coling-main.487
Cite (ACL):
Toshitaka Kuwa, Shigehiko Schamoni, and Stefan Riezler. 2020. Embedding Meta-Textual Information for Improved Learning to Rank. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5558–5568, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Embedding Meta-Textual Information for Improved Learning to Rank (Kuwa et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.487.pdf
Data
MetaCLIR, BoostCLIR