Abstract
We review current schemes for text-image matching models and propose improvements for both training and inference. First, we empirically show limitations of two popular losses (the sum and max-margin losses) widely used in training text-image embeddings, and propose a trade-off: a kNN-margin loss which 1) utilizes information from hard negatives and 2) is robust to noise, since all k hardest samples are taken into account, tolerating pseudo-negatives and outliers. Second, we advocate the use of Inverted Softmax (IS) and Cross-modal Local Scaling (CSLS) during inference to mitigate the so-called hubness problem in high-dimensional embedding space, improving all metrics by a large margin.
- Anthology ID: P19-2023
- Volume: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
- Month: July
- Year: 2019
- Address: Florence, Italy
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 169–176
- URL: https://aclanthology.org/P19-2023
- DOI: 10.18653/v1/P19-2023
- Cite (ACL): Fangyu Liu and Rongtian Ye. 2019. A Strong and Robust Baseline for Text-Image Matching. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 169–176, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): A Strong and Robust Baseline for Text-Image Matching (Liu & Ye, ACL 2019)
- PDF: https://preview.aclanthology.org/ingestion-script-update/P19-2023.pdf
- Data: COCO, Flickr30k
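The two ideas named in the abstract can be illustrated roughly as follows: a kNN-margin triplet loss that averages the k hardest negatives per anchor (rather than summing over all negatives or keeping only the single hardest one), and CSLS rescoring of a similarity matrix at inference. This is a minimal NumPy sketch under stated assumptions; the values of `k` and `margin`, and the exact formulation, are illustrative choices, not the authors' implementation.

```python
import numpy as np

def knn_margin_loss(sim, k=3, margin=0.2):
    """kNN-margin triplet loss over a batch similarity matrix.

    sim[i, j] = similarity between image i and caption j; the diagonal
    holds the matching (positive) pairs. For each anchor, only the k
    largest hinge violations (the k hardest negatives) contribute,
    in both the image-to-text and text-to-image directions.
    """
    n = sim.shape[0]
    pos = np.diag(sim)                    # similarity of true pairs
    neg = sim - np.eye(n) * 1e9           # mask positives as negatives
    loss = 0.0
    for cand in (neg, neg.T):             # both retrieval directions
        # hinge violation for every candidate negative
        viol = np.maximum(0.0, margin - pos[:, None] + cand)
        # keep the k largest violations per anchor
        topk = np.sort(viol, axis=1)[:, -k:]
        loss += topk.sum(axis=1).mean()
    return loss

def csls_scores(sim, k=10):
    """CSLS rescoring: penalize 'hub' points with dense neighborhoods.

    csls(i, j) = 2 * sim(i, j) - r_img(i) - r_txt(j), where r_* is the
    mean similarity to the k nearest cross-modal neighbors.
    """
    r_img = np.mean(np.sort(sim, axis=1)[:, -k:], axis=1)  # per image
    r_txt = np.mean(np.sort(sim, axis=0)[-k:, :], axis=0)  # per caption
    return 2.0 * sim - r_img[:, None] - r_txt[None, :]
```

With `k = 1` the loss reduces to the max-margin (hardest-negative) loss, and with `k` equal to the batch size it approaches the sum loss, which is the trade-off the abstract describes.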