Abstract
By exploiting cross-modal attention, cross-BERT methods have achieved state-of-the-art accuracy in cross-modal retrieval. Nevertheless, the heavy text-image interactions in cross-BERT models make them prohibitively slow for large-scale retrieval. Late-interaction methods trade off retrieval accuracy against efficiency by exploiting cross-modal interactions only in the late stage, attaining a satisfactory retrieval speed. In this work, we propose an inflating and shrinking approach to further boost both the efficiency and the accuracy of late-interaction methods. The inflating operation plugs several codes into the input of the encoder to exploit text-image interactions more thoroughly, yielding higher retrieval accuracy. The shrinking operation then gradually reduces the text-image interactions through knowledge distillation for higher efficiency. Through an inflating operation followed by a shrinking operation, both the efficiency and the accuracy of a late-interaction model are boosted. Systematic experiments on public benchmarks demonstrate the effectiveness of our inflating and shrinking approach.
- Anthology ID:
- 2021.emnlp-main.772
- Volume:
- Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2021
- Address:
- Online and Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9796–9809
- URL:
- https://aclanthology.org/2021.emnlp-main.772
- DOI:
- 10.18653/v1/2021.emnlp-main.772
- Cite (ACL):
- Haoliang Liu, Tan Yu, and Ping Li. 2021. Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9796–9809, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval (Liu et al., EMNLP 2021)
- PDF:
- https://preview.aclanthology.org/corrections-2024-07/2021.emnlp-main.772.pdf
- Data
- Visual Genome
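The abstract describes late interaction and shrinking by distillation only at a high level. The following is a minimal sketch under explicit assumptions, not the authors' implementation. Each text and image is assumed to be encoded independently into a small set of vectors, standing in loosely for the plugged-in codes. The late-interaction score is assumed to be a MaxSim-style score (as popularized by ColBERT), averaged rather than summed over text vectors. Shrinking is approximated by a KL-divergence loss that would push a "student" using fewer vectors toward the score distribution of a "teacher" using more. The function names, toy vectors, and `temperature` parameter are all hypothetical. Only the loss is computed; no training loop is shown.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(text_vecs: List[Vector], image_vecs: List[Vector]) -> float:
    """Hypothetical late-interaction score (assumed MaxSim-style, not the paper's).

    Text and image are encoded separately and only meet here: for each text
    vector, keep its best-matching image vector, then average over text vectors.
    """
    return sum(max(dot(t, v) for v in image_vecs) for t in text_vecs) / len(text_vecs)


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over candidate scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def distillation_loss(teacher_scores: List[float], student_scores: List[float],
                      temperature: float = 1.0) -> float:
    """KL(teacher || student) between score distributions over candidate images."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy data (hypothetical): one text query with two vectors, two candidate images.
text = [[1.0, 0.0], [0.0, 1.0]]
images = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]]

# "Inflated" teacher: uses every vector on both sides.
teacher = [late_interaction_score(text, img) for img in images]
# "Shrunk" student: keeps only the first vector per side (fewer dot products).
student = [late_interaction_score(text[:1], img[:1]) for img in images]

print(teacher, student, distillation_loss(teacher, student))
```

In this setup, image vectors never attend to the text, so they can be precomputed and indexed offline. Only the cheap max-and-average step runs at query time, which is the usual efficiency argument for late interaction. Keeping fewer vectors per side, as the student does here, further cuts the number of dot products per text-image pair.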