Improving Few-Shot Image Classification Using Machine- and User-Generated Natural Language Descriptions

Kosuke Nishida; Kyosuke Nishida; Shuichi Nishioka

doi:10.18653/v1/2022.findings-naacl.106

Abstract

Humans can acquire knowledge of novel visual concepts from language descriptions, so we use the few-shot image classification task to investigate whether a machine learning model can do the same. Our proposed model, LIDE (Learning from Image and DEscription), has a text decoder that generates descriptions and a text encoder that produces text representations of machine- or user-generated descriptions. We confirmed that LIDE with machine-generated descriptions outperformed baseline models, and that high-quality user-generated descriptions improved performance further. The generated descriptions can be viewed as explanations of the model’s predictions, and we observed that these explanations were consistent with the prediction results. We also investigated why language descriptions improve few-shot image classification performance by comparing the image representations and the text representations in the feature spaces.

Anthology ID: 2022.findings-naacl.106
Volume: Findings of the Association for Computational Linguistics: NAACL 2022
Month: July
Year: 2022
Address: Seattle, United States
Editors: Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1421–1430
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.findings-naacl.106/
DOI: 10.18653/v1/2022.findings-naacl.106
Cite (ACL): Kosuke Nishida, Kyosuke Nishida, and Shuichi Nishioka. 2022. Improving Few-Shot Image Classification Using Machine- and User-Generated Natural Language Descriptions. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1421–1430, Seattle, United States. Association for Computational Linguistics.
Cite (Informal): Improving Few-Shot Image Classification Using Machine- and User-Generated Natural Language Descriptions (Nishida et al., Findings 2022)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.findings-naacl.106.pdf
Video: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.findings-naacl.106.mp4
Data: CUB-200-2011