Generating Vehicular Icon Descriptions and Indications Using Large Vision-Language Models

James Fletcher; Nicholas Dehnen; Seyed Nima Tayarani Bathaie; Aijun An; Heidar Davoudi; Ron DiCarlantonio; Gary Farmaner

doi:10.18653/v1/2024.emnlp-industry.83

Abstract

To enhance a question-answering system for automotive drivers, we tackle the problem of automatically generating descriptions of icon images. These descriptions can be matched to a driver's query about an icon appearing on the dashboard and tell the driver what is happening so that they can take appropriate action. We use three state-of-the-art large vision-language models to generate both visual and functional descriptions based on the icon image and its context information in the car manual, using both zero-shot and few-shot prompts. We create a dataset containing over 400 icons with their ground-truth descriptions and use it to evaluate the model-generated descriptions across several performance metrics. Our evaluation shows that two of these models (GPT-4o and Claude 3.5) perform well on this task, while the third model (LLaVA-NEXT) performs poorly.

Anthology ID: 2024.emnlp-industry.83
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2024
Address: Miami, Florida, US
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1107–1120
URL: https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-industry.83/
DOI: 10.18653/v1/2024.emnlp-industry.83
Cite (ACL): James Fletcher, Nicholas Dehnen, Seyed Nima Tayarani Bathaie, Aijun An, Heidar Davoudi, Ron DiCarlantonio, and Gary Farmaner. 2024. Generating Vehicular Icon Descriptions and Indications Using Large Vision-Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1107–1120, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal): Generating Vehicular Icon Descriptions and Indications Using Large Vision-Language Models (Fletcher et al., EMNLP 2024)
PDF: https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-industry.83.pdf
Poster: 2024.emnlp-industry.83.poster.pdf