Generating Vehicular Icon Descriptions and Indications Using Large Vision-Language Models
James Fletcher, Nicholas Dehnen, Seyed Nima Tayarani Bathaie, Aijun An, Heidar Davoudi, Ron DiCarlantonio, Gary Farmaner
Abstract
To enhance a question-answering system for automotive drivers, we tackle the problem of automatically generating icon image descriptions. The descriptions can be matched to the driver’s query about an icon appearing on the dashboard and tell the driver what is happening so that they may take appropriate action. We use three state-of-the-art large vision-language models to generate both visual and functional descriptions based on the icon image and its context information in the car manual. Both zero-shot and few-shot prompts are used. We create a dataset containing over 400 icons with their ground-truth descriptions and use it to evaluate model-generated descriptions across several performance metrics. Our evaluation shows that two of these models (GPT-4o and Claude 3.5) perform well on this task, while the third model (LLaVA-NEXT) performs poorly.
- Anthology ID:
- 2024.emnlp-industry.83
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, US
- Editors:
- Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1107–1120
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-industry.83/
- DOI:
- 10.18653/v1/2024.emnlp-industry.83
- Cite (ACL):
- James Fletcher, Nicholas Dehnen, Seyed Nima Tayarani Bathaie, Aijun An, Heidar Davoudi, Ron DiCarlantonio, and Gary Farmaner. 2024. Generating Vehicular Icon Descriptions and Indications Using Large Vision-Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1107–1120, Miami, Florida, US. Association for Computational Linguistics.
- Cite (Informal):
- Generating Vehicular Icon Descriptions and Indications Using Large Vision-Language Models (Fletcher et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-industry.83.pdf