Abstract
In this paper, we describe our system under the team name Brotherhood for the English-to-Lowres Multi-Modal Translation Task. We participate in the multi-modal translation tasks for English-Hindi, English-Hausa, English-Bengali, and English-Malayalam language pairs. We present a method leveraging multi-modal Large Language Models (LLMs), specifically GPT-4o and Claude 3.5 Sonnet, to enhance cross-lingual image captioning without traditional training or fine-tuning.Our approach utilizes instruction-tuned prompting to generate rich, contextual conversations about cropped images, using their English captions as additional context. These synthetic conversations are then translated into the target languages. Finally, we employ a weighted prompting strategy, balancing the original English caption with the translated conversation to generate captions in the target language.This method achieved competitive results, scoring 37.90 BLEU on the English-Hindi Challenge Set and ranking first and second for English-Hausa on the Challenge and Evaluation Leaderboards, respectively. We conduct additional experiments on a subset of 250 images, exploring the trade-offs between BLEU scores and semantic similarity across various weighting schemes.- Anthology ID:
- 2024.wmt-1.81
- Volume:
- Proceedings of the Ninth Conference on Machine Translation
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 852–861
- Language:
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.wmt-1.81/
- DOI:
- 10.18653/v1/2024.wmt-1.81
- Cite (ACL):
- Siddharth Betala and Ishan Chokshi. 2024. Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning. In Proceedings of the Ninth Conference on Machine Translation, pages 852–861, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning (Betala & Chokshi, WMT 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.wmt-1.81.pdf