Submodular-based In-context Example Selection for LLMs-based Machine Translation

Baijun Ji, Xiangyu Duan, Zhenyu Qiu, Tong Zhang, Junhui Li, Hao Yang, Min Zhang


Abstract
Large Language Models (LLMs) have demonstrated impressive performance across various NLP tasks given just a few prompts via in-context learning. Previous studies have emphasized the pivotal role of well-chosen examples in in-context learning, as opposed to randomly selected instances, which exhibit unstable results. A successful example selection scheme depends on multiple factors, yet in the context of LLMs-based machine translation, common selection algorithms consider only a single factor, i.e., the similarity between the example source sentence and the input sentence. In this paper, we introduce a novel approach that uses multiple translational factors for in-context example selection via monotone submodular function maximization. The factors include surface/semantic similarity between examples and inputs on both the source and target sides, as well as diversity within the examples. Importantly, our framework mathematically guarantees the coordination of these factors, which are different and challenging to reconcile. Additionally, our research uncovers a previously unexamined dimension: unlike in other NLP tasks, the translation part of an example is also crucial, a facet disregarded in prior studies. Experiments conducted on BLOOMZ-7.1B and LLAMA2-13B demonstrate that our approach significantly outperforms random selection and robust single-factor baselines across various machine translation tasks.
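The standard tool behind approaches like this is greedy maximization of a monotone submodular objective, which carries a (1 - 1/e) approximation guarantee (Nemhauser et al., 1978). The sketch below is not the paper's method: the similarity function (token-overlap Jaccard) and the objective (a modular query-relevance term plus a facility-location diversity term) are placeholders standing in for the paper's surface/semantic, source/target-side factors. It only illustrates how a greedy loop trades off relevance against diversity under a submodular objective.

```python
def similarity(a, b):
    """Jaccard token overlap -- a simple stand-in for the paper's
    surface/semantic similarity factors."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def objective(selected, candidates, query):
    """Monotone submodular objective (placeholder):
    - relevance: modular sum of query similarity over selected examples;
    - coverage: a facility-location term rewarding selections that
      jointly cover the candidate pool, which favors diversity."""
    relevance = sum(similarity(candidates[i], query) for i in selected)
    coverage = sum(max((similarity(c, candidates[i]) for i in selected),
                       default=0.0)
                   for c in candidates)
    return relevance + coverage

def greedy_select(candidates, query, k):
    """Greedy maximization: repeatedly add the candidate with the
    largest marginal gain, achieving a (1 - 1/e) approximation for
    monotone submodular objectives."""
    selected, remaining = [], set(range(len(candidates)))
    for _ in range(min(k, len(candidates))):
        best = max(remaining,
                   key=lambda i: objective(selected + [i], candidates, query))
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]
```

With a toy candidate pool, the first pick is driven by query similarity, while later picks may prefer a dissimilar candidate because the facility-location term exhibits diminishing returns on redundant examples.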
Anthology ID:
2024.lrec-main.1337
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
15398–15409
URL:
https://aclanthology.org/2024.lrec-main.1337
Cite (ACL):
Baijun Ji, Xiangyu Duan, Zhenyu Qiu, Tong Zhang, Junhui Li, Hao Yang, and Min Zhang. 2024. Submodular-based In-context Example Selection for LLMs-based Machine Translation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15398–15409, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Submodular-based In-context Example Selection for LLMs-based Machine Translation (Ji et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/landing_page/2024.lrec-main.1337.pdf