Improving Few-Shot Learning with Multilingual Transfer and Monte Carlo Training Set Selection

Antonis Maronikolakis, Paul O’Grady, Hinrich Schütze, Matti Lyra


Abstract
In industry settings, machine learning is an attractive tool for automating processes. Unfortunately, high-quality annotated data is expensive to source, and the problem is exacerbated in settings spanning multiple markets and languages. Developing solutions for multilingual tasks with little available data is therefore challenging. Few-shot learning is a compelling approach in multilingual and low-resource settings, since it requires only a few training examples to achieve high performance and is language-agnostic. Although the technique can be applied to multilingual settings, how best to optimize its performance remains an open question. In our work, we show that leveraging higher-resource, task-specific language data can boost overall performance, and we propose a method that selects training examples according to their average performance in a Monte Carlo simulation, resulting in a training set more conducive to learning. We demonstrate the effectiveness of our methods on the moderation of fashion text reviews, classifying reviews as related or unrelated to the given product. We show that our methodology boosts performance in multilingual (English, French, German) settings, increasing F1 score and significantly decreasing false positives.
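The Monte Carlo selection idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract gives no procedural details, so everything below is an assumption: the function name monte_carlo_select, its parameters, the rule of crediting each example with the mean score of the random subsets it appeared in, and the toy evaluate function that stands in for training a few-shot model and measuring F1 on held-out data.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence


def monte_carlo_select(
    pool: Sequence[Hashable],
    evaluate: Callable[[Sequence[Hashable]], float],
    subset_size: int,
    n_trials: int,
    n_select: int,
    seed: int = 0,
) -> List[Hashable]:
    """Rank candidate examples by the mean score of the random subsets
    they appeared in, and return the n_select best (hypothetical helper)."""
    rng = random.Random(seed)
    totals = defaultdict(float)  # sum of subset scores per example
    counts = defaultdict(int)    # number of subsets each example was drawn into

    for _ in range(n_trials):
        # Draw a random candidate training set without replacement.
        subset = rng.sample(list(pool), subset_size)
        # Assumption: evaluate() trains a model on the subset and returns
        # a validation metric such as F1.
        score = evaluate(subset)
        for example in subset:
            totals[example] += score
            counts[example] += 1

    # Average score over the trials in which each example participated.
    averages = {ex: totals[ex] / counts[ex] for ex in counts}
    ranked = sorted(averages, key=averages.get, reverse=True)
    return ranked[:n_select]


def toy_evaluate(subset: Sequence[int]) -> float:
    """Toy stand-in for model training: pretend even-numbered examples
    are the helpful ones and score a subset by their share."""
    return sum(1 for ex in subset if ex % 2 == 0) / len(subset)


selected = monte_carlo_select(
    pool=list(range(10)),
    evaluate=toy_evaluate,
    subset_size=4,
    n_trials=2000,
    n_select=5,
)
```

In practice evaluate would be by far the most expensive step, so the number of trials trades selection quality against training cost; the values used here are arbitrary.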
Anthology ID:
2023.clasp-1.1
Volume:
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
Month:
September
Year:
2023
Address:
Gothenburg, Sweden
Editors:
Ellen Breitholtz, Shalom Lappin, Sharid Loaiciga, Nikolai Ilinykh, Simon Dobnik
Venue:
CLASP
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
1–10
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2023.clasp-1.1/
Cite (ACL):
Antonis Maronikolakis, Paul O’Grady, Hinrich Schütze, and Matti Lyra. 2023. Improving Few-Shot Learning with Multilingual Transfer and Monte Carlo Training Set Selection. In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 1–10, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):
Improving Few-Shot Learning with Multilingual Transfer and Monte Carlo Training Set Selection (Maronikolakis et al., CLASP 2023)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2023.clasp-1.1.pdf