Minh-Quang Pham
2023
Gradient-based Gradual Pruning for Language-Specific Multilingual Neural Machine Translation
Dan He | Minh-Quang Pham | Thanh-Le Ha | Marco Turchi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Multilingual neural machine translation (MNMT) offers the convenience of translating between multiple languages with a single model. However, MNMT often suffers from performance degradation in high-resource languages compared to bilingual counterparts. This degradation is commonly attributed to parameter interference, which occurs when parameters are fully shared across all language pairs. In this work, we propose a gradient-based gradual pruning technique for MNMT to tackle this issue. Our approach aims to identify an optimal sub-network for each language pair within the multilingual model by leveraging gradient-based information as the pruning criterion and gradually increasing the pruning ratio as the schedule. Our approach allows partial parameter sharing across language pairs to alleviate interference, while each pair preserves its unique parameters to capture language-specific information. Comprehensive experiments on IWSLT and WMT datasets show that our approach yields notable performance gains on both datasets.
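A minimal sketch of the core idea, assuming PyTorch, a first-order |weight × gradient| importance score, and a cubic sparsity schedule; the paper's exact criterion, schedule, and mask granularity may differ, and all function names here are illustrative.

```python
# Illustrative sketch only: gradient-based importance scores plus a cubic
# gradual-pruning schedule. Criterion and schedule are assumptions, not
# necessarily the exact choices made in the paper.
import torch

def sparsity_at(step, total_steps, final_sparsity):
    """Cubic schedule: sparsity grows smoothly from 0 to final_sparsity."""
    frac = min(step / max(total_steps, 1), 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)

def gradient_importance(model):
    """Score parameters by |weight * grad| after a backward pass on one language pair."""
    return {name: (p.detach() * p.grad.detach()).abs()
            for name, p in model.named_parameters() if p.grad is not None}

def build_masks(importance, sparsity):
    """Prune the lowest-scoring fraction of each tensor, keeping the rest."""
    masks = {}
    for name, score in importance.items():
        flat = score.flatten()
        num_pruned = int(flat.numel() * sparsity)
        mask = torch.ones_like(flat)
        if num_pruned > 0:
            drop = torch.topk(flat, num_pruned, largest=False).indices
            mask[drop] = 0.0
        masks[name] = mask.view_as(score)
    return masks
```

In the multilingual setting, one such mask would be built per language pair from gradients accumulated on that pair's batches, so pairs share the surviving parameters while keeping distinct pruned sub-networks.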
Select, Prompt, Filter: Distilling Large Language Models for Summarizing Conversations
Minh-Quang Pham | Sathish Indurthi | Shamil Chollampatt | Marco Turchi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) like ChatGPT can be expensive to train, deploy, and use for specific natural language generation tasks such as text summarization and for certain domains. A promising alternative is to fine-tune relatively smaller language models (LMs) on a particular task using high-quality, in-domain datasets. However, obtaining such high-quality training data can be prohibitively expensive. This issue has been mitigated by generating weakly supervised data via knowledge distillation (KD) of LLMs. We propose a three-step approach to distill ChatGPT and fine-tune smaller LMs for summarizing forum conversations. More specifically, we design a method to selectively sample a large unannotated corpus of forum conversations using a semantic similarity metric. Then, we use the same metric to retrieve suitable prompts for ChatGPT from a small annotated validation set in the same domain. The generated dataset is then filtered to remove low-quality instances. By leveraging sufficient in-domain pseudo-labeled data, our proposed select-prompt-filter KD approach leads to significant improvements of up to 6.6 ROUGE-2 points over a standard KD approach given the same size of training data.
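A rough sketch of the three steps using cosine similarity over sentence embeddings; the encoder name, threshold, and the exact similarity metric are assumptions for illustration, not the paper's actual configuration.

```python
# Illustrative select / prompt / filter pipeline based on embedding similarity.
# Encoder choice and threshold are assumptions, not the paper's settings.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def select(unlabeled_convs, validation_convs, top_k=1000):
    """Step 1: keep unlabeled conversations closest to the in-domain validation set."""
    u = encoder.encode(unlabeled_convs, convert_to_tensor=True)
    v = encoder.encode(validation_convs, convert_to_tensor=True)
    scores = util.cos_sim(u, v).max(dim=1).values   # best validation match per conversation
    top = scores.topk(min(top_k, len(unlabeled_convs))).indices
    return [unlabeled_convs[i] for i in top.tolist()]

def retrieve_prompt(conv, validation_pairs):
    """Step 2: pick the most similar annotated (conversation, summary) pair as the prompt example."""
    c = encoder.encode(conv, convert_to_tensor=True)
    v = encoder.encode([x for x, _ in validation_pairs], convert_to_tensor=True)
    return validation_pairs[util.cos_sim(c, v).argmax().item()]

def keep(conv, generated_summary, threshold=0.5):
    """Step 3: drop pseudo-labels that are only weakly related to their source conversation."""
    c, s = encoder.encode([conv, generated_summary], convert_to_tensor=True)
    return util.cos_sim(c, s).item() >= threshold
```

The surviving (conversation, generated summary) pairs would then serve as weakly supervised training data for fine-tuning the smaller LM.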
2022
Multi-Domain Adaptation in Neural Machine Translation with Dynamic Sampling Strategies
Minh-Quang Pham | Josep Crego | François Yvon
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Building effective Neural Machine Translation models often requires accommodating diverse sets of heterogeneous data so as to optimize performance for the domain(s) of interest. Such multi-source / multi-domain adaptation problems are typically approached through instance selection or reweighting strategies, based on a static assessment of the relevance of training instances with respect to the task at hand. In this paper, we study dynamic data selection strategies that automatically re-evaluate the usefulness of data samples and evolve the data selection policy over the course of training. Based on the results of multiple experiments, we show that such methods constitute a generic framework to automatically and effectively handle a variety of real-world situations, from multi-source domain adaptation to multi-domain learning and unsupervised domain adaptation.
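A minimal sketch of one way to implement such a dynamic sampling policy: the probability of drawing a batch from each data source is periodically re-estimated from a reward signal, here the improvement of the in-domain validation loss. The reward definition, softmax temperature, and update rule are illustrative assumptions, not the specific strategies studied in the paper.

```python
# Illustrative dynamic sampler over training domains: sampling probabilities
# are recomputed from per-domain rewards as training progresses.
import math
import random

class DynamicSampler:
    def __init__(self, domains, temperature=1.0):
        self.domains = list(domains)
        self.temperature = temperature
        self.reward = {d: 0.0 for d in self.domains}

    def probabilities(self):
        """Softmax over current rewards: more useful domains are sampled more often."""
        weights = [math.exp(self.reward[d] / self.temperature) for d in self.domains]
        total = sum(weights)
        return [w / total for w in weights]

    def sample_domain(self):
        """Draw the domain to take the next training batch from."""
        return random.choices(self.domains, weights=self.probabilities(), k=1)[0]

    def update(self, domain, dev_loss_before, dev_loss_after):
        """Re-evaluate a domain's usefulness by how much training on it reduced the dev loss."""
        self.reward[domain] = dev_loss_before - dev_loss_after
```

Static instance selection fixes the data distribution once before training, whereas a policy like this one keeps adjusting it as the model's needs change.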