Abstract
Although several works have addressed the role of data selection in improving transfer learning for various NLP tasks, there is no consensus about its real benefits and, more generally, there is a lack of shared practices on how it can best be applied. We propose a systematic approach aimed at evaluating data selection in scenarios of increasing complexity. Specifically, we compare the case in which source and target tasks are the same while source and target domains are different, against the more challenging scenario where both tasks and domains are different. We run a number of experiments on semantic sequence tagging tasks, which have been relatively little investigated in data selection, and conclude that data selection is more beneficial in the scenario where the tasks are the same, while in the case of different (although related) tasks from distant domains, a combination of data selection and multi-task learning is ineffective in most cases.
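To make the setting concrete, the sketch below illustrates one common flavor of data selection: ranking source-domain sentences by lexical (Jaccard) similarity to the target-domain vocabulary and keeping only the most similar ones. The function names, the similarity measure, and the toy sentences are illustrative assumptions, not the selection method actually evaluated in the paper.

```python
# Minimal sketch of similarity-based data selection (illustrative only,
# not the paper's exact method): source-domain sentences are ranked by
# Jaccard overlap with the target-domain vocabulary, and the top-k most
# similar sentences are kept as additional training data.

def target_vocabulary(target_sentences):
    """Collect the set of tokens seen in the (small) target-domain data."""
    vocab = set()
    for sentence in target_sentences:
        vocab.update(sentence.lower().split())
    return vocab

def jaccard_similarity(tokens, vocab):
    """Jaccard overlap between a sentence's token set and the target vocabulary."""
    tokens = set(tokens)
    if not tokens or not vocab:
        return 0.0
    return len(tokens & vocab) / len(tokens | vocab)

def select_source_data(source_sentences, target_sentences, top_k):
    """Return the top_k source sentences most similar to the target domain."""
    vocab = target_vocabulary(target_sentences)
    scored = [
        (jaccard_similarity(s.lower().split(), vocab), s)
        for s in source_sentences
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]

if __name__ == "__main__":
    source = [
        "book a table for two at the italian restaurant",
        "the stock market closed higher today",
        "reserve a flight from boston to denver",
    ]
    target = ["find me a cheap restaurant nearby", "book dinner for four"]
    print(select_source_data(source, target, top_k=2))
```

In practice, stronger selection criteria (e.g., language-model cross-entropy difference or embedding similarity) are often used in place of token overlap; the top-k cutoff here simply stands in for whatever selection threshold a given setup applies.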
- Anthology ID:
- 2020.insights-1.3
- Volume:
- Proceedings of the First Workshop on Insights from Negative Results in NLP
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Anna Rogers, João Sedoc, Anna Rumshisky
- Venue:
- insights
- Publisher:
- Association for Computational Linguistics
- Pages:
- 15–21
- URL:
- https://aclanthology.org/2020.insights-1.3
- DOI:
- 10.18653/v1/2020.insights-1.3
- Cite (ACL):
- Samuel Louvan and Bernardo Magnini. 2020. How Far Can We Go with Data Selection? A Case Study on Semantic Sequence Tagging Tasks. In Proceedings of the First Workshop on Insights from Negative Results in NLP, pages 15–21, Online. Association for Computational Linguistics.
- Cite (Informal):
- How Far Can We Go with Data Selection? A Case Study on Semantic Sequence Tagging Tasks (Louvan & Magnini, insights 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2020.insights-1.3.pdf