Abstract
In neural dependency parsing, as well as in the broader field of NLP, domain adaptation remains a challenging problem. When adapting a parser to a target domain, there is a fundamental tension between the need to make use of out-of-domain data and the need to ensure that syntactic characteristics of the target domain are learned. In this work we explore a way to balance these two competing concerns, namely domain-weighted batch sampling, which allows us to use all available training data while controlling the probability of sampling in- and out-of-domain data when constructing training batches. We conduct experiments using ten natural language domains and find that domain-weighted batch sampling yields substantial performance improvements in all ten domains compared to a baseline of conventional randomized batch sampling.
- Anthology ID:
- 2024.mwe-1.24
- Volume:
- Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Archna Bhatia, Gosse Bouma, A. Seza Dogruoz, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Joakim Nivre, Alexandre Rademaker
- Venues:
- MWE | UDW | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 198–206
- URL:
- https://aclanthology.org/2024.mwe-1.24
- Cite (ACL):
- Jacob Striebel, Daniel Dakota, and Sandra Kübler. 2024. Domain-Weighted Batch Sampling for Neural Dependency Parsing. In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, pages 198–206, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Domain-Weighted Batch Sampling for Neural Dependency Parsing (Striebel et al., MWE-UDW-WS 2024)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2024.mwe-1.24.pdf
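
The abstract describes domain-weighted batch sampling only at a high level: all training data stays available, and the probability of drawing in-domain versus out-of-domain examples is controlled when each batch is built. The following is a minimal sketch of one way such a sampler could look, not the authors' implementation. The function name `sample_batch`, the parameter `p_in`, the per-slot sampling with replacement, and the toy data are all assumptions made for illustration; the paper's actual scheme may differ.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_batch(
    in_domain: Sequence[T],
    out_domain: Sequence[T],
    batch_size: int,
    p_in: float,
    rng: random.Random,
) -> List[T]:
    """Build one training batch (hypothetical helper, not from the paper).

    Each slot is filled from the in-domain pool with probability p_in,
    otherwise from the out-of-domain pool. Examples are drawn uniformly
    with replacement, so every example in both pools remains reachable.
    """
    if not 0.0 <= p_in <= 1.0:
        raise ValueError("p_in must lie in [0, 1]")
    if not in_domain or not out_domain:
        raise ValueError("both pools must be non-empty")

    batch: List[T] = []
    for _ in range(batch_size):
        # Pick the pool first (the domain weight), then an example from it.
        pool = in_domain if rng.random() < p_in else out_domain
        batch.append(rng.choice(pool))
    return batch


# Toy usage: a small target-domain pool and a larger out-of-domain pool.
rng = random.Random(13)
target = [f"target-sent-{i}" for i in range(5)]
other = [f"other-sent-{i}" for i in range(50)]
batch = sample_batch(target, other, batch_size=8, p_in=0.7, rng=rng)
```

Under conventional randomized sampling over the pooled data, the small target pool in this toy setup would contribute roughly 5 of every 55 examples. Setting `p_in` directly decouples the in-domain share of each batch from the relative pool sizes, which is the trade-off the abstract refers to.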