Abstract
Identifying the relation between two sentences requires datasets with pairwise annotations. In many cases, these datasets contain instances that are annotated multiple times as part of different pairs. They constitute a structure that contains additional helpful information about the inter-relatedness of the text instances based on the annotations. This paper investigates how this kind of structural dataset information can be exploited during training. We propose three batch composition strategies to incorporate such information and measure their performance over 14 heterogeneous pairwise sentence classification tasks. Our results show statistically significant improvements (up to 3.9%), independent of the pre-trained language model, for most tasks compared to baselines that follow a standard training procedure. Further, we see that even this baseline procedure can profit from having such structural information in a low-resource setting.
- Anthology ID:
- 2022.findings-acl.239
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2022
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3031–3045
- URL:
- https://aclanthology.org/2022.findings-acl.239
- DOI:
- 10.18653/v1/2022.findings-acl.239
- Cite (ACL):
- Andreas Waldis, Tilman Beck, and Iryna Gurevych. 2022. Composing Structure-Aware Batches for Pairwise Sentence Classification. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3031–3045, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Composing Structure-Aware Batches for Pairwise Sentence Classification (Waldis et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2022.findings-acl.239.pdf
- Code:
- ukplab/acl2022-structure-batches
- Data:
- GLUE, MultiNLI, QNLI