Abstract
As input representation for each sub-word, the original BERT architecture proposes the sum of a sub-word embedding, a position embedding, and a segment embedding. Sub-word and position embeddings are well known and studied; they encode lexical information and word position, respectively. In contrast, segment embeddings are less known and have so far received no attention, despite being ubiquitous in large pre-trained language models. The key idea of segment embeddings is to encode which of the two sentences (segments) a word belongs to; the intuition is to inform the model about the separation of sentences for the next-sentence-prediction pre-training task. However, little is known about whether the choice of segment impacts performance. In this work, we try to fill this gap and empirically study the impact of the segment embedding at inference time for a variety of pre-trained embeddings and target tasks. We hypothesize that performance on single-sentence prediction tasks is not affected, neither in monolingual nor in multilingual setups, while swapping segment IDs matters for paired-sentence tasks. To our surprise, this is not the case. Although no large differences are observed for classification tasks and monolingual BERT models, word-level multilingual prediction tasks in particular are heavily impacted. For low-resource syntactic tasks, we observe an impact of both the segment embedding and the choice of multilingual BERT model. We find that the default setting for the most widely used multilingual BERT model underperforms heavily, and that a simple swap of the segment embeddings yields an average improvement of 2.5 absolute LAS points for dependency parsing across 9 different treebanks.
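For concreteness, the sketch below illustrates what swapping the segment embedding amounts to at inference time. It is a minimal illustration assuming the HuggingFace Transformers API; the model name and the input sentence are placeholders, and the swap simply replaces the default segment ID 0 with 1 for every token before the forward pass.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative model choice; the paper compares several (multilingual) BERT models.
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Single-sentence input: by default every token receives segment ID 0.
encoded = tokenizer("Dependency parsing is fun.", return_tensors="pt")

# Default run: segment embedding A (ID 0) is added to every token.
with torch.no_grad():
    default_out = model(**encoded).last_hidden_state

# "Swapped" run: assign segment ID 1 (segment embedding B) to every token.
encoded["token_type_ids"] = torch.ones_like(encoded["token_type_ids"])
with torch.no_grad():
    swapped_out = model(**encoded).last_hidden_state

# The contextual representations differ, which can affect downstream tasks.
print((default_out - swapped_out).abs().mean())
```

The snippet only verifies that the two segment IDs yield different contextual representations; in the paper's setting, the swapped IDs would be used when extracting representations for downstream tasks such as dependency parsing.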
- Anthology ID: 2022.lrec-1.152
- Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month: June
- Year: 2022
- Address: Marseille, France
- Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue: LREC
- Publisher: European Language Resources Association
- Pages: 1418–1427
- URL: https://aclanthology.org/2022.lrec-1.152
- Cite (ACL): Rob van der Goot, Max Müller-Eberstein, and Barbara Plank. 2022. Frustratingly Easy Performance Improvements for Low-resource Setups: A Tale on BERT and Segment Embeddings. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1418–1427, Marseille, France. European Language Resources Association.
- Cite (Informal): Frustratingly Easy Performance Improvements for Low-resource Setups: A Tale on BERT and Segment Embeddings (van der Goot et al., LREC 2022)
- PDF: https://preview.aclanthology.org/nschneid-patch-1/2022.lrec-1.152.pdf
- Data: GLUE