Abstract
This paper describes the SLT-CDT-UoS group’s submission to the first Special Task on Formality Control for Spoken Language Translation, part of the IWSLT 2022 Evaluation Campaign. Our efforts were split between two fronts: data engineering and altering the objective function for best hypothesis selection. We used language-independent methods to extract formal and informal sentence pairs from the provided corpora; using English as a pivot language, we propagated formality annotations to languages treated as zero-shot in the task; we also further improved formality control with a hypothesis re-ranking approach. On the test sets for English-to-German and English-to-Spanish, we achieved an average accuracy of .935 within the constrained setting and .995 within the unconstrained setting. In a zero-shot setting for English-to-Russian and English-to-Italian, we scored an average accuracy of .590 for the constrained setting and .659 for the unconstrained setting.
- Anthology ID:
- 2022.iwslt-1.31
- Volume:
- Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland (in-person and online)
- Editors:
- Elizabeth Salesky, Marcello Federico, Marta Costa-jussà
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 341–350
- URL:
- https://aclanthology.org/2022.iwslt-1.31
- DOI:
- 10.18653/v1/2022.iwslt-1.31
- Cite (ACL):
- Sebastian Vincent, Loïc Barrault, and Carolina Scarton. 2022. Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022. In Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022), pages 341–350, Dublin, Ireland (in-person and online). Association for Computational Linguistics.
- Cite (Informal):
- Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022 (Vincent et al., IWSLT 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.iwslt-1.31.pdf
- Data
- MuST-C, ParaCrawl, WikiMatrix