stopes - Modular Machine Translation Pipelines
Pierre Andrews, Guillaume Wenzek, Kevin Heffernan, Onur Çelebi, Anna Sun, Ammar Kamran, Yingzhe Guo, Alexandre Mourachko, Holger Schwenk, Angela Fan
Abstract
Neural machine translation, as other natural language deep learning applications, is hungry for data. As research evolves, the data pipelines supporting that research evolve too, oftentimes re-implementing the same core components. Despite the potential of modular codebases, researchers have but little time to put code structure and reusability first. Unfortunately, this makes it very hard to publish clean, reproducible code to benefit a wider audience. In this paper, we motivate and describe stopes, a framework that addresses these issues while empowering scalability and versatility for research use cases. This library was a key enabler of the No Language Left Behind project, establishing new state of the art performance for a multilingual machine translation model covering 200 languages. stopes and the pipelines described are released under the MIT license at https://github.com/facebookresearch/stopes.
- Anthology ID: 2022.emnlp-demos.26
- Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Month: December
- Year: 2022
- Address: Abu Dhabi, UAE
- Editors: Wanxiang Che, Ekaterina Shutova
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 258–265
- URL: https://aclanthology.org/2022.emnlp-demos.26
- DOI: 10.18653/v1/2022.emnlp-demos.26
- Cite (ACL): Pierre Andrews, Guillaume Wenzek, Kevin Heffernan, Onur Çelebi, Anna Sun, Ammar Kamran, Yingzhe Guo, Alexandre Mourachko, Holger Schwenk, and Angela Fan. 2022. stopes - Modular Machine Translation Pipelines. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 258–265, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal): stopes - Modular Machine Translation Pipelines (Andrews et al., EMNLP 2022)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2022.emnlp-demos.26.pdf