Abstract
This paper presents a flexible and powerful system for creating parallel corpora and for running neural machine translation services. Our package provides a scalable data repository backend that offers transparent data pre-processing pipelines and automatic alignment procedures that facilitate the compilation of extensive parallel data sets from a variety of sources. Moreover, we develop a web-based interface that constitutes an intuitive frontend for end-users of the platform. The whole system can easily be distributed over virtual machines and implements a sophisticated permission system with secure connections and a flexible database for storing arbitrary metadata. Furthermore, we provide an interface for neural machine translation that can run as a service on virtual machines and that also connects to the data repository software.
- Anthology ID:
- W19-6146
- Volume:
- Proceedings of the 22nd Nordic Conference on Computational Linguistics
- Month:
- September–October
- Year:
- 2019
- Address:
- Turku, Finland
- Editors:
- Mareike Hartmann, Barbara Plank
- Venue:
- NoDaLiDa
- Publisher:
- Linköping University Electronic Press
- Pages:
- 389–394
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/W19-6146/
- Cite (ACL):
- Mikko Aulamo and Jörg Tiedemann. 2019. The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 389–394, Turku, Finland. Linköping University Electronic Press.
- Cite (Informal):
- The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services (Aulamo & Tiedemann, NoDaLiDa 2019)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/W19-6146.pdf