Use of Domain-Specific Language Resources in Machine Translation
Sanja Štajner, Andreia Querido, Nuno Rendeiro, João António Rodrigues, António Branco
Abstract
In this paper, we address the problem of Machine Translation (MT) for a specialised domain in a language pair for which only a very small domain-specific parallel corpus is available. We conduct a series of experiments using a purely phrase-based SMT (PBSMT) system and a hybrid MT system (TectoMT), testing three different strategies to overcome the shortage of in-domain training data. Our results show that adding a small in-domain bilingual terminology to the small in-domain training corpus leads to the largest improvements for the hybrid MT system, while the PBSMT system achieves its best results when a combination of in-domain bilingual terminology and a larger out-of-domain corpus is added. We focus on qualitative human evaluation of the output of the two best systems (one for each approach) and perform a systematic in-depth error analysis, which reveals advantages of the hybrid MT system over the pure PBSMT system for this specific task.
- Anthology ID:
- L16-1094
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 592–598
- URL:
- https://aclanthology.org/L16-1094
- Cite (ACL):
- Sanja Štajner, Andreia Querido, Nuno Rendeiro, João António Rodrigues, and António Branco. 2016. Use of Domain-Specific Language Resources in Machine Translation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 592–598, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- Use of Domain-Specific Language Resources in Machine Translation (Štajner et al., LREC 2016)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/L16-1094.pdf
- Data
- Europarl