A Case Study on the Importance of Named Entities in a Machine Translation Pipeline for Customer Support Content
Miguel Menezes, Vera Cabarrão, Pedro Mota, Helena Moniz, Alon Lavie
Abstract
This paper describes research developed at Unbabel, a Portuguese machine translation (MT) start-up that combines MT with human post-editing and focuses strictly on customer service content. We aim to contribute to furthering MT quality and good practices by exposing the importance of having a robust, continuously developed Named Entity Recognition system compliant with the General Data Protection Regulation (GDPR). Moreover, we have tested semiautomatic strategies that support and enhance the creation of Named Entity gold standards to allow a more seamless implementation of multilingual Named Entity Recognition systems. The project described in this paper is the result of shared work between Unbabel's linguists and Unbabel's AI engineering team, matured over a year. The project should also be taken as a statement of multidisciplinarity, proving and validating the much-needed articulation between the different scientific fields that compose and characterize the area of Natural Language Processing (NLP).
- Anthology ID:
- 2022.eamt-1.24
- Volume:
- Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
- Month:
- June
- Year:
- 2022
- Address:
- Ghent, Belgium
- Editors:
- Helena Moniz, Lieve Macken, Andrew Rufener, Loïc Barrault, Marta R. Costa-jussà, Christophe Declercq, Maarit Koponen, Ellie Kemp, Spyridon Pilos, Mikel L. Forcada, Carolina Scarton, Joachim Van den Bogaert, Joke Daems, Arda Tezcan, Bram Vanroy, Margot Fonteyne
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation
- Pages:
- 211–219
- URL:
- https://aclanthology.org/2022.eamt-1.24
- Cite (ACL):
- Miguel Menezes, Vera Cabarrão, Pedro Mota, Helena Moniz, and Alon Lavie. 2022. A Case Study on the Importance of Named Entities in a Machine Translation Pipeline for Customer Support Content. In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 211–219, Ghent, Belgium. European Association for Machine Translation.
- Cite (Informal):
- A Case Study on the Importance of Named Entities in a Machine Translation Pipeline for Customer Support Content (Menezes et al., EAMT 2022)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2022.eamt-1.24.pdf