Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities

Meritxell Fernández Barrera, Vladimir Popescu, Antonio Toral, Federico Gaspari, Khalid Choukri


Abstract
This paper discusses the role that statistical machine translation (SMT) can play in the development of cross-border EU e-commerce, by highlighting extant obstacles and identifying relevant technologies to overcome them. In this sense, it firstly proposes a typology of e-commerce static and dynamic textual genres and it identifies those that may be more successfully targeted by SMT. The specific challenges concerning the automatic translation of user-generated content are discussed in detail. Secondly, the paper highlights the risk of data sparsity inherent to e-commerce and it explores the state-of-the-art strategies to achieve domain adequacy via adaptation. Thirdly, it proposes a robust workflow for the development of SMT systems adapted to the e-commerce domain by relying on inexpensive methods. Given the scarcity of user-generated language corpora for most language pairs, the paper proposes to obtain monolingual target-language data to train language models and aligned parallel corpora to tune and evaluate MT systems by means of crowdsourcing.
Anthology ID:
L16-1721
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4550–4556
URL:
https://aclanthology.org/L16-1721
Cite (ACL):
Meritxell Fernández Barrera, Vladimir Popescu, Antonio Toral, Federico Gaspari, and Khalid Choukri. 2016. Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4550–4556, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities (Barrera et al., LREC 2016)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/L16-1721.pdf