Quality Estimation for Automatically Generated Titles of eCommerce Browse Pages

Nicola Ueffing, José G. C. de Souza, Gregor Leusch


Abstract
At eBay, we automatically generate a large number of natural language titles for eCommerce browse pages using machine translation (MT) technology. While automatic approaches can generate millions of titles very quickly, they are prone to errors. We therefore develop quality estimation (QE) methods that automatically detect low-quality titles in order to prevent them from going live. In this paper, we present two approaches. The first is a Random Forest (RF) model that uses hand-crafted, robust features, which are a mix of established features commonly used in Machine Translation Quality Estimation (MTQE) and new features developed specifically for our task. The second is based on Siamese Networks (SNs), which embed the metadata input sequence and the generated title in the same space and do not require hand-crafted features at all. We thoroughly evaluate and compare these approaches on in-house data. While the RF models are competitive in scenarios with smaller amounts of training data and are somewhat more robust, they are clearly outperformed by the SN models when the amount of training data is larger.
Anthology ID:
N18-3007
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)
Month:
June
Year:
2018
Address:
New Orleans - Louisiana
Editors:
Srinivas Bangalore, Jennifer Chu-Carroll, Yunyao Li
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
52–59
URL:
https://aclanthology.org/N18-3007
DOI:
10.18653/v1/N18-3007
Cite (ACL):
Nicola Ueffing, José G. C. de Souza, and Gregor Leusch. 2018. Quality Estimation for Automatically Generated Titles of eCommerce Browse Pages. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), pages 52–59, New Orleans - Louisiana. Association for Computational Linguistics.
Cite (Informal):
Quality Estimation for Automatically Generated Titles of eCommerce Browse Pages (Ueffing et al., NAACL 2018)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/N18-3007.pdf