Abstract
Machine translation is used in e-commerce to translate second-language queries into the primary language of the store, to be matched by the search system against the product catalog. However, many queries contain spelling mistakes. We first present an analysis of the spelling-robustness of a population of MT systems, quantifying how spelling variations affect MT output, the list of returned products, and ultimately user behavior. We then present two sets of practical experiments illustrating how spelling-robustness may be specifically improved. For MT, reducing the number of BPE operations significantly improves spelling-robustness in six language pairs. In end-to-end e-commerce, the inclusion of a dedicated spelling correction model, and the augmentation of that model’s training data with language-relevant phenomena, each improve robustness and consistency of search results.

- Anthology ID:
- 2024.acl-long.750
- Volume:
- Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 13907–13920
- URL:
- https://aclanthology.org/2024.acl-long.750
- Cite (ACL):
- Greg Hanneman, Natawut Monaikul, and Taichi Nakatani. 2024. Impacts of Misspelled Queries on Translation and Product Search. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13907–13920, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- Impacts of Misspelled Queries on Translation and Product Search (Hanneman et al., ACL 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.acl-long.750.pdf