Liling Tan

2024

pdf abs
Don’t Just Translate, Summarize Too: Cross-lingual Product Title Generation in E-commerce
Bryan Zhang | Taichi Nakatani | Daniel Vidal Hussey | Stephan Walter | Liling Tan
Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024

Making product titles informative and concise is vital to delighting e-commerce customers. Recent advances have successfully applied monolingual product title summarization to shorten lengthy product titles. This paper explores the cross-lingual product title generation task that summarizes and translates the source language product title to a shortened product title in the target language. Our main contributions are as follows, (i) we investigate the optimal product title length within the scope of e-commerce localization, (ii) we introduce a simple yet effective data filtering technique to train a length-aware machine translation system and compare it to a publicly available LLM, (iii) we propose an automatic approach to validate experimental results using an open-source LLM without human input and show that these evaluation results are consistent with human preferences.

2023

pdf abs
Leveraging Latent Topic Information to Improve Product Machine Translation
Bryan Zhang | Stephan Walter | Amita Misra | Liling Tan
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

Meeting the expectations of e-commerce customers involves offering a seamless online shopping experience in their preferred language. To achieve this, modern e-commerce platforms rely on machine translation systems to provide multilingual product information on a large scale. However, maintaining high-quality machine translation that can keep up with the ever-expanding volume of product data remains an open challenge for industrial machine translation systems. In this context, topical clustering emerges as a valuable approach, leveraging latent signals and interpretable textual patterns to potentially enhance translation quality and facilitate industry-scale translation data discovery. This paper proposes two innovative methods: topic-based data selection and topic-signal augmentation, both utilizing latent topic clusters to improve the quality of machine translation in e-commerce. Furthermore, we present a data discovery workflow that utilizes topic clusters to effectively manage the growing multilingual product catalogs, addressing the challenges posed by their expansion.

pdf bib
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)
Liling Tan | Dmitrijs Milajevs | Geeticka Chauhan | Jeremy Gwinnup | Elijah Rippeth
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

2022

pdf abs
Evaluating Machine Translation in Cross-lingual E-Commerce Search
Hang Zhang | Liling Tan | Amita Misra
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Multilingual query localization is integral to modern e-commerce. While machine translation is widely used to translate e-commerce queries, evaluation of query translation in the context of the down-stream search task is overlooked. This study proposes a search ranking-based evaluation framework with an edit-distance based search metric to evaluate machine translation impact on cross-lingual information retrieval for e-commerce search query translation, The framework demonstrate evaluation of machine translation for e-commerce search at scale and the proposed metric is strongly associated with traditional machine translation and traditional search relevance-based metrics.

2021

pdf abs
Textual Representations for Crosslingual Information Retrieval
Hang Zhang | Liling Tan
Proceedings of the 4th Workshop on e-Commerce and NLP

In this paper, we explored different levels of textual representations for cross-lingual information retrieval. Beyond the traditional token level representation, we adopted the subword and character level representations for information retrieval that had shown to improve neural machine translation by reducing the out-of-vocabulary issues in machine translation. We found that crosslingual information retrieval performance can be improved by combining search results from subwords and token level representation. Additionally, we improved the search performance by combining and re-ranking the result sets from the different text representations for German, French and Japanese.

2020

pdf abs
Lexically Constrained Neural Machine Translation with Levenshtein Transformer
Raymond Hendy Susanto | Shamil Chollampatt | Liling Tan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper proposes a simple and effective algorithm for incorporating lexical constraints in neural machine translation. Previous work either required re-training existing models with the lexical constraints or incorporating them during beam search decoding with significantly higher computational overheads. Leveraging the flexibility and speed of a recently proposed Levenshtein Transformer model (Gu et al., 2019), our method injects terminology constraints at inference time without any impact on decoding speed. Our method does not require any modification to the training procedure and can be easily applied at runtime with custom dictionaries. Experiments on English-German WMT datasets show that our approach improves an unconstrained baseline and previous approaches.

pdf abs
Can Automatic Post-Editing Improve NMT?
Shamil Chollampatt | Raymond Hendy Susanto | Liling Tan | Ewa Szymanska
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Automatic post-editing (APE) aims to improve machine translations, thereby reducing human post-editing effort. APE has had notable success when used with statistical machine translation (SMT) systems but has not been as successful over neural machine translation (NMT) systems. This has raised questions on the relevance of APE task in the current scenario. However, the training of APE models has been heavily reliant on large-scale artificial corpora combined with only limited human post-edited data. We hypothesize that APE models have been underperforming in improving NMT translations due to the lack of adequate supervision. To ascertain our hypothesis, we compile a larger corpus of human post-edits of English to German NMT. We empirically show that a state-of-art neural APE model trained on this corpus can significantly improve a strong in-domain NMT system, challenging the current understanding in the field. We further investigate the effects of varying training data sizes, using artificial training data, and domain specificity for the APE task. We release this new corpus under CC BY-NC-SA 4.0 license at https://github.com/shamilcm/pedra.

2019

pdf abs
Sarah’s Participation in WAT 2019
Raymond Hendy Susanto | Ohnmar Htun | Liling Tan
Proceedings of the 6th Workshop on Asian Translation

This paper describes our MT systems’ participation in the of WAT 2019. We participated in the (i) Patent, (ii) Timely Disclosure, (iii) Newswire and (iv) Mixed-domain tasks. Our main focus is to explore how similar Transformer models perform on various tasks. We observed that for tasks with smaller datasets, our best model setup are shallower models with lesser number of attention heads. We investigated practical issues in NMT that often appear in production settings, such as coping with multilinguality and simplifying pre- and post-processing pipeline in deployment.

2018

pdf bib
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)
Eunjeong L. Park | Masato Hagiwara | Dmitrijs Milajevs | Liling Tan
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)

2016

pdf
BIRA: Improved Predictive Exchange Word Clustering
Jon Dehdari | Liling Tan | Josef van Genabith
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Scaling Up Word Clustering
Jon Dehdari | Liling Tan | Josef van Genabith
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf
SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles
Liling Tan | Carolina Scarton | Lucia Specia | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf
WOLVESAAR at SemEval-2016 Task 1: Replicating the Success of Monolingual Word Alignment and Neural Embeddings for Semantic Textual Similarity
Hannah Bechara | Rohit Gupta | Liling Tan | Constantin Orăsan | Ruslan Mitkov | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf
USAAR at SemEval-2016 Task 11: Complex Word Identification with Sense Entropy and Sentence Perplexity
José Manuel Martínez Martínez | Liling Tan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf
MacSaar at SemEval-2016 Task 11: Zipfian and Character Features for ComplexWord Identification
Marcos Zampieri | Liling Tan | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf
USAAR at SemEval-2016 Task 13: Hyponym Endocentricity
Liling Tan | Francis Bond | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf abs
Faster and Lighter Phrase-based Machine Translation Baseline
Liling Tan
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper describes the SENSE machine translation system participation in the Third Workshop for Asian Translation (WAT2016). We share our best practices to build a fast and light phrase-based machine translation (PBMT) models that have comparable results to the baseline systems provided by the organizers. As Neural Machine Translation (NMT) overtakes PBMT as the state-of-the-art, deep learning and new MT practitioners might not be familiar with the PBMT paradigm and we hope that this paper will help them build a PBMT baseline system quickly and easily.

pdf bib
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Preslav Nakov | Marcos Zampieri | Liling Tan | Nikola Ljubešić | Jörg Tiedemann | Shervin Malmasi
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

Liling Tan

2024

2023

2022

2021

2020

2019

2018

2016

2015

2014

2013

2011

Co-authors

Venues