Liling Tan


2024

Don’t Just Translate, Summarize Too: Cross-lingual Product Title Generation in E-commerce
Bryan Zhang | Taichi Nakatani | Daniel Vidal Hussey | Stephan Walter | Liling Tan
Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024

Making product titles informative and concise is vital to delighting e-commerce customers. Recent advances have successfully applied monolingual product title summarization to shorten lengthy product titles. This paper explores the cross-lingual product title generation task that summarizes and translates the source language product title to a shortened product title in the target language. Our main contributions are as follows: (i) we investigate the optimal product title length within the scope of e-commerce localization, (ii) we introduce a simple yet effective data filtering technique to train a length-aware machine translation system and compare it to a publicly available LLM, and (iii) we propose an automatic approach to validate experimental results using an open-source LLM without human input and show that these evaluation results are consistent with human preferences.

2023

Leveraging Latent Topic Information to Improve Product Machine Translation
Bryan Zhang | Stephan Walter | Amita Misra | Liling Tan
Proceedings of Machine Translation Summit XIX, Vol. 2: Users Track

Meeting the expectations of e-commerce customers involves offering a seamless online shopping experience in their preferred language. To achieve this, modern e-commerce platforms rely on machine translation systems to provide multilingual product information on a large scale. However, maintaining high-quality machine translation that can keep up with the ever-expanding volume of product data remains an open challenge for industrial machine translation systems. In this context, topical clustering emerges as a valuable approach, leveraging latent signals and interpretable textual patterns to potentially enhance translation quality and facilitate industry-scale translation data discovery. This paper proposes two innovative methods: topic-based data selection and topic-signal augmentation, both utilizing latent topic clusters to improve the quality of machine translation in e-commerce. Furthermore, we present a data discovery workflow that utilizes topic clusters to effectively manage the growing multilingual product catalogs, addressing the challenges posed by their expansion.

Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)
Liling Tan | Dmitrijs Milajevs | Geeticka Chauhan | Jeremy Gwinnup | Elijah Rippeth
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

2022

Evaluating Machine Translation in Cross-lingual E-Commerce Search
Hang Zhang | Liling Tan | Amita Misra
Proceedings of the 15th biennial conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Multilingual query localization is integral to modern e-commerce. While machine translation is widely used to translate e-commerce queries, evaluation of query translation in the context of the downstream search task is overlooked. This study proposes a search ranking-based evaluation framework with an edit-distance-based search metric to evaluate the impact of machine translation on cross-lingual information retrieval for e-commerce search query translation. The framework demonstrates evaluation of machine translation for e-commerce search at scale, and the proposed metric is strongly associated with traditional machine translation and traditional search relevance-based metrics.

2021

Textual Representations for Crosslingual Information Retrieval
Hang Zhang | Liling Tan
Proceedings of the 4th Workshop on e-Commerce and NLP

In this paper, we explored different levels of textual representations for cross-lingual information retrieval. Beyond the traditional token-level representation, we adopted the subword- and character-level representations for information retrieval that had been shown to improve neural machine translation by reducing out-of-vocabulary issues. We found that cross-lingual information retrieval performance can be improved by combining search results from subword- and token-level representations. Additionally, we improved the search performance by combining and re-ranking the result sets from the different text representations for German, French and Japanese.

2020

Lexically Constrained Neural Machine Translation with Levenshtein Transformer
Raymond Hendy Susanto | Shamil Chollampatt | Liling Tan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper proposes a simple and effective algorithm for incorporating lexical constraints in neural machine translation. Previous work either required re-training existing models with the lexical constraints or incorporating them during beam search decoding with significantly higher computational overheads. Leveraging the flexibility and speed of a recently proposed Levenshtein Transformer model (Gu et al., 2019), our method injects terminology constraints at inference time without any impact on decoding speed. Our method does not require any modification to the training procedure and can be easily applied at runtime with custom dictionaries. Experiments on English-German WMT datasets show that our approach improves over an unconstrained baseline and previous approaches.

Can Automatic Post-Editing Improve NMT?
Shamil Chollampatt | Raymond Hendy Susanto | Liling Tan | Ewa Szymanska
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Automatic post-editing (APE) aims to improve machine translations, thereby reducing human post-editing effort. APE has had notable success when used with statistical machine translation (SMT) systems but has not been as successful over neural machine translation (NMT) systems. This has raised questions on the relevance of the APE task in the current scenario. However, the training of APE models has been heavily reliant on large-scale artificial corpora combined with only limited human post-edited data. We hypothesize that APE models have been underperforming in improving NMT translations due to the lack of adequate supervision. To test our hypothesis, we compile a larger corpus of human post-edits of English to German NMT. We empirically show that a state-of-the-art neural APE model trained on this corpus can significantly improve a strong in-domain NMT system, challenging the current understanding in the field. We further investigate the effects of varying training data sizes, using artificial training data, and domain specificity for the APE task. We release this new corpus under the CC BY-NC-SA 4.0 license at https://github.com/shamilcm/pedra.

Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)
Eunjeong L. Park | Masato Hagiwara | Dmitrijs Milajevs | Nelson F. Liu | Geeticka Chauhan | Liling Tan
Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)

2019

Sarah’s Participation in WAT 2019
Raymond Hendy Susanto | Ohnmar Htun | Liling Tan
Proceedings of the 6th Workshop on Asian Translation

This paper describes our MT systems’ participation in WAT 2019. We participated in the (i) Patent, (ii) Timely Disclosure, (iii) Newswire and (iv) Mixed-domain tasks. Our main focus is to explore how similar Transformer models perform on various tasks. We observed that for tasks with smaller datasets, our best model setups are shallower models with fewer attention heads. We investigated practical issues in NMT that often appear in production settings, such as coping with multilinguality and simplifying the pre- and post-processing pipeline in deployment.

2018

Proceedings of Workshop for NLP Open Source Software (NLP-OSS)
Eunjeong L. Park | Masato Hagiwara | Dmitrijs Milajevs | Liling Tan
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)

2016

BIRA: Improved Predictive Exchange Word Clustering
Jon Dehdari | Liling Tan | Josef van Genabith
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Scaling Up Word Clustering
Jon Dehdari | Liling Tan | Josef van Genabith
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

SAARSHEFF at SemEval-2016 Task 1: Semantic Textual Similarity with Machine Translation Evaluation Metrics and (eXtreme) Boosted Tree Ensembles
Liling Tan | Carolina Scarton | Lucia Specia | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

WOLVESAAR at SemEval-2016 Task 1: Replicating the Success of Monolingual Word Alignment and Neural Embeddings for Semantic Textual Similarity
Hannah Bechara | Rohit Gupta | Liling Tan | Constantin Orăsan | Ruslan Mitkov | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

USAAR at SemEval-2016 Task 11: Complex Word Identification with Sense Entropy and Sentence Perplexity
José Manuel Martínez Martínez | Liling Tan
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

MacSaar at SemEval-2016 Task 11: Zipfian and Character Features for Complex Word Identification
Marcos Zampieri | Liling Tan | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

USAAR at SemEval-2016 Task 13: Hyponym Endocentricity
Liling Tan | Francis Bond | Josef van Genabith
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Faster and Lighter Phrase-based Machine Translation Baseline
Liling Tan
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper describes the participation of the SENSE machine translation system in the Third Workshop on Asian Translation (WAT2016). We share our best practices for building fast and light phrase-based machine translation (PBMT) models that have comparable results to the baseline systems provided by the organizers. As Neural Machine Translation (NMT) overtakes PBMT as the state of the art, deep learning and new MT practitioners might not be familiar with the PBMT paradigm, and we hope that this paper will help them build a PBMT baseline system quickly and easily.

Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Preslav Nakov | Marcos Zampieri | Liling Tan | Nikola Ljubešić | Jörg Tiedemann | Shervin Malmasi
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

2015

USHEF and USAAR-USHEF participation in the WMT15 QE shared task
Carolina Scarton | Liling Tan | Lucia Specia
Proceedings of the Tenth Workshop on Statistical Machine Translation

Predicting Machine Translation Adequacy with Document Embeddings
Mihaela Vela | Liling Tan
Proceedings of the Tenth Workshop on Statistical Machine Translation

Passive and Pervasive Use of Bilingual Dictionary in Statistical Machine Translation
Liling Tan | Josef van Genabith | Francis Bond
Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)

An Awkward Disparity between BLEU / RIBES Scores and Human Judgements in Machine Translation
Liling Tan | Jon Dehdari | Josef van Genabith
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)

Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects
Preslav Nakov | Marcos Zampieri | Petya Osenova | Liling Tan | Cristina Vertan | Nikola Ljubešić | Jörg Tiedemann
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

Overview of the DSL Shared Task 2015
Marcos Zampieri | Liling Tan | Nikola Ljubešić | Jörg Tiedemann | Preslav Nakov
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

USAAR-SHEFFIELD: Semantic Textual Similarity with Deep Regression and Machine Translation Evaluation Metrics
Liling Tan | Carolina Scarton | Lucia Specia | Josef van Genabith
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

USAAR-CHRONOS: Crawling the Web for Temporal Annotations
Liling Tan | Noam Ordan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

USAAR-WLV: Hypernym Generation with Deep Neural Nets
Liling Tan | Rohit Gupta | Josef van Genabith
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Sensible: L2 Translation Assistance by Emulating the Manual Post-Editing Process
Liling Tan | Anne-Kathrin Schumann | Jose M.M. Martinez | Francis Bond
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

SeedLing: Building and Using a Seed corpus for the Human Language Project
Guy Emerson | Liling Tan | Susanne Fertmann | Alexis Palmer | Michaela Regneri
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages

Manawi: Using Multi-Word Expressions and Named Entities to Improve Machine Translation
Liling Tan | Santanu Pal
Proceedings of the Ninth Workshop on Statistical Machine Translation

Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects
Marcos Zampieri | Liling Tan | Nikola Ljubešić | Jörg Tiedemann
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

A Report on the DSL Shared Task 2014
Marcos Zampieri | Liling Tan | Nikola Ljubešić | Jörg Tiedemann
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

NTU-MC Toolkit: Annotating a Linguistically Diverse Corpus
Liling Tan | Francis Bond
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations

2013

XLING: Matching Query Sentences to a Parallel Corpus using Topic Models for WSD
Liling Tan | Francis Bond
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2011

Building and Annotating the Linguistically Diverse NTU-MC (NTU-Multilingual Corpus)
Liling Tan | Francis Bond
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation