Lei Duan


2025

Tuning Less, Prompting More: In-Context Preference Learning Pipeline for Natural Language Transformation
Shuyun Yang | Yan Zhang | Zhengmao Ye | Lei Duan | Mingjie Tang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Natural language transformation (NLT) tasks, such as machine translation (MT) and text style transfer (TST), require models to generate accurate and contextually appropriate outputs. However, existing approaches face significant challenges, including the computational costs of leveraging large pre-trained models and the limited generalization ability of fine-tuned smaller models. In this paper, we propose a novel framework that combines the flexibility of prompting with the cost-effectiveness of fine-tuning. Our method enhances smaller models by integrating retrieved In-Context Examples (ICE), enabling the model to better capture contextual information and align with user-level preferences. We further improve performance through hierarchical contrastive learning and dynamic preference inference mechanisms. Experimental results demonstrate that our approach outperforms existing methods, such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Contrastive Preference Optimization (CPO), across both MT and TST tasks, providing a more efficient solution for resource-constrained environments.
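To make the retrieval-augmented prompting idea concrete, here is a minimal sketch of selecting In-Context Examples by similarity to the input and prepending them to a prompt for a smaller model. The bag-of-words scoring, the example pool, and the `retrieve_ice`/`build_prompt` helpers are illustrative assumptions, not the paper's actual pipeline (which also uses hierarchical contrastive learning and preference inference).

```python
# Sketch: retrieval of In-Context Examples (ICE) for an NLT prompt.
# Scoring and prompt template are illustrative assumptions only.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_ice(query: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k (source, target) pairs most similar to the query."""
    q_vec = Counter(query.lower().split())
    ranked = sorted(pool, key=lambda ex: cosine(q_vec, Counter(ex[0].lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, pool: list[tuple[str, str]]) -> str:
    """Prepend retrieved examples so a small model can condition on them."""
    lines = [f"Source: {src}\nTarget: {tgt}" for src, tgt in retrieve_ice(query, pool)]
    lines.append(f"Source: {query}\nTarget:")
    return "\n\n".join(lines)

# Hypothetical style-transfer pool: (informal, formal) pairs.
pool = [
    ("gonna grab food, brb", "I am going to get some food; I will return shortly."),
    ("that movie was kinda bad", "The film was somewhat disappointing."),
    ("cant make it today sry", "I apologize, but I am unable to attend today."),
]
print(build_prompt("this gadget is kinda cool", pool))
```

The prompt produced this way would then be fed to the fine-tuned smaller model; in practice a learned dense retriever would replace the bag-of-words scorer.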

2024

Typos Correction Training against Misspellings from Text-to-Text Transformers
Guicai Xie | Ke Zhang | Lei Duan | Wei Zhang | Zeqian Huang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Dense retrieval (DR) has become a mainstream approach to information seeking, where a system is required to return relevant information for a user query. In real-life applications, typoed queries resulting from users' mistyped words or phonetic typing errors are widespread in search behavior. Current dense retrievers experience a significant drop in retrieval effectiveness when they encounter typoed queries. Search systems therefore require an additional spell-checker to handle typos before the DR model can perform robust matching. Herein, we argue that directly conducting typos-correction training is beneficial for building an end-to-end retriever that is robust to misspellings. To this end, we propose a novel approach that incorporates the spelling-correction objective into the DR model using an encoder-decoder architecture. During typos-correction training, we also develop a prompt-based augmentation technique to better align the typoed query with its original query in the DR space. Extensive experiments demonstrate that our proposed end-to-end retriever significantly outperforms existing typos-aware training approaches and advanced retrievers with sophisticated training. Our code is available at https://github.com/striver314/ToCoTR.
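A minimal sketch of how training pairs for typos-correction training might be synthesized: a clean query is perturbed with character-level edits to form the encoder input, while the original query serves as the decoder target. The edit operations, rates, and the `inject_typo`/`make_pair` helpers are assumptions for illustration; the paper's actual augmentation (including its prompt-based variant) may differ.

```python
# Sketch: synthesizing (typoed query, clean query) pairs for
# typos-correction training of an encoder-decoder retriever.
# Edit operations and rates are illustrative assumptions.
import random

KEYBOARD_NEIGHBORS = {"a": "qsz", "e": "wrd", "i": "uok", "o": "ipl", "s": "adw", "t": "ryg"}

def inject_typo(word: str, rng: random.Random) -> str:
    """Apply one random character-level edit: swap, drop, or neighbor substitution."""
    if len(word) < 3:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["swap", "drop", "substitute"])
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":
        return word[:i] + word[i + 1:]
    neighbors = KEYBOARD_NEIGHBORS.get(word[i])
    return word[:i] + rng.choice(neighbors) + word[i + 1:] if neighbors else word

def make_pair(query: str, typo_rate: float = 0.3, seed: int = 0) -> tuple[str, str]:
    """Return (typoed_query, clean_query); the clean query is the decoder target."""
    rng = random.Random(seed)
    typoed = [inject_typo(w, rng) if rng.random() < typo_rate else w for w in query.split()]
    return " ".join(typoed), query

src, tgt = make_pair("who invented the telephone", seed=42)
print(f"encoder input : {src}")
print(f"decoder target: {tgt}")
```

In the end-to-end setup the abstract describes, such pairs would train the decoder to reconstruct the clean query while the encoder's representation is simultaneously aligned with that of the original query for retrieval.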

2009

Mining Search Engine Clickthrough Log for Matching N-gram Features
Huihsin Tseng | Longbin Chen | Fan Li | Ziming Zhuang | Lei Duan | Belle Tseng
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2000

An Integrated Architecture for Example-Based Machine Translation
Alexander Franz | Keiko Horiguchi | Lei Duan | Doris Ecker | Eugene Koontz | Kazami Uchida
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1997

Experience in WordNet Sense Tagging in the Wall Street Journal
Janyce Wiebe | Julie Maples | Lei Duan | Rebecca Bruce
Tagging Text with Lexical Semantics: Why, What, and How?