Yue Chen
2026
From Curated Data to Scalable Models: Continual Pre-training of Dense and MoE Large Language Models for Tibetan
Lei Yang | Leiyu Pan | Bojian Xiong | Renren Jin | Shaowei Zhang | Yue Chen | Ling Shi | Jiang Zhou | Junru Wu | Zhen Wang | Jianxiang Peng | Juesi Xiao | Tianyu Dong | Zhuowen Han | Zhuo Chen | Yuqi Ren | Deyi Xiong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have achieved remarkable success across a wide range of natural language processing tasks, yet their performance remains heavily biased toward high-resource languages. Tibetan, despite its cultural significance and large speaker population, is still substantially underrepresented. In this work, we present a comprehensive pipeline for advancing Tibetan language modeling through large-scale data curation and continual pre-training. We construct a 72 GB high-quality Tibetan corpus, the largest to date, and adapt Qwen2.5-7B through balanced multilingual continual pre-training with Tibetan, Chinese, and English, followed by multilingual instruction tuning. To further scale capacity efficiently, we extend the dense model to a 50B-A10B Mixture-of-Experts architecture. Due to the absence of standardized Tibetan benchmarks, we build multiple evaluation datasets via high-quality translation and human verification. Experimental results show that both dense and MoE models consistently outperform existing open-source and Tibetan-focused models of similar scale across diverse tasks. Our work advances Tibetan-centric LLM research and provides transferable insights for extending LLMs to other low-resource languages. We will release the model weights, evaluation benchmarks, and detailed data processing documentation in the follow-up.
DUET: Joint Exploration of User–Item Profiles in Recommendation System
Yue Chen | Yifei Sun | Lu Wang | Fangkai Yang | Pu Zhao | Minjie Hong | Yifei Dong | Minghua He | Nan Hu | Jianjin Zhang | Zhiwei Dai | Yuefeng Zhan | Weihao Han | Hao Sun | Qingwei Lin | Weiwei Deng | Feng Sun | Qi Zhang | Saravan Rajmohan | Dongmei Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Traditional recommendation systems represent users and items as dense vectors and learn to align them in a shared latent space for relevance estimation. Recent LLM-based recommenders instead leverage natural-language representations that are easier to interpret and integrate with downstream reasoning modules. This paper studies how to construct effective textual profiles for users and items, and how to align them for recommendation. A central difficulty is that the best profile format is not known a priori: manually designed templates can be brittle and misaligned with task objectives. Moreover, generating user and item profiles independently may produce descriptions that are individually plausible yet semantically inconsistent for a specific user–item pair. We propose Duet, an interaction-aware profile generator that jointly produces user and item profiles conditioned on both user history and item evidence. Duet follows a three-stage procedure: it first turns raw histories and metadata into compact cues, then expands these cues into paired profile prompts and generates profiles, and finally optimizes the generation policy with reinforcement learning using downstream recommendation performance as feedback. Experiments on three real-world datasets show that Duet consistently outperforms strong baselines, demonstrating the benefits of template-free profile exploration and joint user–item textual alignment. Project page: https://duet-rec.github.io/.
2025
Praetor: A Fine-Grained Generative LLM Evaluator with Instance-Level Customizable Evaluation Criteria
Yongqi Leng | Renren Jin | Yue Chen | Zhuowen Han | Ling Shi | Jianxiang Peng | Lei Yang | Juesi Xiao | Deyi Xiong
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the increasing capability of large language models (LLMs), LLM-as-a-judge has emerged as a new evaluation paradigm. Compared with traditional automatic and manual evaluation, LLM evaluators exhibit better interpretability and efficiency. Despite this, existing LLM evaluators suffer from limited use scenarios and poor flexibility. To mitigate these issues, we propose Praetor, a fine-grained generative LLM evaluator with instance-level customizable evaluation criteria. To train Praetor, we curate a large-scale dataset guided by a hierarchical guideline covering a wide range of tasks and instance-level evaluation criteria. We train Praetor on this dataset in a multi-task learning fashion, which enables it to evaluate LLMs through either pointwise grading or pairwise comparison and to support two languages simultaneously, with high flexibility in setting evaluation criteria. Extensive experiments demonstrate that Praetor outperforms previous LLM evaluators and instruction-tuned LLMs on multiple benchmarks, setting new SOTA results. It also exhibits the potential for generating critiques as scalable feedback to further improve LLMs. Our model and related resources are released at https://github.com/tjunlp-lab/Praetor.
ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation
Minghua He | Yue Chen | Fangkai Yang | Pu Zhao | Wenjie Yin | Yu Kang | Qingwei Lin | Saravan Rajmohan | Dongmei Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Code translation is a crucial activity in the software development and maintenance process, and researchers have recently begun to focus on using pre-trained large language models (LLMs) for code translation. However, existing LLMs only learn the contextual semantics of code during pre-training, neglecting executability information closely related to the execution state of the code, which results in unguaranteed code executability and unreliable automated code translation. To address this issue, we propose ExeCoder, an LLM specifically designed for code translation, aimed at utilizing executability representations such as functional semantics, syntax structures, and variable dependencies to enhance the capabilities of LLMs in code translation. To evaluate the effectiveness of ExeCoder, we manually enhanced the widely used benchmark TransCoder-test, resulting in a benchmark called TransCoder-test-X that serves LLMs. Evaluation on TransCoder-test-X indicates that ExeCoder achieves state-of-the-art performance in code translation, surpassing existing open-source code LLMs by over 10.88% to 38.78% and over 27.44% to 42.97% on two metrics, and even outperforms the renowned closed-source LLM GPT-4o. Code is available at https://aka.ms/execoder
2024
STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents
Yue Chen | Chen Huang | Yang Deng | Wenqiang Lei | Dingnan Jin | Jia Liu | Tat-Seng Chua
Findings of the Association for Computational Linguistics: ACL 2024
Equipping a conversational search engine with strategies regarding when to ask clarification questions is becoming increasingly important across various domains. Owing to the context understanding capability of LLMs and their access to domain-specific sources of knowledge, LLM-based clarification strategies feature rapid transfer to various domains in a post-hoc manner. However, they still struggle to deliver promising performance on unseen domains, failing to achieve effective domain transferability. We take the first step to investigate this issue and find that existing methods tend to produce one-size-fits-all strategies across diverse domains, limiting their search effectiveness. In response, we introduce a novel method, called STYLE, to achieve effective domain transferability. Our experimental results indicate that STYLE exhibits strong domain transferability, resulting in an average search performance improvement of 10% on four unseen domains.
2023
TRAVEL: Tag-Aware Conversational FAQ Retrieval via Reinforcement Learning
Yue Chen | Dingnan Jin | Chen Huang | Jia Liu | Wenqiang Lei
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Efficiently retrieving FAQ questions that match users’ intent is essential for online customer service. Existing methods aim to fully utilize the dynamic conversation context to enhance the semantic association between the user query and FAQ questions. However, the conversation context contains noise, e.g., users may click questions they don’t like, leading to inaccurate semantics modeling. To tackle this, we introduce tags of FAQ questions, which can help us eliminate irrelevant information. We later integrate them into a reinforcement learning framework and minimize the negative impact of irrelevant information in the dynamic conversation context. We experimentally demonstrate our efficiency and effectiveness on conversational FAQ retrieval compared to other baselines.
Symbolization, Prompt, and Classification: A Framework for Implicit Speaker Identification in Novels
Yue Chen | Tianwei He | Hongbin Zhou | Jia-Chen Gu | Heng Lu | Zhen-Hua Ling
Findings of the Association for Computational Linguistics: EMNLP 2023
Speaker identification in novel dialogues can be widely applied to various downstream tasks, such as producing multi-speaker audiobooks and converting novels into scripts. However, existing state-of-the-art methods are limited to handling explicit narrative patterns like “Tom said, ‘...’”, unable to thoroughly understand long-range contexts and to deal with complex cases. To this end, we propose a framework named SPC, which identifies implicit speakers in novels via symbolization, prompt, and classification. First, SPC symbolizes the mentions of candidate speakers to construct a unified label set. Then, by inserting a prompt we re-formulate speaker identification as a classification task to minimize the gap between the training objectives of speaker identification and the pre-training task. Two auxiliary tasks are also introduced in SPC to enhance long-range context understanding. Experimental results show that SPC outperforms previous methods by a large margin of 4.8% accuracy on the web novel collection, which reduces 47% of speaker identification errors, and also outperforms the emerging ChatGPT. In addition, SPC is more accurate in implicit speaker identification cases that require long-range context semantic understanding.
2022
IUCL at WASSA 2022 Shared Task: A Text-only Approach to Empathy and Emotion Detection
Yue Chen | Yingnan Ju | Sandra Kübler
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis
Our system, IUCL, participated in the WASSA 2022 Shared Task on Empathy Detection and Emotion Classification. Our main goal in building this system is to investigate how the use of demographic attributes influences performance. Our (official) results show that our text-only systems perform very competitively, ranking first in the empathy detection task, reaching an average Pearson correlation of 0.54, and second in the emotion classification task, reaching a Macro-F of 0.572. Our systems that use both text and demographic data are less competitive.
Zero-shot Cross-Linguistic Learning of Event Semantics
Malihe Alikhani | Thomas Kober | Bashar Alhafni | Yue Chen | Mert Inan | Elizabeth Nielsen | Shahab Raji | Mark Steedman | Matthew Stone
Proceedings of the 15th International Conference on Natural Language Generation
2019
A k-Nearest Neighbor Approach towards Multi-level Sequence Labeling
Yue Chen | John Chen
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)
In this paper we present a new method for intent recognition for complex dialog management in low resource situations. Complex dialog management is required because our target domain is real world mixed initiative food ordering between agents and their customers, where individual customer utterances may contain multiple intents and refer to food items with complex structure. For example, a customer might say “Can I get a deluxe burger with large fries and oh put extra mayo on the burger would you?” We approach this task as a multi-level sequence labeling problem, with the constraint of limited real training data. Both traditional methods like HMM, MEMM, or CRF and newer methods like DNN or BiLSTM use only homogeneous feature sets. Newer methods perform better but also require considerably more data. Previous research has done pseudo-data synthesis to obtain the required amounts of training data. We propose to use a k-NN learner with a heterogeneous feature set. We used windowed word n-grams, POS tag n-grams and pre-trained word embeddings as features. For the experiments we perform a comparison between using pseudo-data and real world data. We also perform semi-supervised self-training to obtain additional labeled data, in order to better model real world scenarios. Instead of using massive pseudo-data, we show that with less than 1% of the data size, we can achieve better results than any of the methods above by annotating real world data. We achieve labeled bracketed F-scores of 75.46, 52.84 and 49.66 for the three levels of sequence labeling where each level has a longer word span than its previous level. Overall we achieve an F-score of 60.71. In comparison, two previous systems, MEMM and DNN-ELMO, achieved 52.32 and 45.25 respectively.
Investigating Multilingual Abusive Language Detection: A Cautionary Tale
Kenneth Steimel | Daniel Dakota | Yue Chen | Sandra Kübler
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Abusive language detection has received much attention in recent years, and recent approaches perform the task in a number of different languages. We investigate which factors have an effect on multilingual settings, focusing on the compatibility of data and annotations. In the current paper, we focus on English and German. Our findings show large differences in performance between the two languages. We find that the best performance is achieved by different classification algorithms. Sampling to address class imbalance issues is detrimental for German and beneficial for English. The only similarity that we find is that neither data set shows clear topics when we compare the results of topic modeling to the gold standard. Based on our findings, we can conclude that a multilingual optimization of classifiers is not possible even in settings where comparable data sets are used.
2016
IUCL at SemEval-2016 Task 6: An Ensemble Model for Stance Detection in Twitter
Can Liu | Wen Li | Bradford Demarest | Yue Chen | Sara Couture | Daniel Dakota | Nikita Haduong | Noah Kaufman | Andrew Lamont | Manan Pancholi | Kenneth Steimel | Sandra Kübler
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
2010
Co-authors
- Sandra Kübler 3
- Daniel Dakota 2
- Zhuowen Han 2
- Minghua He 2
- Chen Huang 2
- Renren Jin 2
- Dingnan Jin 2
- Wenqiang Lei 2
- Qingwei Lin 2
- Jia Liu 2
- Jianxiang Peng 2
- Saravan Rajmohan 2
- Ling Shi 2
- Kenneth Steimel 2
- Juesi Xiao 2
- Deyi Xiong (德意 熊) 2
- Fangkai Yang 2
- Dongmei Zhang 2
- Pu Zhao 2
- Bashar Alhafni 1
- Malihe Alikhani 1
- Zhuo Chen 1
- John Chen 1
- Tat-Seng Chua 1
- Sara Couture 1
- Zhiwei Dai 1
- Bradford Demarest 1
- Yang Deng 1
- Weiwei Deng 1
- Tianyu Dong 1
- Yifei Dong 1
- Jia-Chen Gu 1
- Nikita Haduong 1
- Weihao Han 1
- Tianwei He 1
- Minjie Hong 1
- Nan Hu 1
- Mert Inan 1
- Yingnan Ju 1
- Yu Kang 1
- Noah Kaufman 1
- Thomas Kober 1
- Andrew Lamont 1
- Yongqi Leng 1
- Wen Li 1
- Zhen-Hua Ling 1
- Can Liu 1
- Heng Lu 1
- Elizabeth Nielsen 1
- Leiyu Pan 1
- Manan Pancholi 1
- Shahab Raji 1
- Yuqi Ren 1
- Mark Steedman 1
- Matthew Stone 1
- Yifei Sun 1
- Hao Sun 1
- Feng Sun 1
- Zhen Wang 1
- Lu Wang 1
- Junru Wu 1
- Bojian Xiong 1
- Lei Yang 1
- Lei Yang 1
- Wenjie Yin 1
- Yuefeng Zhan 1
- Shaowei Zhang 1
- Jianjin Zhang 1
- Qi Zhang 1
- Jiang Zhou 1
- Hongbin Zhou 1