Qianqian Xie
2026
Overview of the ClinicalSkillQA 2026 Shared Task on Continuous Perception and Procedural Reasoning in Clinical Skill Assessment
Xiyang Huang | Renxiong Wei | Yihuai Xu | Zhiyuan Chen | Keying Wu | Jiayi Xiang | Buzhou Tang | Yanqing Ye | Jinyu Chen | Cheng Zeng | Min Peng | Qianqian Xie | Sophia Ananiadou
BioNLP 2026
This paper presents an overview of the ClinicalSkillQA 2026 shared task, which was organized with the BioNLP Workshop at ACL 2026. The goal of this shared task is to evaluate continuous perception and procedural reasoning in clinical skill assessment by requiring systems to reconstruct the correct temporal order of shuffled clinical key frames and generate rationales grounded in clinical workflow knowledge. The benchmark contains 200 test-only instances sampled from clinical skill videos, covering three emergency-care procedures. Each instance is annotated with the ground-truth temporal order and an expert-verified rationale. A total of seven teams participated in the task, collectively making 90 submissions, with four teams providing system description papers. Systems are evaluated using Task Accuracy, Pairwise Accuracy, and BERTScore, which measure exact sequence reconstruction, local temporal consistency, and rationale quality, respectively. In this paper, we describe the task setup, dataset construction, and evaluation criteria. We further summarize the methodologies adopted by participating teams and present a comprehensive analysis of the submitted systems. The official results suggest that current models still struggle with continuous perception and procedural reasoning, especially when they must integrate visual evidence, temporal structure, and clinical workflow knowledge.
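As an illustration of the two ordering metrics named in the abstract above, here is a minimal Python sketch. The abstract does not give formulas, so the definitions are assumptions: Task Accuracy is taken as exact match of the full predicted order, and Pairwise Accuracy as the fraction of frame pairs whose relative order agrees with the ground truth. The function names and frame labels are hypothetical, and the official scorer may differ.

```python
from itertools import combinations


def task_accuracy(pred, gold):
    """Return 1.0 if the predicted order equals the gold order exactly, else 0.0.

    Assumed definition: exact sequence reconstruction for one instance.
    """
    return float(list(pred) == list(gold))


def pairwise_accuracy(pred, gold):
    """Fraction of frame pairs whose relative order in pred agrees with gold.

    Assumed definition: the shared task may use a different formulation.
    """
    position = {frame: i for i, frame in enumerate(pred)}
    # Each pair (a, b) has a before b in the gold order.
    pairs = list(combinations(gold, 2))
    if not pairs:
        return 1.0
    agree = sum(position[a] < position[b] for a, b in pairs)
    return agree / len(pairs)


# Hypothetical instance: four key frames, with two adjacent frames swapped.
gold = ["A", "B", "C", "D"]
pred = ["A", "C", "B", "D"]

print(task_accuracy(pred, gold))      # exact match fails
print(pairwise_accuracy(pred, gold))  # 5 of 6 pairs keep their order
```

Under these assumed definitions, one swapped pair drops the exact-match score to zero while the pairwise score stays high, which is why the two metrics are reported side by side.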
EmCellLLM: Human Peri-Implantation Embryonic Cell Annotation Based on Large Language Models
Xiaorui Guo | Zhiwei Liu | Qianqian Xie | Sophia Ananiadou
BioNLP 2026
The advent of single-cell RNA sequencing has enabled unprecedented resolution of cell fate decisions and regulatory mechanisms during peri-implantation human embryogenesis, in which accurate cell type annotation is a fundamental prerequisite and the first step for subsequent fate and mechanism inference. Large language models (LLMs) have demonstrated outstanding performance in various fields. However, current studies mostly rely on traditional methods and have not explored the application of LLMs to human embryonic cell annotation. The main reason is the lack of instruction tuning datasets and evaluation benchmarks. In this paper, we propose EmCellLLM, the first open-source LLMs specialized for the human embryonic cell type prediction task, obtained by fine-tuning Qwen3-8B on EmCell4Instruction, the first embryonic cell type prediction instruction dataset. To support LLM instruction tuning, we also build EmCellBench, the first benchmark for evaluating the human embryonic cell type prediction ability of LLMs. We compare our models with a variety of LLMs on EmCellBench, where our model outperforms all other open-source LLMs as well as DeepSeek.
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
Xueqing Peng | Lingfei Qian | Yan Wang | Ruoyu Xiang | Yueru He | Yang Ren | Mingyang Jiang | Vincent Jim Zhang | Yuqing Guo | Jeff Zhao | Huan He | Yi Han | Yun Feng | Yuechen Jiang | Yupeng Cao | Haohang Li | Yangyang Yu | Xiaoyu Wang | Penglei Gao | Shengyuan Lin | Keyi Wang | Shanshan Yang | Yilun Zhao | Zhiwei Liu | Peng Lu | Jerry Huang | Suyuchen Wang | Triantafillos Papadopoulos | Polydoros Giannouris | Efstathia Soufleri | Nuo Chen | Zhiyang Deng | Heming Fu | Yijia Zhao | Mingquan Lin | Meikang Qiu | Kaleb E Smith | Arman Cohan | Xiao-Yang Liu | Jimin Huang | Guojun Xiong | Alejandro Lopez-Lira | Xi Chen | Junichi Tsujii | Jian-Yun Nie | Sophia Ananiadou | Qianqian Xie
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
TaxPraBen: A Scalable Benchmark for Structured Evaluation of LLMs in Chinese Real-World Tax Practice
Gang Hu | Yating Chen | Haiyan Ding | Wang Gao | Huang Jiajia | Min Peng | Qianqian Xie | Kun Yue
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While Large Language Models (LLMs) excel in various general domains, they exhibit notable gaps in the highly specialized, knowledge-intensive, and legally regulated Chinese tax domain. Moreover, while tax-related benchmarks are gaining attention, many focus on isolated NLP tasks, neglecting real-world practical capabilities. To address this issue, we introduce TaxPraBen, the first dedicated benchmark for Chinese taxation practice. It combines 10 traditional application tasks with 3 pioneering real-world scenarios: tax risk prevention, tax inspection analysis, and tax strategy planning, sourced from 14 datasets totaling 7.3K instances. TaxPraBen features a scalable structured evaluation paradigm designed through a process of "structured parsing, field alignment extraction, numerical and textual matching", enabling end-to-end tax practice assessment while being extensible to other domains. We evaluate 19 LLMs based on Bloom’s taxonomy. The results indicate significant performance disparities: all closed-source large-parameter LLMs excel, and Chinese LLMs like Qwen2.5 generally exceed multilingual LLMs, while the YaYi2 LLM, fine-tuned with some tax data, shows only limited improvement. TaxPraBen (https://anonymous.4open.science/r/TaxPraBen/) serves as a vital resource for advancing evaluations of LLMs in practical applications.
Human or LLM as Standardized Patients? A Comparative Study in Medical Education
Bingquan Zhang | Xiaoxiao Liu | Yuchi Wang | Zhou Lei | Qianqian Xie | Benyou Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Standardized patients (SPs) are indispensable for clinical skills training but remain expensive and difficult to scale. Although large language model (LLM)-based virtual standardized patients (VSPs) have been proposed as an alternative, their behavior remains unstable and lacks rigorous comparison with human standardized patients. We propose EasyMED, a multi-agent VSP framework that separates case-grounded information disclosure from response generation to support stable, inquiry-conditioned patient behavior. We also introduce SPBench, a human-grounded benchmark with eight expert-defined criteria for interaction-level evaluation. Experiments show that EasyMED more closely matches human SP behavior than existing VSPs, particularly in case consistency and controlled disclosure. A four-week controlled study further demonstrates learning outcomes comparable to human SP training, with stronger early gains for novice learners and improved flexibility, psychological safety, and cost efficiency.
2025
UCL-Bench: A Chinese User-Centric Legal Benchmark for Large Language Models
Ruoli Gan | Duanyu Feng | Chen Zhang | Zhihang Lin | Haochen Jia | Hao Wang | Zhenyang Cai | Lei Cui | Qianqian Xie | Jimin Huang | Benyou Wang
Findings of the Association for Computational Linguistics: NAACL 2025
Existing legal benchmarks focusing on knowledge and logic effectively evaluate LLMs on various tasks in the legal domain. However, few have explored the practical application of LLMs by actual users. To further assess whether LLMs meet the specific needs of legal practitioners in real-world scenarios, we introduce UCL-Bench, a Chinese User-Centric Legal Benchmark, comprising 22 tasks across 5 distinct legal scenarios. To build UCL-Bench, we conduct a user survey targeting legal professionals to understand their needs and challenges. Based on the survey results, we craft tasks, verified by legal professionals, and categorize them according to Bloom’s taxonomy. Each task in UCL-Bench mirrors real-world legal scenarios, and instead of relying on pre-defined answers, legal experts provide detailed answer guidance for each task, incorporating both “information” and “needs” elements to mimic the complexities of legal practice. With the guidance, we use GPT-4 as the user simulator and evaluator, enabling multi-turn dialogues as an answer-guidance-based evaluation framework. Our findings reveal that many recent open-source general models achieve the highest performance, suggesting that they are well-suited to address the needs of legal practitioners. However, these legal LLMs do not outperform ChatGPT, indicating a need for training strategies aligned with users’ needs. Furthermore, we find that the most effective models are able to address legal issues within fewer dialogue turns, highlighting the importance of concise and accurate responses in achieving high performance. The code and dataset are available at https://github.com/wittenberg11/UCL-bench.
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Yuzhe Yang | Yifei Zhang | Yan Hu | Yilin Guo | Ruoli Gan | Yueru He | Mingcong Lei | Xiao Zhang | Haining Wang | Qianqian Xie | Jimin Huang | Honghai Yu | Benyou Wang
Findings of the Association for Computational Linguistics: NAACL 2025
This paper introduces the UCFE: User-Centric Financial Expertise benchmark, an innovative framework designed to evaluate the ability of large language models (LLMs) to handle complex real-world financial tasks. The UCFE benchmark adopts a hybrid approach that combines human expert evaluations with dynamic, task-specific interactions to simulate the complexities of evolving financial scenarios. First, we conducted a user study involving 804 participants, collecting their feedback on financial tasks. Second, based on this feedback, we created a dataset that encompasses a wide range of user intents and interactions. This dataset serves as the foundation for benchmarking 11 LLM services using the LLM-as-Judge methodology. Our results show a significant alignment between benchmark scores and human preferences, with a Pearson correlation coefficient of 0.78, confirming the effectiveness of the UCFE dataset and our evaluation approach. The UCFE benchmark not only reveals the potential of LLMs in the financial domain but also provides a robust framework for assessing their performance and user satisfaction.
FLAG-TRADER: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
Guojun Xiong | Zhiyang Deng | Keyi Wang | Yupeng Cao | Haohang Li | Yangyang Yu | Xueqing Peng | Mingquan Lin | Kaleb E Smith | Xiao-Yang Liu | Jimin Huang | Sophia Ananiadou | Qianqian Xie
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.
EMPEC: A Comprehensive Benchmark for Evaluating Large Language Models Across Diverse Healthcare Professions
Zheheng Luo | Chenhan Yuan | Qianqian Xie | Sophia Ananiadou
Findings of the Association for Computational Linguistics: ACL 2025
Recent advancements in Large Language Models (LLMs) show their potential in accurately answering biomedical questions, yet current healthcare benchmarks primarily assess knowledge mastered by medical doctors, neglecting other essential professions. To address this gap, we introduce the Examinations for Medical PErsonnel in Chinese (EMPEC), a comprehensive healthcare knowledge benchmark featuring 157,803 exam questions across 124 subjects and 20 healthcare professions, including underrepresented roles like Optometrists and Audiologists. Each question is tagged for release time and source authenticity. We evaluated 17 LLMs, including proprietary and open-source models, finding that while models like GPT-4 achieved over 75% accuracy, they struggled with specialized fields and alternative medicine. Notably, we find that most medical-specific LLMs underperform their general-purpose counterparts on EMPEC, and that incorporating EMPEC’s data in fine-tuning improves performance. In addition, we tested LLMs on questions released after the completion of their training to examine their ability on unseen queries. We also translated the test set into English and simplified Chinese and analyzed the impact on different models. Our findings emphasize the need for broader benchmarks to assess LLM applicability in real-world healthcare, and we will provide the dataset and evaluation toolkit for future research.
Selective Preference Optimization via Token-Level Reward Function Estimation
Kailai Yang | Zhiwei Liu | Qianqian Xie | Jimin Huang | Erxue Min | Sophia Ananiadou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Recent advancements in LLM alignment leverage token-level supervision to perform fine-grained preference optimization. However, existing token-level alignment methods either optimize on all available tokens, which can be noisy and inefficient, or perform selective training with complex and expensive key token selection strategies. In this work, we propose Selective Preference Optimization (SePO), a novel selective alignment strategy that centers on efficient key token selection without requiring strong, fine-grained supervision signals. We theoretically prove the feasibility of Direct Preference Optimization (DPO) as a token-level reward function estimator, which applies to any existing alignment dataset and enables cost-efficient token selection with small-scale model sizes and training data. We then train an oracle model with DPO on the target data and utilize the estimated reward function to score all tokens within the target dataset, where only the key tokens are selected to supervise the target policy model with a contrastive objective function. Extensive experiments on three public evaluation benchmarks show that SePO significantly outperforms competitive baseline methods by only optimizing on 30% key tokens with up to 60% reduction in GPU training hours. We also explore SePO as a new paradigm for weak-to-strong generalization, showing that weak oracle models effectively supervise strong policy models with up to 16.8× more parameters. SePO also selects useful supervision signals from out-of-distribution data, alleviating the over-optimization problem.
Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
Xueqing Peng | Triantafillos Papadopoulos | Efstathia Soufleri | Polydoros Giannouris | Ruoyu Xiang | Yan Wang | Lingfei Qian | Jimin Huang | Qianqian Xie | Sophia Ananiadou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Despite Greece’s pivotal role in the global economy, large language models (LLMs) remain underexplored for Greek financial context due to the linguistic complexity of Greek and the scarcity of domain-specific datasets. While multilingual financial NLP has revealed large performance gaps across languages, no benchmarks or LLMs have been tailored for Greek financial tasks until now. To bridge this gap, we introduce Plutus-ben, the first Greek Financial Evaluation Benchmark, and Plutus-8B, the first financial LLM fine-tuned on Greek-specific financial data. Plutus-ben addresses six core tasks: numeric/textual named entity recognition, question answering, extractive summarization, abstractive summarization, and topic classification. To support these tasks, we release four new expert-annotated Greek financial datasets and incorporate two existing resources. Our comprehensive evaluation of 24 LLMs reveals persistent challenges in Greek financial NLP, driven by linguistic complexity, domain terminology, and financial reasoning gaps. Experiment results underscore the limitations of cross-lingual transfer and the need for Greek-specific financial modeling. We publicly release Plutus-ben, Plutus-8B, and all associated datasets to promote reproducible research and advance multilingual financial NLP.
RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning Based on Emotional Information
Zhiwei Liu | Kailai Yang | Qianqian Xie | Christine de Kock | Sophia Ananiadou | Eduard Hovy
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Misinformation is prevalent in various fields such as education, politics, and health, causing significant harm to society. However, current methods for cross-domain misinformation detection rely on effort- and resource-intensive fine-tuning and complex model structures. With the outstanding performance of LLMs, many studies have employed them for misinformation detection. Unfortunately, they focus on in-domain tasks and do not incorporate significant sentiment and emotion features (which we jointly call affect). In this paper, we propose RAEmoLLM, the first retrieval-augmented (RAG) LLM framework to address cross-domain misinformation detection using in-context learning based on affective information. RAEmoLLM includes three modules. (1) In the index construction module, we apply an emotional LLM to obtain affective embeddings from all domains to construct a retrieval database. (2) The retrieval module uses the database to recommend top K examples (text-label pairs) from source domain data for target domain contents. (3) These examples are adopted as few-shot demonstrations for the inference module to process the target domain content. RAEmoLLM can effectively enhance the general performance of LLMs in cross-domain misinformation detection tasks through affect-based retrieval, without fine-tuning. We evaluate our framework on three misinformation benchmarks. Results show that RAEmoLLM achieves significant improvements compared to the other few-shot methods on three datasets, with the highest increases of 15.64%, 31.18%, and 15.73% respectively. This project is available at https://github.com/lzw108/RAEmoLLM.
INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent
Haohang Li | Yupeng Cao | Yangyang Yu | Shashidhar Reddy Javaji | Zhiyang Deng | Yueru He | Yuechen Jiang | Zining Zhu | K.p. Subbalakshmi | Jimin Huang | Lingfei Qian | Xueqing Peng | Jordan W. Suchow | Qianqian Xie
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent advancements have underscored the potential of large language model (LLM)-based agents in financial decision-making. Despite this progress, the field currently encounters two main challenges: (1) the lack of a comprehensive LLM agent framework adaptable to a variety of financial tasks, and (2) the absence of standardized benchmarks and consistent datasets for assessing agent performance. To tackle these issues, we introduce InvestorBench, the first benchmark specifically designed for evaluating LLM-based agents in diverse financial decision-making contexts. InvestorBench enhances the versatility of LLM-enabled agents by providing a comprehensive suite of tasks applicable to different financial products, including single equities like stocks and cryptocurrencies, and exchange-traded funds (ETFs). Additionally, we assess the reasoning and decision-making capabilities of our agent framework using thirteen different LLMs as backbone models, across various market environments and tasks. Furthermore, we have curated a diverse collection of open-source datasets and developed a comprehensive suite of environments for financial decision-making. This establishes a highly accessible platform for evaluating financial agents’ performance across various scenarios.
2023
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk
Jianquan Li | XiangBo Wu | Xiaokang Liu | Qianqian Xie | Prayag Tiwari | Benyou Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language is the principal tool for human communication, in which humor is one of the most attractive parts. Producing natural language like humans using computers, a.k.a. Natural Language Generation (NLG), has been widely used for dialogue systems, chatbots, and machine translation, as well as computer-aided creation, e.g., idea generation and scriptwriting. However, the humor aspect of natural language is relatively under-investigated, especially in the age of pre-trained language models. In this work, we aim to preliminarily test *whether NLG can generate humor as humans do*. We build a large dataset consisting of numerous **C**hinese **C**omical **C**rosstalk scripts (called **C**3 for short), drawn from a popular Chinese performing art called ‘Xiangsheng’ or ‘相声’ that dates back to the 1800s. We benchmark various generation approaches including training-from-scratch Seq2seq, fine-tuned middle-scale PLMs, and large-scale PLMs (with and without fine-tuning). Moreover, we also conduct a human assessment, showing that 1) *large-scale pretraining largely improves crosstalk generation quality*; and 2) *even the scripts generated from the best PLM are far from what we expect*. We conclude that humor generation could be largely improved using large-scale PLMs, but it is still in its infancy. The data and benchmarking code are publicly available at [https://github.com/anonNo2/crosstalk-generation](https://github.com/anonNo2/crosstalk-generation).
Co-authors
- Sophia Ananiadou 8
- Jimin Huang 7
- Xueqing Peng 4
- Benyou Wang 4
- Yupeng Cao 3
- Zhiyang Deng 3
- Yueru He 3
- Haohang Li 3
- Zhiwei Liu 3
- Lingfei Qian 3
- Yangyang Yu 3
- Ruoli Gan 2
- Polydoros Giannouris 2
- Yuechen Jiang 2
- Mingquan Lin 2
- Xiao-Yang Liu 2
- Triantafillos Papadopoulos 2
- Min Peng 2
- Kaleb E. Smith 2
- Efstathia Soufleri 2
- Keyi Wang 2
- Ruoyu Xiang 2
- Guojun Xiong 2
- Kailai Yang 2
- Zhenyang Cai 1
- Jinyu Chen 1
- Nuo Chen 1
- Xi Chen 1
- Yating Chen 1
- Zhiyuan Chen 1
- Arman Cohan 1
- Lei Cui 1
- Haiyan Ding 1
- Duanyu Feng 1
- Yun Feng 1
- Heming Fu 1
- Penglei Gao 1
- Wang Gao 1
- Xiaorui Guo 1
- Yilin Guo 1
- Yuqing Guo 1
- Yi Han 1
- Huan He 1
- Eduard Hovy 1
- Gang Hu 1
- Yan Hu 1
- Jerry Huang 1
- Xiyang Huang 1
- Shashidhar Reddy Javaji 1
- Haochen Jia 1
- Huang Jiajia 1
- Mingyang Jiang 1
- Mingcong Lei 1
- Zhou Lei 1
- Jianquan Li 1
- Shengyuan Lin 1
- Zhihang Lin 1
- Xiaokang Liu 1
- Xiaoxiao Liu 1
- Zhiwei Liu 1
- Alejandro Lopez-Lira 1
- Peng Lu 1
- Zheheng Luo 1
- Erxue Min 1
- Jian-Yun Nie 1
- Meikang Qiu 1
- Yang Ren 1
- K.p. Subbalakshmi 1
- Jordan W. Suchow 1
- Buzhou Tang 1
- Prayag Tiwari 1
- Jun’ichi Tsujii 1
- Haining Wang 1
- Hao Wang 1
- Suyuchen Wang 1
- Xiaoyu Wang 1
- Yan Wang 1
- Yan Wang 1
- Yuchi Wang 1
- Renxiong Wei 1
- Keying Wu 1
- Xiangbo Wu 1
- Jiayi Xiang 1
- Yihuai Xu 1
- Shanshan Yang 1
- Yuzhe Yang 1
- Yanqing Ye 1
- Honghai Yu 1
- Chenhan Yuan 1
- Kun Yue 1
- Cheng Zeng 1
- Bingquan Zhang 1
- Chen Zhang 1
- Vincent Jim Zhang 1
- Xiao Zhang 1
- Yifei Zhang 1
- Jeff Zhao 1
- Yijia Zhao 1
- Yilun Zhao 1
- Zining Zhu 1
- Christine de Kock 1