Xin Shen


2022

E-ConvRec: A Large-Scale Conversational Recommendation Dataset for E-Commerce Customer Service
Meihuizi Jia | Ruixue Liu | Peiying Wang | Yang Song | Zexi Xi | Haobin Li | Xin Shen | Meng Chen | Jinhui Pang | Xiaodong He
Proceedings of the Thirteenth Language Resources and Evaluation Conference

There has been growing interest in developing conversational recommendation systems (CRS), which provide valuable recommendations to users through conversations. Compared to traditional recommendation, a CRS supports richer interactions and makes it possible to obtain users’ exact preferences explicitly. Nevertheless, research on this topic is limited by the lack of broad-coverage dialogue corpora, especially real-world ones. To address this issue and facilitate our exploration, we construct E-ConvRec, an authentic Chinese dialogue dataset consisting of over 25k dialogues and 770k utterances, which contains user profiles, a product knowledge base (KB), and multiple sequential real conversations between users and recommenders. Next, we explore conversational recommendation in a real-world setting from multiple facets based on the dataset. To this end, we design three tasks: user preference recognition, dialogue management, and personalized recommendation. For these three tasks, we establish baseline results on E-ConvRec to facilitate future studies.

2021

Towards Domain-Generalizable Paraphrase Identification by Avoiding the Shortcut Learning
Xin Shen | Wai Lam
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

In this paper, we investigate the Domain Generalization (DG) problem for supervised Paraphrase Identification (PI). We observe that the performance of existing PI models deteriorates dramatically when tested in an out-of-distribution (OOD) domain. We conjecture that this is caused by shortcut learning, i.e., these models tend to rely on cue words that are unique to a particular dataset or domain. To alleviate this issue and enhance DG ability, we propose a PI framework based on Optimal Transport (OT). Our method forces the network to learn the necessary features for all the words in the input, which alleviates the shortcut learning problem. Experimental results show that our method improves the DG ability of PI models.