Zihan Li
2023
Chat Disentanglement: Data for New Domains and Methods for More Accurate Annotation
Sai R. Gouravajhala | Andrew M. Vernier | Yiming Shi | Zihan Li | Mark S. Ackerman | Jonathan K. Kummerfeld
Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association
Sai R. Gouravajhala | Andrew M. Vernier | Yiming Shi | Zihan Li | Mark S. Ackerman | Jonathan K. Kummerfeld
Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association
Conversation disentanglement is the task of taking a log of intertwined conversations from a shared channel and breaking the log into individual conversations. The standard datasets for disentanglement are in a single domain and were annotated by linguistics experts with careful training for the task. In this paper, we introduce the first multi-domain dataset and a study of annotation by people without linguistics expertise or extensive training. We experiment with several variations in interfaces, conducting user studies with domain experts and crowd workers. We also test a hypothesis from prior work that link-based annotation is more accurate, finding that it actually has comparable accuracy to set-based annotation. Our new dataset will support the development of more useful systems for this task, and our experimental findings suggest that users are capable of improving the usefulness of these systems by accurately annotating their own data.
DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition
Chunkit Chan | Xin Liu | Jiayang Cheng | Zihan Li | Yangqiu Song | Ginny Wong | Simon See
Findings of the Association for Computational Linguistics: ACL 2023
Chunkit Chan | Xin Liu | Jiayang Cheng | Zihan Li | Yangqiu Song | Ginny Wong | Simon See
Findings of the Association for Computational Linguistics: ACL 2023
Implicit Discourse Relation Recognition (IDRR) is a sophisticated and challenging task to recognize the discourse relations between the arguments with the absence of discourse connectives. The sense labels for each discourse relation follow a hierarchical classification scheme in the annotation process (Prasad et al., 2008), forming a hierarchy structure. Most existing works do not well incorporate the hierarchy structure but focus on the syntax features and the prior knowledge of connectives in the manner of pure text classification. We argue that it is more effective to predict the paths inside the hierarchical tree (e.g., “Comparison -> Contrast -> however”) rather than flat labels (e.g., Contrast) or connectives (e.g., however). We propose a prompt-based path prediction method to utilize the interactive information and intrinsic senses among the hierarchy in IDRR. This is the first work that injects such structure information into pre-trained language models via prompt tuning, and the performance of our solution shows significant and consistent improvement against competitive baselines.
2019
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
Tao Yu | Rui Zhang | Heyang Er | Suyi Li | Eric Xue | Bo Pang | Xi Victoria Lin | Yi Chern Tan | Tianze Shi | Zihan Li | Youxuan Jiang | Michihiro Yasunaga | Sungrok Shim | Tao Chen | Alexander Fabbri | Zifan Li | Luyao Chen | Yuwen Zhang | Shreya Dixit | Vincent Zhang | Caiming Xiong | Richard Socher | Walter Lasecki | Dragomir Radev
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Tao Yu | Rui Zhang | Heyang Er | Suyi Li | Eric Xue | Bo Pang | Xi Victoria Lin | Yi Chern Tan | Tianze Shi | Zihan Li | Youxuan Jiang | Michihiro Yasunaga | Sungrok Shim | Tao Chen | Alexander Fabbri | Zifan Li | Luyao Chen | Yuwen Zhang | Shreya Dixit | Vincent Zhang | Caiming Xiong | Richard Socher | Walter Lasecki | Dragomir Radev
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.
Search
Fix author
Co-authors
- Mark S. Ackerman 1
- Chunkit Chan 1
- Tao Chen 1
- Luyao Chen 1
- Jiayang Cheng 1
- Shreya Dixit 1
- Heyang Er 1
- Sai R. Gouravajhala 1
- Youxuan Jiang 1
- Jonathan K. Kummerfeld 1
- Walter Lasecki 1
- Suyi Li 1
- Zifan Li 1
- Xi Victoria Lin 1
- Xin Liu 1
- Bo Pang 1
- Dragomir Radev 1
- Alexander Richard Fabbri 1
- Simon See 1
- Yiming Shi 1
- Tianze Shi 1
- Sungrok Shim 1
- Richard Socher 1
- Yangqiu Song 1
- Yi Chern Tan 1
- Andrew M. Vernier 1
- Ginny Wong 1
- Caiming Xiong 1
- Eric Xue 1
- Michihiro Yasunaga 1
- Tao Yu 1
- Rui Zhang 1
- Yuwen Zhang 1
- Vincent Zhang 1