Chunhua Liu


WAX: A New Dataset for Word Association eXplanations
Chunhua Liu | Trevor Cohn | Simon De Deyne | Lea Frermann
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Word associations are among the most common paradigms to study the human mental lexicon. While their structure and types of associations have been well studied, surprisingly little attention has been given to the question of why participants produce the observed associations. Answering this question would not only advance understanding of human cognition, but could also aid machines in learning and representing basic commonsense knowledge. This paper introduces a large, crowd-sourced data set of English word associations with explanations, labeled with high-level relation types. We present an analysis of the provided explanations, and design several tasks to probe to what extent current pre-trained language models capture the underlying relations. Our experiments show that models struggle to capture the diversity of human associations, suggesting WAX is a rich benchmark for commonsense modeling and generation.


Commonsense Knowledge in Word Associations and ConceptNet
Chunhua Liu | Trevor Cohn | Lea Frermann
Proceedings of the 25th Conference on Computational Natural Language Learning

Humans use countless basic, shared facts about the world to efficiently navigate in their environment. This commonsense knowledge is rarely communicated explicitly, however, understanding how commonsense knowledge is represented in different paradigms is important for (a) a deeper understanding of human cognition and (b) augmenting automatic reasoning systems. This paper presents an in-depth comparison of two large-scale resources of general knowledge: ConceptNet, an engineered relational database, and SWOW, a knowledge graph derived from crowd-sourced word associations. We examine the structure, overlap and differences between the two graphs, as well as the extent of situational commonsense knowledge present in the two resources. We finally show empirically that both resources improve downstream task performance on commonsense reasoning benchmarks over text-only baselines, suggesting that large-scale word association data, which have been obtained for several languages through crowd-sourcing, can be a valuable complement to curated knowledge graphs.


BLCU_NLP at SemEval-2019 Task 7: An Inference Chain-based GPT Model for Rumour Evaluation
Ruoyao Yang | Wanying Xie | Chunhua Liu | Dong Yu
Proceedings of the 13th International Workshop on Semantic Evaluation

Researchers have been paying increasing attention to rumour evaluation due to the rapid spread of unsubstantiated rumours on social media platforms, including SemEval 2019 task 7. However, labelled data for learning rumour veracity is scarce, and labels in rumour stance data are highly disproportionate, making it challenging for a model to perform supervised-learning adequately. We propose an inference chain-based system, which fully utilizes conversation structure-based knowledge in the limited data and expand the training data in minority categories to alleviate class imbalance. Our approach obtains 12.6% improvement upon the baseline system for subtask A, ranks 1st among 21 systems in subtask A, and ranks 4th among 12 systems in subtask B.

BLCU_NLP at SemEval-2019 Task 8: A Contextual Knowledge-enhanced GPT Model for Fact Checking
Wanying Xie | Mengxi Que | Ruoyao Yang | Chunhua Liu | Dong Yu
Proceedings of the 13th International Workshop on Semantic Evaluation

Since the resources of Community Question Answering are abundant and information sharing becomes universal, it will be increasingly difficult to find factual information for questioners in massive messages. SemEval 2019 task 8 is focusing on these issues. We participate in the task and use Generative Pre-trained Transformer (OpenAI GPT) as our system. Our innovations are data extension, feature extraction, and input transformation. For contextual knowledge enhancement, we extend the training set of subtask A, use several features to improve the results of our system and adapt the input formats to be more suitable for this task. We demonstrate the effectiveness of our approaches, which achieves 81.95% of subtask A and 61.08% of subtask B in accuracy on the SemEval 2019 task 8.

BLCU-NLP at COIN-Shared Task1: Stagewise Fine-tuning BERT for Commonsense Inference in Everyday Narrations
Chunhua Liu | Dong Yu
Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing

This paper describes our system for COIN Shared Task 1: Commonsense Inference in Everyday Narrations. To inject more external knowledge to better reason over the narrative passage, question and answer, the system adopts a stagewise fine-tuning method based on pre-trained BERT model. More specifically, the first stage is to fine-tune on addi- tional machine reading comprehension dataset to learn more commonsense knowledge. The second stage is to fine-tune on target-task (MCScript2.0) with MCScript (2018) dataset assisted. Experimental results show that our system achieves significant improvements over the baseline systems with 84.2% accuracy on the official test dataset.


DEMN: Distilled-Exposition Enhanced Matching Network for Story Comprehension
Chunhua Liu | Haiou Zhang | Shan Jiang | Dong Yu
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

BLCU_NLP at SemEval-2018 Task 12: An Ensemble Model for Argument Reasoning Based on Hierarchical Attention
Meiqian Zhao | Chunhua Liu | Lu Liu | Yan Zhao | Dong Yu
Proceedings of the 12th International Workshop on Semantic Evaluation

To comprehend an argument and fill the gap between claims and reasons, it is vital to find the implicit supporting warrants behind. In this paper, we propose a hierarchical attention model to identify the right warrant which explains why the reason stands for the claim. Our model focuses not only on the similar part between warrants and other information but also on the contradictory part between two opposing warrants. In addition, we use the ensemble method for different models. Our model achieves an accuracy of 61%, ranking second in this task. Experimental results demonstrate that our model is effective to make correct choices.


Semantic Frame Labeling with Target-based Neural Model
Yukun Feng | Dong Yu | Jian Xu | Chunhua Liu
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

This paper explores the automatic learning of distributed representations of the target’s context for semantic frame labeling with target-based neural model. We constrain the whole sentence as the model’s input without feature extraction from the sentence. This is different from many previous works in which local feature extraction of the targets is widely used. This constraint makes the task harder, especially with long sentences, but also makes our model easily applicable to a range of resources and other similar tasks. We evaluate our model on several resources and get the state-of-the-art result on subtask 2 of SemEval 2015 task 15. Finally, we extend the task to word-sense disambiguation task and we also achieve a strong result in comparison to state-of-the-art work.


An Introduction to BLCU Personal Attributes Extraction System
Dong Yu | Cheng Yu | Qin Qu | Gongbo Tang | Chunhua Liu | Yue Tian | Jing Yi
Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing