2024
Knowledge Crosswords: Geometric Knowledge Reasoning with Large Language Models
Wenxuan Ding | Shangbin Feng | Yuhan Liu | Zhaoxuan Tan | Vidhisha Balachandran | Tianxing He | Yulia Tsvetkov
Findings of the Association for Computational Linguistics: ACL 2024
We propose Knowledge Crosswords, a geometric knowledge reasoning benchmark consisting of incomplete knowledge networks bounded by structured factual constraints, where LLMs are tasked with inferring the missing facts to meet all constraints. The novel setting of geometric knowledge reasoning necessitates new LM abilities beyond existing atomic/linear multi-hop QA, such as backtracking, verifying facts and constraints, and reasoning with uncertainty. Knowledge Crosswords contains 2,101 individual problems covering diverse knowledge domains and is further divided into three difficulty levels. We conduct extensive experiments to evaluate existing LLMs and approaches on Knowledge Crosswords. Results demonstrate that baseline approaches struggle with larger knowledge networks and semantically equivalent entity distractors. In light of these limitations, we propose two new approaches, Staged Prompting and Verify-All, to augment LLMs’ abilities for error-aware backtracking and constraint verification. Verify-All significantly outperforms prior methods and is more robust on problems in the hard subset. Further analysis shows that geometric knowledge reasoning poses new challenges to LLMs’ knowledge abilities, particularly robustness to varying option orders, complex structural constraints in knowledge networks, and “none of the above” scenarios.
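For illustration, a minimal sketch of the core check a Verify-All-style approach performs: testing whether a candidate assignment of entities to blanks satisfies every structured constraint. The triple format, blank naming, and the `known_facts` set are illustrative assumptions, not the benchmark's actual data format or the paper's implementation.

```python
def satisfies_all(constraints, assignment, known_facts):
    """Check that every (head, relation, tail) constraint holds once blanks
    (strings starting with '?') are filled in from `assignment`."""
    for head, rel, tail in constraints:
        h = assignment.get(head, head) if head.startswith("?") else head
        t = assignment.get(tail, tail) if tail.startswith("?") else tail
        if (h, rel, t) not in known_facts:
            return False  # a single violated constraint rejects the candidate
    return True

# Toy example: two constraints sharing the blank "?x".
known_facts = {
    ("A. Author", "coauthor_of", "B. Author"),
    ("B. Author", "works_at", "UW"),
}
constraints = [("A. Author", "coauthor_of", "?x"), ("?x", "works_at", "UW")]
print(satisfies_all(constraints, {"?x": "B. Author"}, known_facts))  # True
print(satisfies_all(constraints, {"?x": "C. Author"}, known_facts))  # False
```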
P3Sum: Preserving Author’s Perspective in News Summarization with Diffusion Language Models
Yuhan Liu | Shangbin Feng | Xiaochuang Han | Vidhisha Balachandran | Chan Young Park | Sachin Kumar | Yulia Tsvetkov
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
In this work, we take a first step towards designing summarization systems that are faithful to the author’s intent, not only the semantic content of the article. Focusing on a case study of preserving political perspectives in news summarization, we find that existing approaches alter the political opinions and stances of news articles in more than 50% of summaries, misrepresenting the intent and perspectives of the news authors. We thus propose P3Sum, a diffusion model-based summarization approach controlled by political perspective classifiers. In P3Sum, the political leaning of a generated summary is iteratively evaluated at each decoding step, and any drift from the article’s original stance incurs a loss that is back-propagated to the embedding layers, steering the political stance of the summary at inference time. Extensive experiments on three news summarization datasets demonstrate that P3Sum outperforms state-of-the-art summarization systems and large language models by up to 13.7% in the success rate of stance preservation, with competitive performance on standard metrics of summarization quality. Our work presents a first analysis of the preservation of pragmatic features in summarization, highlights a lacuna in existing summarization models (even state-of-the-art models often struggle to preserve authors’ intent), and develops new summarization systems that are more faithful to authors’ perspectives.
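As an illustration of classifier-guided steering at inference time, here is a minimal PyTorch-style sketch of a single update that nudges a continuous, embedding-level summary representation toward a target stance. The classifier interface, step size, and tensor shapes are assumptions for illustration; this is not P3Sum's actual diffusion decoding code.

```python
import torch
import torch.nn.functional as F

def steer_step(summary_embeds, stance_classifier, target_stance, step_size=0.1):
    """One gradient step on the summary's embedding-level representation:
    a stance classifier scores the current summary, and the gradient of the
    drift loss pulls the representation toward `target_stance`."""
    summary_embeds = summary_embeds.detach().requires_grad_(True)
    logits = stance_classifier(summary_embeds)            # (1, num_stances)
    loss = F.cross_entropy(logits, torch.tensor([target_stance]))
    loss.backward()
    with torch.no_grad():
        steered = summary_embeds - step_size * summary_embeds.grad
    return steered
```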
IAD: In-Context Learning Ability Decoupler of Large Language Models in Meta-Training
Yuhan Liu | Xiuying Chen | Gao Xing | Ji Zhang | Rui Yan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Large Language Models (LLMs) exhibit remarkable In-Context Learning (ICL) ability, where the model learns tasks from prompts consisting of input-output examples. However, the pre-training objectives of LLMs often misalign with ICL objectives: LLMs are mainly pre-trained with methods such as masked language modeling and next-sentence prediction, whereas ICL leverages example pairs to guide the model in generating task-aware responses for tasks such as text classification and question answering. The capabilities tied to the basic pre-training tasks can therefore overshadow or conflict with the task-specific subtleties required in ICL. To address this, we propose an In-context learning Ability Decoupler (IAD), which separates the ICL ability from the general ability of LLMs in the meta-training phase by tuning the ICL-related parameters separately to adapt them for ICL tasks. Concretely, we first identify the parameters that are suitable for ICL via transference-driven gradient importance. We then propose a new max-margin loss to enforce the separation of the general and ICL abilities; the loss is defined on the difference between the output of the ICL-tuned model and that of the original LLM, aiming to prevent overconfidence. By meta-training these ICL-related parameters with the max-margin loss, we enable the model to learn and adapt to new tasks effectively with limited data. Experimental results show that IAD achieves state-of-the-art performance on benchmark datasets while using only 30% of the model’s parameters. An ablation study and detailed analysis confirm the separation of the two abilities.
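A rough sketch of the two ingredients the abstract describes, under illustrative assumptions about shapes and interfaces (this is not the paper's code): ranking parameters by a gradient-based importance score to decide which ones to tune for ICL, and a margin loss on the gap between the ICL-tuned model's output and the frozen base model's output.

```python
import torch

def gradient_importance(model, icl_loss):
    """Accumulate |grad * param| per named parameter as an importance score
    for how much each parameter matters to the ICL objective."""
    icl_loss.backward()
    scores = {}
    for name, p in model.named_parameters():
        if p.grad is not None:
            scores[name] = (p.grad * p).abs().sum().item()
    return scores  # tune only the top-ranked parameters, freeze the rest

def max_margin_loss(icl_logits, base_logits, margin=1.0):
    """Penalize the ICL-tuned model wherever its output exceeds the frozen
    base model's output by more than `margin`, discouraging overconfidence."""
    gap = icl_logits - base_logits.detach()
    return torch.clamp(gap - margin, min=0.0).mean()
```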
2023
From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
Shangbin Feng | Chan Young Park | Yuhan Liu | Yulia Tsvetkov
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language models (LMs) are pretrained on diverse data sources—news, discussion forums, books, online encyclopedias. A significant portion of this data includes facts and opinions which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure media biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks. Our findings reveal that pretrained LMs do have political leanings which reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and media biases into misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.
2021
Improving Empathetic Response Generation by Recognizing Emotion Cause in Conversations
Jun Gao | Yuhan Liu | Haolin Deng | Wei Wang | Yu Cao | Jiachen Du | Ruifeng Xu
Findings of the Association for Computational Linguistics: EMNLP 2021
Current approaches to empathetic response generation learn a model that predicts an emotion label and generates a response based on this label, and they have achieved promising results. However, the emotion cause, an essential factor for empathetic responding, is ignored. The emotion cause is the stimulus behind a human emotion, and recognizing it helps the model better understand human emotions and thus generate more empathetic responses. To this end, we propose a novel framework that improves empathetic response generation by recognizing the emotion cause in conversations. Specifically, an emotion reasoner is designed to predict a context emotion label and a sequence of emotion cause-oriented labels, which indicate whether each word is related to the emotion cause. We then devise both hard and soft gated attention mechanisms to incorporate the emotion cause into response generation. Experiments show that incorporating emotion cause information improves the performance of the model on both emotion recognition and response generation.
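For illustration, a minimal sketch of the “soft” gating idea: token-level emotion-cause probabilities re-weight the encoder states the decoder attends to, so cause-related words contribute more to the generated response. The shapes and the exact gating form are assumptions, not the paper's architecture.

```python
import torch

def soft_gated_context(encoder_states, attn_scores, cause_probs):
    """encoder_states: (T, d); attn_scores: (T,); cause_probs: (T,) in [0, 1].
    Returns a (d,) context vector in which cause-related tokens are boosted."""
    gated = attn_scores * (1.0 + cause_probs)   # boost attention on cause words
    weights = torch.softmax(gated, dim=-1)
    return weights @ encoder_states
```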
2020
Target-based Sentiment Annotation in Chinese Financial News
Chaofa Yuan | Yuhan Liu | Rongdi Yin | Jun Zhang | Qinling Zhu | Ruibin Mao | Ruifeng Xu
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper presents the design and construction of a large-scale target-based sentiment annotation corpus of Chinese financial news text. Different from most existing paragraph/document-level annotation corpora, this study performs target-based fine-grained sentiment annotation. Companies, brands, and other financial entities are regarded as the targets, and the clause reflecting the profitability, loss, or other business status of a financial entity is regarded as the sentiment expression for determining polarity. Based on a high-quality annotation guideline and an effective quality control strategy, a corpus with 8,314 target-level sentiment annotations is constructed from 6,336 paragraphs of Chinese financial news text. Several state-of-the-art sentiment analysis models are evaluated on this corpus.
基于循环交互注意力网络的问答立场分析(A Recurrent Interactive Attention Network for Answer Stance Analysis)
Wangda Luo (骆旺达) | Yuhan Liu (刘宇瀚) | Bin Liang (梁斌) | Ruifeng Xu (徐睿峰)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
To address the difficulty that existing methods have in capturing the dependencies between question and answer texts in the answer stance analysis task, this paper proposes an answer stance analysis method based on a Recurrent Interactive Attention (RIA) network. Imitating the way humans think during reading comprehension, the method combines an interactive attention mechanism with recurrent iteration to effectively mine stance information from the interconnections between questions and answers. In addition, the method converts questions into declarative representations, effectively addressing the problem that interrogatively phrased questions cannot clearly express their own stance. Experimental results show that the proposed method outperforms existing models and demonstrate that it effectively captures the question-answer dependencies in the answer stance analysis task.
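A small sketch of one round of interactive attention between a question representation and an answer representation, of the kind the abstract describes; in the RIA setting such rounds would be iterated recurrently. Tensor shapes, pooling, and the number of rounds are illustrative assumptions, not the paper's exact model.

```python
import torch

def interactive_round(Q, A):
    """Q: (Tq, d) question token states, A: (Ta, d) answer token states.
    Each side attends over the other; returns pooled summaries of both."""
    scores = Q @ A.T                              # (Tq, Ta) affinity matrix
    q_ctx = torch.softmax(scores, dim=1) @ A      # answer-aware question states
    a_ctx = torch.softmax(scores.T, dim=1) @ Q    # question-aware answer states
    return q_ctx.mean(dim=0), a_ctx.mean(dim=0)   # (d,), (d,) summary vectors
```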
2018
ISCLAB at SemEval-2018 Task 1: UIR-Miner for Affect in Tweets
Meng Li | Zhenyuan Dong | Zhihao Fan | Kongming Meng | Jinghua Cao | Guanqi Ding | Yuhan Liu | Jiawei Shan | Binyang Li
Proceedings of the 12th International Workshop on Semantic Evaluation
This paper presents UIR-Miner, a system for the emotion and sentiment analysis evaluation on Twitter at SemEval-2018. Our system consists of a preprocessing module, a stacking module for emotion and sentiment intensity prediction, an LSTM network module for multi-label classification, and a hierarchical attention network module for emotion and sentiment classification. Under the SemEval-2018 metrics, our system obtains final scores of 0.636, 0.531, 0.731, 0.708, and 0.408 on the five subtasks, respectively.