Kyumin Lee


2021

pdf
Hierarchical Multi-head Attentive Network for Evidence-aware Fake News Detection
Nguyen Vo | Kyumin Lee
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The widespread of fake news and misinformation in various domains ranging from politics, economics to public health has posed an urgent need to automatically fact-check information. A recent trend in fake news detection is to utilize evidence from external sources. However, existing evidence-aware fake news detection methods focused on either only word-level attention or evidence-level attention, which may result in suboptimal performance. In this paper, we propose a Hierarchical Multi-head Attentive Network to fact-check textual claims. Our model jointly combines multi-head word-level attention and multi-head document-level attention, which aid explanation in both word-level and evidence-level. Experiments on two real-word datasets show that our model outperforms seven state-of-the-art baselines. Improvements over baselines are from 6% to 18%. Our source code and datasets are released at https://github.com/nguyenvo09/EACL2021.

2020

pdf
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
Thanh Tran | Yifan Hu | Changwei Hu | Kevin Yen | Fei Tan | Kyumin Lee | Se Rim Park
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present our HABERTOR model for detecting hatespeech in large scale user-generated content. Inspired by the recent success of the BERT model, we propose several modifications to BERT to enhance the performance on the downstream hatespeech classification task. HABERTOR inherits BERT’s architecture, but is different in four aspects: (i) it generates its own vocabularies and is pre-trained from the scratch using the largest scale hatespeech dataset; (ii) it consists of Quaternion-based factorized components, resulting in a much smaller number of parameters, faster training and inferencing, as well as less memory usage; (iii) it uses our proposed multi-source ensemble heads with a pooling layer for separate input sources, to further enhance its effectiveness; and (iv) it uses a regularized adversarial training with our proposed fine-grained and adaptive noise magnitude to enhance its robustness. Through experiments on the large-scale real-world hatespeech dataset with 1.4M annotated comments, we show that HABERTOR works better than 15 state-of-the-art hatespeech detection methods, including fine-tuning Language Models. In particular, comparing with BERT, our HABERTOR is 4 5 times faster in the training/inferencing phase, uses less than 1/3 of the memory, and has better performance, even though we pre-train it by using less than 1% of the number of words. Our generalizability analysis shows that HABERTOR transfers well to other unseen hatespeech datasets and is a more efficient and effective alternative to BERT for the hatespeech classification.

pdf
Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News
Nguyen Vo | Kyumin Lee
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Although many fact-checking systems have been developed in academia and industry, fake news is still proliferating on social media. These systems mostly focus on fact-checking but usually neglect online users who are the main drivers of the spread of misinformation. How can we use fact-checked information to improve users’ consciousness of fake news to which they are exposed? How can we stop users from spreading fake news? To tackle these questions, we propose a novel framework to search for fact-checking articles, which address the content of an original tweet (that may contain misinformation) posted by online users. The search can directly warn fake news posters and online users (e.g. the posters’ followers) about misinformation, discourage them from spreading fake news, and scale up verified content on social media. Our framework uses both text and images to search for fact-checking articles, and achieves promising results on real-world datasets. Our code and datasets are released at https://github.com/nguyenvo09/EMNLP2020.

pdf
Hierarchical Evidence Set Modeling for Automated Fact Extraction and Verification
Shyam Subramanian | Kyumin Lee
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Automated fact extraction and verification is a challenging task that involves finding relevant evidence sentences from a reliable corpus to verify the truthfulness of a claim. Existing models either (i) concatenate all the evidence sentences, leading to the inclusion of redundant and noisy information; or (ii) process each claim-evidence sentence pair separately and aggregate all of them later, missing the early combination of related sentences for more accurate claim verification. Unlike the prior works, in this paper, we propose Hierarchical Evidence Set Modeling (HESM), a framework to extract evidence sets (each of which may contain multiple evidence sentences), and verify a claim to be supported, refuted or not enough info, by encoding and attending the claim and evidence sets at different levels of hierarchy. Our experimental results show that HESM outperforms 7 state-of-the-art methods for fact extraction and claim verification. Our source code is available at https://github.com/ShyamSubramanian/HESM.