Haotian Zhu


2025

Tapping into Social Media in Crisis: A Survey
William D. Lewis | Haotian Zhu | Keaton Strawn | Fei Xia
Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI)

When a crisis hits, people often turn to social media to ask for help, offer help, find out how others are doing, and decide what they should do. The growth of social media use during crises has been helpful to aid providers as well, giving them a nearly immediate read of the on-the-ground situation that they might not otherwise have. The volume of crisis-related content posted to social media has grown explosively over the past two decades, which, in turn, has been a boon to Language Technology (LT) researchers. In this study, we conducted a systematic survey of 355 papers published in the past five years to better understand the growth of LT as it is applied to crisis content, specifically focusing on corpora built over crisis social media data as well as systems and applications that have been developed on this content. We highlight the challenges and possible future directions of research in this space. Our goal is to engender interest in the LT field writ large, in particular in an area of study that can have dramatic impacts on people’s lives. Indeed, the use of LT in crisis response has already been shown to save people’s lives.

2024

SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics
Zhiwen You | Kanyao Han | Haotian Zhu | Bertram Ludaescher | Jana Diesner
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Prompt-based fine-tuning has become an essential method for eliciting information encoded in pre-trained language models for a variety of tasks, including text classification. For multi-class classification tasks, prompt-based fine-tuning under low-resource scenarios has resulted in performance levels comparable to those of full fine-tuning methods. Previous studies have used crafted prompt templates and verbalizers, mapping from the label terms space to the class space, to solve the classification problem as a masked language modeling task. However, cross-domain and fine-grained prompt-based fine-tuning with an automatically enriched verbalizer remains unexplored, mainly due to the difficulty and costs of manually selecting domain label terms for the verbalizer, which requires humans with domain expertise. To address this challenge, we introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks. To this end, we select semantically correlated and domain-specific label terms within the context of scientific literature for verbalizer augmentation. Furthermore, we propose a new verbalization strategy that uses correlation scores as additional weights to enhance the prediction performance of the language model during model tuning. Our method outperforms state-of-the-art prompt-based fine-tuning methods on scientific text classification tasks under few- and zero-shot settings, especially in classifying fine-grained and emerging scientific topics.

Disagreeable, Slovenly, Honest and Un-named Women? Investigating Gender Bias in English Educational Resources by Extending Existing Gender Bias Taxonomies
Haotian Zhu | Kexin Gao | Fei Xia | Mari Ostendorf
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Gender bias has been extensively studied in both the educational field and the Natural Language Processing (NLP) field, the former using human coding to identify patterns associated with and causes of gender bias in text and the latter to detect, measure and mitigate gender bias in NLP output and models. This work aims to use NLP to facilitate automatic, quantitative analysis of educational text within the framework of a gender bias taxonomy. Analyses of both educational texts and a lexical resource (WordNet) reveal patterns of bias that can inform and aid educators in updating textbooks and lexical resources and in designing assessment items.

2020

NLPStatTest: A Toolkit for Comparing NLP System Performance
Haotian Zhu | Denise Mak | Jesse Gioannini | Fei Xia
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations

Statistical significance testing centered on p-values is commonly used to compare NLP system performance, but p-values alone are insufficient because statistical significance differs from practical significance. The latter can be measured by estimating effect size. In this paper, we propose a three-stage procedure for comparing NLP system performance and provide a toolkit, NLPStatTest, that automates the process. Users can upload NLP system evaluation scores and the toolkit will analyze these scores, run appropriate significance tests, estimate effect size, and conduct power analysis to estimate Type II error. The toolkit provides a convenient and systematic way to compare NLP system performance that goes beyond statistical significance testing.
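
The distinction between statistical and practical significance drawn in this abstract can be illustrated with a minimal Python sketch. It is not the NLPStatTest implementation or its API: the function names, the choice of a sign-flip permutation test, the paired form of Cohen's d, and the example scores are all assumptions made here for illustration.

```python
import random
import statistics


def paired_effect_size(scores_a, scores_b):
    """Cohen's d for paired samples: mean difference over the
    standard deviation of the differences (illustrative choice)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def sign_flip_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired permutation test on the mean difference.

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so signs are flipped at random and the resulting mean
    differences are compared with the observed one.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(statistics.mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.mean(flipped)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-test-set scores for two systems (made-up numbers).
system_a = [0.71, 0.64, 0.80, 0.75, 0.69, 0.77, 0.73, 0.68]
system_b = [0.70, 0.62, 0.79, 0.75, 0.67, 0.76, 0.71, 0.67]

p_value = sign_flip_permutation_test(system_a, system_b)
effect = paired_effect_size(system_a, system_b)
mean_gain = statistics.mean(a - b for a, b in zip(system_a, system_b))
print(f"p-value: {p_value:.4f}, paired Cohen's d: {effect:.2f}, mean gain: {mean_gain:.4f}")
```

In this made-up example the p-value is small because system A is never worse than system B, yet the mean gain is only about one point, which is why reporting an effect size alongside the p-value is useful.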