2024
The Computational Anatomy of Humility: Modeling Intellectual Humility in Online Public Discourse
Xiaobo Guo | Neil Potnis | Melody Yu | Nabeel Gillani | Soroush Vosoughi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The ability of individuals to engage constructively with one another across lines of difference is a critical feature of a healthy pluralistic society. This is also true in online discussion spaces like social media platforms. To date, much social media research has focused on preventing ills—like political polarization and the spread of misinformation. While this is important, enhancing the quality of online public discourse requires not just reducing ills but also promoting foundational human virtues. In this study, we focus on one particular virtue: “intellectual humility” (IH), or acknowledging the potential limitations in one’s own beliefs. Specifically, we explore the development of computational methods for measuring IH at scale. We manually curate and validate an IH codebook on 350 posts about religion drawn from subreddits and use these posts to develop LLM-based models for automating this measurement. Our best model achieves a Macro-F1 score of 0.64 across labels (and 0.70 when predicting IH/IA/Neutral at the coarse level), higher than an expected naive baseline of 0.51 (0.32 for IH/IA/Neutral) but lower than a human annotator-informed upper bound of 0.85 (0.83 for IH/IA/Neutral). Our results both highlight the challenging nature of detecting IH online—opening the door to new directions in NLP research—and lay a foundation for computational social science researchers interested in analyzing and fostering more IH in online public discourse.
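
A minimal illustration of the coarse three-way evaluation mentioned above: the macro-averaged F1 over IH/IA/Neutral predictions can be computed with scikit-learn. The labels below are hypothetical toy examples, not the paper's data or model outputs.

from sklearn.metrics import f1_score

# Hypothetical gold and predicted coarse labels (IH = intellectually humble,
# IA = the contrasting label in the paper's scheme, Neutral = neither).
gold = ["IH", "IA", "Neutral", "IH", "Neutral", "IA"]
pred = ["IH", "Neutral", "Neutral", "IA", "Neutral", "IA"]

# Macro-F1 averages the per-class F1 scores with equal weight per label.
print(f1_score(gold, pred, average="macro"))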
MODABS: Multi-Objective Learning for Dynamic Aspect-Based Summarization
Xiaobo Guo | Soroush Vosoughi
Findings of the Association for Computational Linguistics: ACL 2024
The rapid proliferation of online content necessitates effective summarization methods, among which dynamic aspect-based summarization stands out. Unlike its traditional counterpart, which assumes a fixed set of known aspects, this approach adapts to the varied aspects of the input text. We introduce a novel multi-objective learning framework employing a Longformer-Encoder-Decoder for this task. The framework optimizes aspect number prediction, minimizes disparity between generated and reference summaries for each aspect, and maximizes dissimilarity across aspect-specific summaries. Extensive experiments show our method significantly outperforms baselines on three diverse datasets, largely due to the effective alignment of generated and reference aspect counts without sacrificing single-aspect summarization quality.
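
A rough sketch of how the three objectives might be combined into a single training loss; the paper's exact losses, weights, and Longformer-Encoder-Decoder integration are not reproduced here, and the function, tensor shapes, and weights below are illustrative assumptions.

import torch
import torch.nn.functional as F

def modabs_style_loss(aspect_count_logits, true_aspect_count,
                      gen_summary_embs, ref_summary_embs,
                      w_count=1.0, w_align=1.0, w_div=0.1):
    # (1) Aspect-number prediction, cast here as classification over possible counts.
    count_loss = F.cross_entropy(aspect_count_logits, true_aspect_count)
    # (2) Per-aspect alignment: pull each generated summary toward its reference.
    align_loss = 1.0 - F.cosine_similarity(gen_summary_embs, ref_summary_embs, dim=-1).mean()
    # (3) Cross-aspect diversity: penalize similarity between different aspects' summaries.
    sims = F.cosine_similarity(gen_summary_embs.unsqueeze(0), gen_summary_embs.unsqueeze(1), dim=-1)
    diversity_penalty = (sims - torch.eye(sims.size(0))).mean()
    return w_count * count_loss + w_align * align_loss + w_div * diversity_penalty

# Toy usage with random tensors (4 aspects, 16-dim summary embeddings):
logits, counts = torch.randn(2, 6), torch.tensor([3, 5])
gen, ref = torch.randn(4, 16), torch.randn(4, 16)
print(modabs_style_loss(logits, counts, gen, ref))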
Disordered-DABS: A Benchmark for Dynamic Aspect-Based Summarization in Disordered Texts
Xiaobo Guo | Soroush Vosoughi
Findings of the Association for Computational Linguistics: EMNLP 2024
Aspect-based summarization has seen significant advancements, especially in structured text. Yet summarizing disordered, large-scale texts, like those found in social media and customer feedback, remains a substantial challenge. Current research largely targets predefined aspects within structured texts, neglecting the complexities of dynamic and disordered environments. Addressing this gap, we introduce Disordered-DABS, a novel benchmark for dynamic aspect-based summarization tailored to unstructured text. The benchmark is developed by adapting existing datasets for cost-efficiency and scalability. Our comprehensive experiments and detailed human evaluations reveal that Disordered-DABS poses unique challenges to contemporary summarization models, including state-of-the-art language models such as GPT-3.5.
2023
Length Does Matter: Summary Length can Bias Summarization Metrics
Xiaobo Guo | Soroush Vosoughi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Establishing the characteristics of an effective summary is a complicated and often subjective endeavor. Consequently, the development of metrics for the summarization task has become a dynamic area of research within natural language processing. In this paper, we reveal that existing summarization metrics exhibit a bias toward the length of generated summaries. Our thorough experiments, conducted on a variety of datasets, metrics, and models, substantiate these findings. The results indicate that most metrics tend to favor longer summaries, even after accounting for other factors. To address this issue, we introduce a Bayesian normalization technique that effectively diminishes this bias. We demonstrate that our approach significantly improves the concordance between human annotators and the majority of metrics in terms of summary coherence.
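
One simple (hypothetical) way to probe for the reported length preference is to correlate summary lengths with metric scores across a set of system outputs; the numbers below are made up for illustration, and the paper's Bayesian normalization technique is not reproduced here.

from scipy.stats import spearmanr

lengths = [42, 88, 130, 75, 160]                 # token counts of generated summaries (toy values)
metric_scores = [0.31, 0.38, 0.46, 0.35, 0.52]   # scores from some summarization metric (toy values)

rho, p = spearmanr(lengths, metric_scores)
print(f"Spearman correlation between length and metric score: rho={rho:.2f}, p={p:.3f}")
# A consistently positive correlation across datasets, metrics, and models is the
# kind of length bias the paper documents.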
2022
Capturing Topic Framing via Masked Language Modeling
Xiaobo Guo | Weicheng Ma | Soroush Vosoughi
Findings of the Association for Computational Linguistics: EMNLP 2022
Differential framing can lead to divergent world views on important issues. This is especially true in domains where the information presented can reach a large audience, such as traditional and social media. Scalable and reliable measurement of such differential framing is an important first step in addressing it. In this work, based on the intuition that framing affects the tone and word choices in written language, we propose a framework for modeling the differential framing of issues through masked token prediction via large-scale fine-tuned language models (LMs). Specifically, we explore three key factors for our framework: 1) prompt generation methods for the masked token prediction; 2) methods for normalizing the output of fine-tuned LMs; and 3) robustness to the choice of pre-trained LMs used for fine-tuning. Through experiments on a dataset of articles from traditional media outlets covering five diverse and politically polarized topics, we show that our framework can capture differential framing of these topics with high reliability.
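
To make the masked-token-prediction idea concrete, here is a toy probe using the Hugging Face fill-mask pipeline. The prompt is an invented example, and the sketch loads a generic pretrained model; in the paper's setting, LMs fine-tuned on articles from different outlets are compared instead.

from transformers import pipeline

# Toy probe; the paper's prompt-generation and output-normalization methods are more involved.
fill = pipeline("fill-mask", model="bert-base-uncased")
prompt = "The new immigration policy is [MASK]."  # hypothetical prompt for a polarized topic

for pred in fill(prompt, top_k=5):
    print(pred["token_str"], round(pred["score"], 3))

# Running the same prompt through outlet-specific fine-tuned LMs and comparing the
# predicted token distributions would give a signal of differential framing.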
RotateCT: Knowledge Graph Embedding by Rotation and Coordinate Transformation in Complex Space
Yao Dong | Lei Wang | Ji Xiang | Xiaobo Guo | Yuqiang Xie
Proceedings of the 29th International Conference on Computational Linguistics
Knowledge graph embedding, which aims to learn representations of entities and relations in knowledge graphs, finds applications in various downstream tasks. The key to the success of knowledge graph embedding models is the ability to model relation patterns, including symmetry/antisymmetry, inversion, commutative composition, and non-commutative composition. While many existing methods fail to model non-commutative composition, several approaches support this pattern by modeling in spaces beyond Euclidean and complex space. However, expanding to more complicated spaces such as the quaternions can easily lead to a substantial increase in the number of parameters, which greatly reduces computational efficiency. In this paper, we propose a new knowledge graph embedding method called RotateCT, which first transforms the coordinates of each entity and then represents each relation as a rotation from head entity to tail entity in complex space. By design, RotateCT can infer non-commutative composition patterns and improves computational efficiency. Experiments on multiple datasets empirically show that RotateCT outperforms most state-of-the-art methods on link prediction and path query answering.
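
A minimal numpy sketch of the idea as described in the abstract: entity embeddings live in complex space, each entity is first coordinate-transformed, and the relation then acts as a unit-modulus rotation from head to tail. The per-relation translation used here is an assumption made for illustration, not the paper's exact scoring function.

import numpy as np

rng = np.random.default_rng(0)
d = 4                                               # toy embedding dimension

h = rng.normal(size=d) + 1j * rng.normal(size=d)    # head entity embedding
t = rng.normal(size=d) + 1j * rng.normal(size=d)    # tail entity embedding
c = rng.normal(size=d) + 1j * rng.normal(size=d)    # coordinate transformation (assumed translation)
r = np.exp(1j * rng.uniform(0, 2 * np.pi, size=d))  # relation as a rotation, |r_i| = 1

def score(h, r, t, c):
    # Rotate the transformed head and measure its distance to the transformed tail;
    # a smaller distance (higher score) indicates a more plausible triple (h, r, t).
    return -np.linalg.norm((h + c) * r - (t + c), ord=1)

print(score(h, r, t, c))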