Ge Gao


2023

Continually Improving Extractive QA via Human Feedback
Ge Gao | Hung-Ting Chen | Yoav Artzi | Eunsol Choi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We study continually improving an extractive question answering (QA) system via human user feedback. We design and deploy an iterative approach, where information-seeking users ask questions, receive model-predicted answers, and provide feedback. We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time. Our experiments show that user feedback effectively improves extractive QA models over time across different data regimes, including significant potential for domain adaptation.

Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors
Nikita Mehandru | Sweta Agrawal | Yimin Xiao | Ge Gao | Elaine Khoong | Marine Carpuat | Niloufar Salehi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

A major challenge in the practical use of Machine Translation (MT) is that users lack information on translation quality to make informed decisions about how to rely on outputs. Progress in quality estimation research provides techniques to automatically assess MT quality, but these techniques have primarily been evaluated in vitro by comparison against human judgments outside of a specific context of use. This paper evaluates quality estimation feedback in vivo with a human study in realistic high-stakes medical settings. Using Emergency Department discharge instructions, we study how interventions based on quality estimation versus backtranslation assist physicians in deciding whether to show MT outputs to a patient. We find that quality estimation improves appropriate reliance on MT, but backtranslation helps physicians detect more clinically harmful errors that QE alone often misses.

TiKG-30K:基于表示学习的藏语知识图谱数据集(TiKG-30K: A Tibetan Knowledge Graph Dataset Based on Representation Learning)
Wenhao Zhuang (庄文浩) | Ge Gao (高歌) | Yuan Sun (孙媛)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

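Representation learning for knowledge graphs aims to learn the complex semantic associations among knowledge graph data by mapping entities and relations into a low-dimensional vector space, and it supports research in information retrieval, intelligent question answering, and knowledge reasoning. Current research on knowledge graph representation learning focuses mainly on languages such as English and Chinese, where public high-quality datasets (e.g., FB15k-237, WN18RR) have played a very important role. For low-resource languages such as Tibetan, however, the lack of public knowledge graph datasets means that related research is still in its infancy. To address this, we present TiKG-30K, a public Tibetan knowledge graph dataset containing 146,679 triples, 30,986 entities, and 641 relations, which can be used for knowledge graph representation learning and downstream tasks. To address the small size and sparsity of existing Tibetan knowledge graphs, we exploit the coreference relations of entities in Tibetan triples, expand the knowledge base using the rich knowledge bases of other languages and non-textual media, and optimize the knowledge graph at multiple levels through techniques such as cross-lingual near-synonym retrieval, merging of synonymous entities and relations, and correction of erroneous triples, ultimately constructing the TiKG-30K dataset. Finally, we run experiments on TiKG-30K with several classic representation learning models and compare the results against the English datasets FB15k-237 and WN18RR and the Tibetan dataset TD50K; the results show that TiKG-30K is comparable to FB15k-237 and WN18RR. We release the TiKG-30K dataset publicly at http://tikg-30k.cmli-nlp.com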

2022

Prediction of People’s Emotional Response towards Multi-modal News
Ge Gao | Sejin Paik | Carley Reardon | Yanling Zhao | Lei Guo | Prakash Ishwar | Margrit Betke | Derry Tanti Wijaya
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We aim to develop methods for understanding how multimedia news exposure can affect people’s emotional responses, and we especially focus on news content related to gun violence, a very important yet polarizing issue in the U.S. We created the dataset NEmo+ by significantly extending the U.S. gun violence news-to-emotions dataset, BU-NEmo, from 320 to 1,297 news headline and lead image pairings and collecting 38,910 annotations in a large crowdsourcing experiment. In curating the NEmo+ dataset, we developed methods to identify news items that will trigger similar versus divergent emotional responses. For news items that trigger similar emotional responses, we compiled them into the NEmo+-Consensus dataset. We benchmark models on this dataset that predict a person’s dominant emotional response toward the target news item (single-label prediction). On the full NEmo+ dataset, containing news items that would lead to both differing and similar emotional responses, we also benchmark models for the novel task of predicting the distribution of evoked emotional responses in humans when presented with multi-modal news content. Our single-label and multi-label prediction models outperform baselines by large margins across several metrics.

BU-NEmo: an Affective Dataset of Gun Violence News
Carley Reardon | Sejin Paik | Ge Gao | Meet Parekh | Yanling Zhao | Lei Guo | Margrit Betke | Derry Tanti Wijaya
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Given our society’s increased exposure to multimedia formats on social media platforms, efforts to understand how digital content impacts people’s emotions are burgeoning. As such, we introduce a U.S. gun violence news dataset that contains news headline and image pairings from 840 news articles with 15K high-quality, crowdsourced annotations on emotional responses to the news pairings. We created three experimental conditions for the annotation process: two with a single modality (headline or image only), and one multimodal (headline and image together). In contrast to prior works on affectively-annotated data, our dataset includes annotations on the dominant emotion experienced with the content, the intensity of the selected emotion and an open-ended, written component. By collecting annotations on different modalities of the same news content pairings, we explore the relationship between image and text influence on human emotional response. We offer initial analysis on our dataset, showing the nuanced affective differences that appear due to modality and individual factors such as political leaning and media consumption habits. Our dataset is made publicly available to facilitate future research in affective computing.

Simulating Bandit Learning from User Feedback for Extractive Question Answering
Ge Gao | Eunsol Choi | Yoav Artzi
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study learning from user feedback for extractive question answering by simulating feedback using supervised data. We cast the problem as contextual bandit learning, and analyze the characteristics of several learning scenarios with a focus on reducing data annotation. We show that systems initially trained on few examples can dramatically improve given feedback from users on model-predicted answers, and that one can use existing datasets to deploy systems in new domains without any annotation effort, instead improving the system on the fly via user feedback.

2018

Neural Metaphor Detection in Context
Ge Gao | Eunsol Choi | Yejin Choi | Luke Zettlemoyer
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We present end-to-end neural models for detecting metaphorical word use in context. We show that relatively standard BiLSTM models which operate on complete sentences work well in this setting, in comparison to previous work that used more restricted forms of linguistic context. These models establish a new state-of-the-art on existing verb metaphor detection benchmarks, and show strong performance on jointly predicting the metaphoricity of all words in a running text.