Angela Cao


2025

pdf bib
Cross-Document Event-Keyed Summarization
William Walden | Pavlo Kuchmiichuk | Alexander Martin | Chihsheng Jin | Angela Cao | Claire Sun | Curisia Allen | Aaron White
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)

Event-keyed summarization (EKS) requires summarizing a specific event described in a document given the document text and an event representation extracted from it. In this work, we extend EKS to the cross-document setting (CDEKS), in which summaries must synthesize information from accounts of the same event as given by multiple sources. We introduce **SEAMuS** (**S**ummaries of **E**vents **A**cross **Mu**ltiple **S**ources), a high-quality dataset for CDEKS based on an expert reannotation of the FAMuS dataset for cross-document argument extraction. We present a suite of baselines on SEAMuS—covering both smaller, fine-tuned models, as well as zero- and few-shot prompted LLMs—along with detailed ablations and a human evaluation study, showing SEAMuS to be a valuable benchmark for this new task.

2022

pdf bib
A Cognitive Approach to Annotating Causal Constructions in a Cross-Genre Corpus
Angela Cao | Gregor Williamson | Jinho D. Choi
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022

We present a scheme for annotating causal language in various genres of text. Our annotation scheme is built on the popular categories of cause, enable, and prevent. These vague categories have many edge cases in natural language, and as such can prove difficult for annotators to consistently identify in practice. We introduce a decision based annotation method for handling these edge cases. We demonstrate that, by utilizing this method, annotators are able to achieve inter-annotator agreement which is comparable to that of previous studies. Furthermore, our method performs equally well across genres, highlighting the robustness of our annotation scheme. Finally, we observe notable variation in usage and frequency of causal language across different genres.