Peter Zeng


2025

Synthetic Audio Helps for Cognitive State Tasks
Adil Soubki | John Murzaku | Peter Zeng | Owen Rambow
Findings of the Association for Computational Linguistics: NAACL 2025

The NLP community has broadly focused on text-only approaches to cognitive state tasks, but audio can provide vital missing cues through prosody. We posit that text-to-speech (TTS) models learn to track aspects of cognitive state in order to produce naturalistic audio, and that the signal audio models implicitly identify is orthogonal to the information that language models exploit. We present Synthetic Audio Data fine-tuning (SAD), a framework in which we show that seven tasks related to cognitive state modeling benefit from multimodal training on both text and zero-shot synthetic audio data from an off-the-shelf TTS system. We show an improvement over the text-only modality when adding synthetic audio data to text-only corpora. Furthermore, on tasks and corpora that do contain gold audio, we show that our SAD framework achieves competitive performance with text and synthetic audio compared to text and gold audio.
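
As an illustration of the SAD recipe, the sketch below pairs each example in a text-only corpus with zero-shot synthetic audio before multimodal fine-tuning. It is a minimal sketch, not the paper's implementation: pyttsx3 stands in for the off-the-shelf TTS system, and the (ex_id, text, label) corpus format is an assumption.

import os
import pyttsx3

def synthesize_corpus(examples, out_dir="tts_audio"):
    """examples: iterable of (ex_id, text, label) from a text-only corpus."""
    os.makedirs(out_dir, exist_ok=True)
    engine = pyttsx3.init()
    multimodal = []
    for ex_id, text, label in examples:
        wav_path = os.path.join(out_dir, f"{ex_id}.wav")
        # Zero-shot synthesis: no task-specific tuning of the TTS system.
        engine.save_to_file(text, wav_path)
        multimodal.append({"text": text, "audio": wav_path, "label": label})
    engine.runAndWait()  # flush all queued synthesis jobs to disk
    # Fine-tune a text+audio model on this corpus instead of on text alone.
    return multimodal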

2024

Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground
Adil Soubki | John Murzaku | Arash Yousefi Jordehi | Peter Zeng | Magdalena Markowska | Seyed Abolghasem Mirroshandel | Owen Rambow
Findings of the Association for Computational Linguistics: ACL 2024

Evaluating the theory of mind (ToM) capabilities of language models (LMs) has recently received a great deal of attention. However, many existing benchmarks rely on synthetic data, which risks misaligning the resulting experiments with human behavior. We introduce the first ToM dataset based on naturally occurring spoken dialogs, Common-ToM, and show that LMs struggle to demonstrate ToM. We then show that integrating a simple, explicit representation of beliefs improves LM performance on Common-ToM.
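
One way to picture a "simple, explicit representation of beliefs": serialize, for each proposition, what each speaker believes and whether it sits in the common ground, then condition the LM on that text. The sketch below is a hedged illustration of the idea; the BeliefState schema and render_beliefs helper are assumptions, not the paper's exact representation.

from dataclasses import dataclass

@dataclass
class BeliefState:
    proposition: str
    a_believes: bool        # speaker A's stance on the proposition
    b_believes: bool        # speaker B's stance on the proposition
    in_common_ground: bool  # both believe it and know the other does too

def render_beliefs(states):
    """Serialize belief states into text an LM can condition on."""
    lines = [
        f"- '{s.proposition}': A believes={s.a_believes}, "
        f"B believes={s.b_believes}, common ground={s.in_common_ground}"
        for s in states
    ]
    return "Belief state:\n" + "\n".join(lines)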

2022

Re-Examining FactBank: Predicting the Author’s Presentation of Factuality
John Murzaku | Peter Zeng | Magdalena Markowska | Owen Rambow
Proceedings of the 29th International Conference on Computational Linguistics

We present a corrected version of a subset of the FactBank dataset. Previously published results on FactBank are no longer valid. We perform experiments on FactBank using multiple training paradigms, data smoothing techniques, and polarity classifiers. We argue that F-measure is an important alternative evaluation metric for factuality. We provide new state-of-the-art results for four corpora, including FactBank. We perform an error analysis on FactBank combined with two similar corpora.
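
To make the F-measure argument concrete: factuality is often scored as regression on a continuous scale (with error or correlation metrics), but binning scores into classes permits a macro F-measure that is not dominated by the majority class. The sketch below is an assumption-laden illustration; the [-3, 3] scale, thresholds, and three-way binning are not taken from the paper.

from sklearn.metrics import f1_score

def bin_factuality(score):
    """Map a continuous factuality score in [-3, 3] to a class (illustrative thresholds)."""
    if score >= 1.5:
        return "factual"
    if score <= -1.5:
        return "counterfactual"
    return "uncertain"

def factuality_f1(gold_scores, pred_scores):
    gold = [bin_factuality(s) for s in gold_scores]
    pred = [bin_factuality(s) for s in pred_scores]
    # Macro-averaging weights each class equally, unlike accuracy.
    return f1_score(gold, pred, average="macro")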