Jeff Ma

2026

SAUCE: Summary Analysis Using Conversation Entailment
Man-Ling Sung | Hemanth Kandula | Jeff Ma | William Hartmann | Matthew Snover
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)

With the growing need for evaluating Large Language Models (LLMs) and their applications to speech, challenges persist in summarizing and evaluating conversations that lack a clear end goal. We introduce SAUCE, a reference-free, fact-based evaluation pipeline for cross-lingual conversational speech summarization. It measures the accuracy and fact coverage of a summary through the entailment between conversation and text. We compare SAUCE against several popular summarization metrics and demonstrate its effectiveness in capturing information loss due to transcription and translation errors and in identifying broken summaries. Crucially, unlike black-box LLM evaluators or dense embedding metrics, SAUCE is inherently explainable: it maps summary scores to discrete, verifiable facts, allowing users to pinpoint exact hallucinations or omissions. We illustrate how this interpretability helps developers systematically profile LLM behaviors and gives end-users an actionable tool to verify summary accuracy in noisy, real-world conditions. Preliminary investigations show that SAUCE aligns strongly with human judgment.

2011

Building a Statistical Machine Translation System for Translating Patent Documents
Jeff Ma | Spyros Matsoukas
Proceedings of the 4th Workshop on Patent Translation

Improving Low-Resource Statistical Machine Translation with a Novel Semantic Word Clustering Algorithm
Jeff Ma | Spyros Matsoukas | Richard Schwartz
Proceedings of Machine Translation Summit XIII: Papers
