Andreas Stolcke
Also published as: A. Stolcke
2025
Unifying Streaming and Non-streaming Zipformer-based ASR
Bidisha Sharma | Karthik Pandia D S | Shankar Venkatesan | Jeena J Prakash | Shashi Kumar | Malolan Chetlur | Andreas Stolcke
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
There has been increasing interest in unifying streaming and non-streaming automatic speech recognition (ASR) models to reduce development, training, and deployment costs. We present a unified framework that trains a single end-to-end ASR model for both streaming and non-streaming applications, leveraging future context information. We propose to use dynamic right-context through chunked attention masking in the training of Zipformer-based ASR models. We demonstrate that using right-context is more effective in Zipformer models than in other conformer-style models due to its multi-scale nature. We analyze the effect of varying the number of right-context frames on the accuracy and latency of streaming ASR models. We use LibriSpeech and large in-house conversational datasets to train different versions of streaming and non-streaming models and evaluate them in a production-grade server-client setup across diverse test sets from different domains. The proposed strategy reduces word error rate by 7.9% relative, with a small degradation in user-perceived latency. By adding more right-context frames, we are able to achieve streaming performance close to that of non-streaming models. Our approach also allows flexible control of the latency-accuracy tradeoff according to customer requirements.
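The chunked attention masking with right-context described in the abstract can be pictured with a small mask-construction sketch. This is a minimal illustration assuming PyTorch; the chunk size, frame counts, and masking convention are illustrative choices, not the paper's exact Zipformer training setup.

```python
import torch

def chunked_attention_mask(num_frames: int, chunk_size: int, right_context: int) -> torch.Tensor:
    """Boolean mask (True = may attend) for chunked self-attention with right-context.

    Each frame may attend to all frames up to the end of its own chunk plus
    `right_context` additional future frames; right_context=0 gives purely
    chunk-causal (streaming) attention, while a large value approaches full
    (non-streaming) attention.
    """
    frame = torch.arange(num_frames)
    # Last frame index inside each position's own chunk.
    chunk_end = (frame // chunk_size + 1) * chunk_size - 1
    # Extend visibility by the allowed number of future (right-context) frames.
    visible_until = torch.clamp(chunk_end + right_context, max=num_frames - 1)
    # mask[i, j] is True iff frame i may attend to frame j.
    return frame.unsqueeze(0) <= visible_until.unsqueeze(1)

if __name__ == "__main__":
    # 12 frames, chunks of 4, 2 frames of right-context: varying `right_context`
    # at training or decoding time trades latency for accuracy.
    print(chunked_attention_mask(12, chunk_size=4, right_context=2).int())
```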
Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings
Aaron Zheng | Mansi Rana | Andreas Stolcke
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
With the recent proliferation of large language models (LLMs), enterprises have been able to rapidly develop proof-of-concepts and prototypes. As a result, there is a growing need to implement robust guardrails that monitor, quantify, and control an LLM's behavior, ensuring that its use is reliable, safe, accurate, and aligned with users' expectations. Previous approaches for filtering out inappropriate user prompts or system outputs, such as LlamaGuard and OpenAI's MOD API, have achieved significant success by fine-tuning existing LLMs. However, using fine-tuned LLMs as guardrails introduces increased latency and higher maintenance costs, which may not be practical or scalable for cost-efficient deployments. We take a different approach, focusing on fine-tuning a lightweight architecture: Sentence-BERT. This method reduces the model size from LlamaGuard's 7 billion parameters to approximately 67 million, while maintaining comparable performance on the AEGIS safety benchmark.
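A lightweight embedding-based guardrail of the kind the abstract describes can be sketched as a Sentence-BERT encoder with a small classification head. This is a minimal sketch assuming the sentence-transformers and scikit-learn packages; the checkpoint name, toy data, and threshold are illustrative assumptions, not the authors' actual training setup.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Compact sentence encoder (tens of millions of parameters, not billions).
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Tiny illustrative training set; a real guardrail would use a labeled safety corpus.
train_texts = [
    "How do I reset my account password?",      # safe
    "Explain how to build a harmful device.",   # unsafe
]
train_labels = [0, 1]  # 0 = safe, 1 = unsafe

# Encode prompts into fixed-size embeddings and fit a small linear head on top.
X = encoder.encode(train_texts)
clf = LogisticRegression().fit(X, train_labels)

def is_unsafe(prompt: str, threshold: float = 0.5) -> bool:
    """Flag a prompt as unsafe if the classifier's unsafe probability exceeds the threshold."""
    prob = clf.predict_proba(encoder.encode([prompt]))[0, 1]
    return prob >= threshold
```

Because only the encoder embeddings and a small head are involved, such a filter can run with far lower latency and cost than an LLM-based guardrail.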
SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling
Kadri Hacioglu | Manjunath K E | Andreas Stolcke
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Slot filling is a crucial subtask in spoken language understanding (SLU), traditionally implemented as a cascade of speech recognition followed by one or more natural language understanding (NLU) components. The recent advent of speech-based large language models (speechLLMs), which integrate speech and textual foundation models, has opened new avenues for achieving speech understanding tasks in a more unified, generative, and instruction-following manner, while promising data and compute efficiency and zero-shot generalization to unseen slot labels. We address the slot-filling task by creating an empirical upper bound for the task, identifying performance, robustness, and generalization gaps, and proposing improvements to the training data, architecture, and training strategies to narrow the gap with the upper-bound result. We show that each of these measures improves performance substantially, while highlighting practical challenges and providing empirical guidance and insights for harnessing these emerging models.
Spoken Conversational Agents with Large Language Models
Huck Yang | Andreas Stolcke | Larry P. Heck
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Spoken conversational agents are converging toward voice-native LLMs. This tutorial distills the path from cascaded ASR/NLU to end-to-end, retrieval- and vision-grounded systems. We frame adaptation of text LLMs to audio, cross-modal alignment, and joint speech–text training; review datasets, metrics, and robustness across accents; and compare design choices (cascaded vs. E2E, post-ASR correction, streaming). We link industrial assistants to current open-domain and task-oriented agents, highlight reproducible baselines, and outline open problems in privacy, safety, and evaluation. Attendees leave with practical recipes and a clear systems-level roadmap.
2024
Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output
Hithesh Sankararaman | Mohammed Nasheed Yasin | Tanner Sorensen | Alessandro Di Bari | Andreas Stolcke
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
We present a light-weight approach for detecting nonfactual outputs from retrieval-augmented generation (RAG). Given a context and a putative output, we compute a factuality score that can be thresholded to yield a binary decision to check the results of LLM-based question-answering, summarization, or other systems. Unlike factuality checkers that themselves rely on LLMs, we use compact, open-source natural language inference (NLI) models that yield a freely accessible solution with low latency and low cost at run-time, and no need for LLM fine-tuning. The approach also enables downstream mitigation and correction of hallucinations, by tracing them back to specific context chunks. Our experiments show high ROC-AUC across a wide range of relevant open-source datasets, indicating the effectiveness of our method for fact-checking RAG output.
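The thresholded NLI-based factuality score described in the abstract can be sketched as scoring a claim against each retrieved context chunk and keeping the best entailment probability. This is a minimal sketch assuming the Hugging Face transformers pipeline and a publicly available MNLI checkpoint; the chunking, aggregation by max, and the 0.7 threshold are illustrative choices, not the paper's exact recipe.

```python
from transformers import pipeline

# Compact, openly available NLI model used as the entailment scorer.
nli = pipeline("text-classification", model="microsoft/deberta-large-mnli")

def factuality_score(context_chunks: list[str], claim: str) -> float:
    """Maximum entailment probability of the claim against any context chunk."""
    best = 0.0
    for chunk in context_chunks:
        # NLI convention: the context chunk is the premise, the claim the hypothesis.
        scores = nli({"text": chunk, "text_pair": claim}, top_k=None)
        entail = next(s["score"] for s in scores if s["label"].upper() == "ENTAILMENT")
        best = max(best, entail)
    return best

def is_supported(context_chunks: list[str], claim: str, threshold: float = 0.7) -> bool:
    """Binary fact-check decision obtained by thresholding the factuality score."""
    return factuality_score(context_chunks, claim) >= threshold
```

Keeping the per-chunk scores also makes it possible to trace an unsupported claim back to the context chunks that failed to entail it, which is what enables downstream mitigation.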
2022
CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals
Scott Novotney | Sreeparna Mukherjee | Zeeshan Ahmed | Andreas Stolcke
Findings of the Association for Computational Linguistics: ACL 2022
We propose a framework to modularize the training of neural language models that use diverse forms of context by eliminating the need to jointly train context and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one type of contextual data and adapts to novel context types. The model consists of a pretrained neural sentence LM, a BERT-based contextual encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and contextual evidence. When contextually annotated data is unavailable, our model learns to combine contextual and sentence-internal information using noisy oracle unigram embeddings as a proxy. Real context data can be introduced later and used to adapt a small number of parameters that map contextual data into the decoder’s embedding space. We validate the CUE framework on a NYTimes text corpus with multiple metadata types, for which the LM perplexity can be lowered from 36.6 to 27.4 by conditioning on context. Bootstrapping a contextual LM with only a subset of the metadata during training retains 85% of the achievable gain. Training the model initially with proxy context retains 67% of the perplexity gain after adapting to real context. Furthermore, we can swap one type of pretrained sentence LM for another without retraining the context encoders, by only adapting the decoder model. Overall, we obtain a modular framework that allows incremental, scalable training of context-enhanced LMs.
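The modularity the abstract describes (adapting only the parameters that map context into the decoder's embedding space) can be sketched with a small adapter and fusion head. This is a minimal PyTorch sketch; the dimensions, the concatenation-based fusion, and the module names are illustrative assumptions rather than the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ContextAdapter(nn.Module):
    """Maps a contextual embedding (e.g., from a frozen BERT encoder) into the decoder's space."""
    def __init__(self, context_dim: int, model_dim: int):
        super().__init__()
        self.proj = nn.Linear(context_dim, model_dim)

    def forward(self, context_vec: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.proj(context_vec))

class ContextualLMHead(nn.Module):
    """Combines sentence-internal decoder states with the adapted context vector."""
    def __init__(self, model_dim: int, vocab_size: int):
        super().__init__()
        self.out = nn.Linear(2 * model_dim, vocab_size)

    def forward(self, decoder_states: torch.Tensor, context_vec: torch.Tensor) -> torch.Tensor:
        # decoder_states: (batch, seq_len, model_dim); context_vec: (batch, model_dim)
        ctx = context_vec.unsqueeze(1).expand(-1, decoder_states.size(1), -1)
        return self.out(torch.cat([decoder_states, ctx], dim=-1))  # logits over the vocabulary

# Adapting to a new context type would touch only the adapter's parameters,
# leaving the pretrained sentence LM and the decoder untouched.
adapter = ContextAdapter(context_dim=768, model_dim=512)
head = ContextualLMHead(model_dim=512, vocab_size=32000)
states = torch.randn(2, 10, 512)         # stand-in for pretrained sentence-LM states
context = torch.randn(2, 768)            # stand-in for a BERT context embedding
logits = head(states, adapter(context))  # shape (2, 10, 32000)
```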
2021
Attention-based Contextual Language Model Adaptation for Speech Recognition
Richard Diehl Martinez | Scott Novotney | Ivan Bulyko | Ariya Rastrow | Andreas Stolcke | Ankur Gandhe
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
2018
Session-level Language Modeling for Conversational Speech
Wayne Xiong | Lingfeng Wu | Jun Zhang | Andreas Stolcke
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
We propose to generalize language models for conversational speech recognition to allow them to operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence. The model consists of a long short-term memory (LSTM) recurrent network that reads the entire word-level history of a conversation, as well as information about turn taking and speaker overlap, in order to predict each next word. The model is applied in a rescoring framework, where the word history prior to the current utterance is approximated with preliminary recognition results. In experiments in the conversational telephone speech domain (Switchboard), we find that such a model gives substantial perplexity reductions over a standard LSTM-LM with utterance scope, as well as improvements in word error rate.
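A session-level LSTM language model used for rescoring, as described above, can be sketched as a single LSTM whose state carries across utterance boundaries, warmed up on the (possibly approximate) conversation history before scoring each hypothesis. This is a minimal PyTorch sketch; vocabulary handling, dimensions, and any speaker-change tokenization are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class SessionLSTMLM(nn.Module):
    """Word-level LSTM LM whose recurrent state spans the whole conversation."""
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens: torch.Tensor, state=None):
        # tokens: (batch, seq_len); `state` carries over from earlier utterances.
        h, state = self.lstm(self.embed(tokens), state)
        return self.out(h), state

def rescore(lm: SessionLSTMLM, history_tokens: torch.Tensor, hypothesis_tokens: torch.Tensor) -> float:
    """Log probability of a hypothesis given the conversation-level word history."""
    _, state = lm(history_tokens)                     # warm up on prior (recognized) utterances
    logits, _ = lm(hypothesis_tokens[:, :-1], state)  # predict each next word of the hypothesis
    logp = torch.log_softmax(logits, dim=-1)
    targets = hypothesis_tokens[:, 1:]
    return logp.gather(-1, targets.unsqueeze(-1)).sum().item()
```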
2013
Using Out-of-Domain Data for Lexical Addressee Detection in Human-Human-Computer Dialog
Heeyoung Lee | Andreas Stolcke | Elizabeth Shriberg
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
A Cross-language Study on Automatic Speech Disfluency Detection
Wen Wang | Andreas Stolcke | Jiahong Yuan | Mark Liberman
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2007
Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages
Mathias Creutz | Teemu Hirsimäki | Mikko Kurimo | Antti Puurula | Janne Pylkkönen | Vesa Siivola | Matti Varjokallio | Ebru Arisoy | Murat Saraçlar | Andreas Stolcke
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference
2005
Using Conditional Random Fields for Sentence Boundary Detection in Speech
Yang Liu | Andreas Stolcke | Elizabeth Shriberg | Mary Harper
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)
2004
Improving Automatic Sentence Boundary Detection with Confusion Networks
D. Hillard | M. Ostendorf | A. Stolcke | Y. Liu | E. Shriberg
Proceedings of HLT-NAACL 2004: Short Papers
Comparing and Combining Generative and Posterior Probability Models: Some Advances in Sentence Boundary Detection in Speech
Yang Liu | Andreas Stolcke | Elizabeth Shriberg | Mary Harper
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing
2003
Getting More Mileage from Web Text Sources for Conversational Speech Language Modeling using Class-Dependent Mixtures
Ivan Bulyko | Mari Ostendorf | Andreas Stolcke
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers
2001
The Meeting Project at ICSI
Nelson Morgan | Don Baron | Jane Edwards | Dan Ellis | David Gelbart | Adam Janin | Thilo Pfau | Elizabeth Shriberg | Andreas Stolcke
Proceedings of the First International Conference on Human Language Technology Research
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
G. Tur | D. Hakkani-Tur | A. Stolcke | E. Shriberg
Computational Linguistics, Volume 27, Number 1, March 2001
2000
Dialogue act modeling for automatic tagging and recognition of conversational speech
Andreas Stolcke | Klaus Ries | Noah Coccaro | Elizabeth Shriberg | Rebecca Bates | Daniel Jurafsky | Paul Taylor | Rachel Martin | Carol Van Ess-Dykema | Marie Meteer
Computational Linguistics, Volume 26, Number 3, September 2000
1995
Partitioning Grammars and Composing Parsers
Fuliang Weng | Andreas Stolcke
Proceedings of the Fourth International Workshop on Parsing Technologies
An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
Andreas Stolcke
Computational Linguistics, Volume 21, Number 2, June 1995
1994
Precise N-Gram Probabilities From Stochastic Context-Free Grammars
Andreas Stolcke | Jonathan Segal
32nd Annual Meeting of the Association for Computational Linguistics
Co-authors
- Elizabeth Shriberg 7
- Yang Liu (刘扬) 3
- Ivan Bulyko 2
- Mary Harper 2
- Scott Novotney 2
- Mari Ostendorf 2
- Zeeshan Ahmed 1
- Ebru Arisoy 1
- Alessandro Di Bari 1
- Don Baron 1
- Rebecca Bates 1
- Malolan Chetlur 1
- Noah Coccaro 1
- Mathias Creutz 1
- Richard Diehl Martinez 1
- Manjunath K E 1
- Jane Edwards 1
- Dan Ellis 1
- Ankur Gandhe 1
- David Gelbart 1
- Kadri Hacioglu 1
- Dilek Hakkani-Tur 1
- Larry P. Heck 1
- Dustin Hillard 1
- Teemu Hirsimäki 1
- Adam Janin 1
- Dan Jurafsky 1
- Shashi Kumar 1
- Mikko Kurimo 1
- Heeyoung Lee 1
- Mark Liberman 1
- Rachel Martin 1
- Marie Meteer 1
- Nelson Morgan 1
- Sreeparna Mukherjee 1
- Thilo Pfau 1
- Jeena J Prakash 1
- Antti Puurula 1
- Janne Pylkkönen 1
- Mansi Rana 1
- Ariya Rastrow 1
- Klaus Ries 1
- Karthik Pandia D S 1
- Hithesh Sankararaman 1
- Murat Saraclar 1
- Jonathan Segal 1
- Bidisha Sharma 1
- Vesa Siivola 1
- Tanner Sorensen 1
- Paul Taylor 1
- Gokhan Tur 1
- Carol Van Ess-Dykema 1
- Matti Varjokallio 1
- Shankar Venkatesan 1
- Wen Wang (王雯) 1
- Fuliang Weng 1
- Lingfeng Wu 1
- Wayne Xiong 1
- Huck Yang 1
- Mohammed Nasheed Yasin 1
- Jiahong Yuan 1
- Jun Zhang 1
- Aaron Zheng 1