Abstract
This paper describes a Bayesian language model for predicting spontaneous utterances. People sometimes say unexpected words, such as fillers or hesitations, that cause normal N-gram models to mispredict subsequent words. Our proposed model considers mixtures of possible segmental contexts, that is, a kind of context-word selection. It can reduce the negative effects of unexpected words because it represents the conditional occurrence probability of a word as a weighted mixture over possible segmental contexts. Tuning the mixture weights is the key issue in this approach because the number of segment patterns becomes enormous, so we resolve it with a Bayesian model. The generative process combines the stick-breaking process with the process used in the variable-order Pitman-Yor language model. Experimental evaluations revealed that our model outperformed contiguous N-gram models in terms of perplexity on noisy text including hesitations.
- Anthology ID:
- C16-1016
- Volume:
- Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Yuji Matsumoto, Rashmi Prasad
- Venue:
- COLING
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 161–170
- URL:
- https://aclanthology.org/C16-1016
- Cite (ACL):
- Ryu Takeda and Kazunori Komatani. 2016. Bayesian Language Model based on Mixture of Segmental Contexts for Spontaneous Utterances with Unexpected Words. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 161–170, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Bayesian Language Model based on Mixture of Segmental Contexts for Spontaneous Utterances with Unexpected Words (Takeda & Komatani, COLING 2016)
- PDF:
- https://preview.aclanthology.org/landing_page/C16-1016.pdf
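The core idea from the abstract can be illustrated with a toy sketch: the probability of a word is a weighted mixture over *segmental* contexts, i.e. sub-sequences of the history that may skip positions, so a filler like "uh" can be dropped from the conditioning context. This is only a minimal illustration under assumed simplifications (truncated stick-breaking weights, add-one smoothing over a nominal vocabulary size, no Pitman-Yor machinery); the class and function names are hypothetical, not from the paper.

```python
import itertools
import random
from collections import defaultdict

def stick_breaking_weights(k, alpha=1.0, rng=None):
    """Draw k mixture weights from a truncated stick-breaking process."""
    rng = rng or random.Random(0)
    weights, remaining = [], 1.0
    for _ in range(k - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last weight absorbs the leftover stick
    return weights

def segmental_contexts(history):
    """All sub-sequences (with gaps) of the history, longest first.
    E.g. ('i', 'uh', 'want') includes ('i', 'want'), skipping the filler."""
    n = len(history)
    return [tuple(history[i] for i in idx)
            for r in range(n, -1, -1)
            for idx in itertools.combinations(range(n), r)]

class MixtureSegmentalContextLM:
    """Toy LM mixing word predictions from every segmental context."""

    def __init__(self, alpha=1.0, vocab_size=1000):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alpha = alpha
        self.vocab_size = vocab_size  # assumed fixed for add-one smoothing

    def train(self, sentences, order=3):
        for sent in sentences:
            toks = tuple(sent)
            for i, w in enumerate(toks):
                hist = toks[max(0, i - order + 1):i]
                for ctx in segmental_contexts(hist):
                    self.counts[ctx][w] += 1

    def prob(self, word, history):
        """P(word | history) as a stick-breaking-weighted mixture over contexts."""
        ctxs = segmental_contexts(tuple(history))
        weights = stick_breaking_weights(len(ctxs), self.alpha)
        p = 0.0
        for w_k, ctx in zip(weights, ctxs):
            c = self.counts.get(ctx, {})
            total = sum(c.values())
            p += w_k * (c.get(word, 0) + 1) / (total + self.vocab_size)
        return p
```

Even when the full history ("i", "uh") was never seen in training, the gapped context ("i",) still contributes mass to the prediction of "want", which is the intuition behind robustness to unexpected words.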