Lingfeng Wu


2025

A Novel Chinese-Idiom Automatic Error Correction Method Based on the Hidden Markov Model
Rongbin Zhang | Anlu Gui | Peng Cao | Lingfeng Wu | Feng Huang | Jiahui Li
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

Spelling errors in Chinese idioms occur frequently in daily learning and usage, arising from various types of misspellings and optical character recognition (OCR) errors. Automatic error correction for Chinese idioms is an important natural language processing task, as it helps improve both the quality of Chinese texts and language learning. Existing methods, such as edit-distance and custom-dictionary approaches, suffer from limited error correction capability, low computational efficiency, and weak flexibility. To address these limitations, this paper proposes a novel automatic error correction method for Chinese idioms based on the hidden Markov model (HMM). Specifically, the generation process of idiom spelling errors is modeled with an HMM, transforming idiom correction into a matching task between erroneous idioms and legitimate idioms. By constructing a legitimate-idiom table and a Chinese character confusion set, a prototype idiom correction system was developed and its performance was tested. Experimental results demonstrate that, compared with existing methods, the proposed model is simpler, has fewer parameters and lower computational complexity, and exhibits stronger error correction capability and parameter robustness. It can correct diverse types of idiom errors more flexibly, showing high potential application value.
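
The abstract describes modeling the generation of idiom misspellings with an HMM and matching the erroneous input against a legitimate-idiom table via a character confusion set. Below is a minimal, hypothetical Python sketch of that idea, not the paper's system: the idiom table, confusion set, and probability values are illustrative assumptions. Because an idiom's character sequence is fixed, the hidden-state transitions are deterministic and the match score reduces to a product of emission probabilities.

```python
# Hypothetical sketch of HMM-style idiom matching (not the paper's code).
# Hidden states: the characters of a candidate legitimate idiom.
# Observations: the possibly misspelled input characters.

LEGAL_IDIOMS = ["一帆风顺", "画蛇添足"]   # toy legitimate-idiom table
CONFUSION = {                             # toy confusion set: P(observed | true)
    "帆": {"帆": 0.85, "凡": 0.15},
    "蛇": {"蛇": 0.90, "虵": 0.10},
}

def emission_prob(true_char: str, observed_char: str) -> float:
    """P(observed char | true char), with a small floor for unseen confusions."""
    if true_char == observed_char:
        return CONFUSION.get(true_char, {}).get(observed_char, 0.9)
    return CONFUSION.get(true_char, {}).get(observed_char, 1e-4)

def correct_idiom(observed: str) -> str:
    """Return the legitimate idiom maximizing the HMM emission score."""
    best, best_score = observed, 0.0
    for idiom in LEGAL_IDIOMS:
        if len(idiom) != len(observed):
            continue  # this sketch only compares same-length candidates
        score = 1.0
        for true_char, obs_char in zip(idiom, observed):
            score *= emission_prob(true_char, obs_char)
        if score > best_score:
            best, best_score = idiom, score
    return best

print(correct_idiom("一凡风顺"))  # -> 一帆风顺
```

In this formulation a single confusion set covers any error type it is populated with (visually similar characters, homophones, OCR substitutions), which is the source of the flexibility the abstract claims over fixed edit-distance rules.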

2018

Session-level Language Modeling for Conversational Speech
Wayne Xiong | Lingfeng Wu | Jun Zhang | Andreas Stolcke
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose to generalize language models for conversational speech recognition so that they operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence. The model consists of a long short-term memory (LSTM) recurrent network that reads the entire word-level history of a conversation, as well as information about turn taking and speaker overlap, in order to predict each next word. The model is applied in a rescoring framework, where the word history prior to the current utterance is approximated with preliminary recognition results. In experiments on conversational telephone speech (Switchboard), we find that such a model yields substantial perplexity reductions over a standard LSTM-LM with utterance scope, as well as improvements in word error rate.
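
The abstract describes an LSTM LM that carries state across utterance boundaries and is used to rescore recognition hypotheses given the full conversation history. Below is a minimal PyTorch sketch under those assumptions; the class name, dimensions, and the use of special tokens to mark turn taking are illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of session-level LSTM-LM rescoring (not the authors' code).
# The conversation history (including speaker-change marker tokens) is read once;
# the carried LSTM state then conditions the score of each n-best hypothesis.
import torch
import torch.nn as nn

class SessionLSTMLM(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 256, hidden: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) word ids, speaker-change markers included
        out, state = self.lstm(self.embed(tokens), state)
        return self.proj(out), state  # next-word logits and carried state

def rescore(model, history_ids, hypothesis_ids):
    """Log-probability of a hypothesis given the whole conversation history."""
    with torch.no_grad():
        # Run the history once so context crosses utterance boundaries.
        hist_logits, state = model(history_ids.unsqueeze(0))
        hyp = hypothesis_ids.unsqueeze(0)
        logits, _ = model(hyp[:, :-1], state)
        # The first hypothesis word is predicted from the last history position.
        all_logits = torch.cat([hist_logits[:, -1:], logits], dim=1)
        logp = torch.log_softmax(all_logits, dim=-1)
        return logp.gather(-1, hyp.unsqueeze(-1)).sum().item()
```

In the rescoring setting described by the abstract, `history_ids` would come from preliminary first-pass recognition results rather than reference transcripts, and each n-best hypothesis for the current utterance would be scored with `rescore` and combined with the first-pass acoustic and LM scores.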