2025
NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model
Yen-Ting Lin | Zhehuai Chen | Piotr Zelasko | Zhen Wan | Xuesong Yang | Zih-Ching Chen | Krishna C Puvvada | Ke Hu | Szu-Wei Fu | Jun Wei Chiu | Jagadeesh Balam | Boris Ginsburg | Yu-Chiang Frank Wang | Chao-Han Huck Yang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer lies in learning dataset-specific features and digesting their knowledge in a single model. Previous methods achieve this by maintaining separate correction language models, resulting in a significant increase in parameters. In this work, we present Mixture-of-Experts as a solution, highlighting that MoEs are much more than a scalability tool. We propose a Multi-Task Correction MoE, where we train the experts to become an “expert” of speech-to-text, language-to-text, and vision-to-text datasets by learning to route each dataset’s tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that NeKo sets a new state of the art, achieving an average relative 5.0% WER reduction and substantial BLEU-score improvements on speech and translation tasks. On zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-3.5-Sonnet with 15.5% to 27.6% relative WER reduction on the Hyporadise benchmark. NeKo also performs competitively on grammar and post-OCR correction as a multi-task model.
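The abstract describes routing each dataset's tokens to a task-mapped expert. Below is a minimal, illustrative PyTorch sketch of that task-guided routing idea; the class name TaskGuidedMoE, the feed-forward expert design, and the top-1 router used at inference are assumptions for illustration, not the NeKo implementation.

```python
# Illustrative sketch of task-guided MoE routing (not the authors' code).
# During training, all tokens of a batch from a given task/dataset are sent
# to that task's mapped expert; at inference a learned router picks experts.
import torch
import torch.nn as nn


class TaskGuidedMoE(nn.Module):
    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # used when no task id is given

    def forward(self, x: torch.Tensor, task_id: int | None = None) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        if task_id is not None:
            # Training: route every token to the expert mapped to this task.
            return self.experts[task_id](x)
        # Inference: top-1 routing chosen per token by the learned router.
        choice = self.router(x).argmax(dim=-1)          # (batch, seq_len)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = choice == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```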
Bringing Suzhou Numerals into the Digital Age: A Dataset and Recognition Study on Ancient Chinese Trade Records
Ting-Lin Wu | Zih-Ching Chen | Chen-Yuan Chen | Pi-Jhong Chen | Li-Chiao Wang
Proceedings of the Second Workshop on Ancient Language Processing
Suzhou numerals, a specialized numerical notation system historically used in Chinese commerce and accounting, played a pivotal role in financial transactions from the Song Dynasty to the early 20th century. Despite their historical significance, they remain largely absent from modern OCR benchmarks, limiting computational access to archival trade documents. This paper presents a curated dataset of 773 expert-annotated Suzhou numeral samples extracted from late Qing-era trade ledgers. We provide a statistical analysis of character distributions, offering insights into their real-world usage in historical bookkeeping. Additionally, we evaluate baseline performance with a handwritten text recognition (HTR) model, highlighting the challenges of recognizing low-resource brush-written numerals. By introducing this dataset and initial benchmark results, we aim to facilitate research in historical documentation in ancient Chinese characters, advancing the digitization of early Chinese financial records. The dataset is publicly available at our huggingface hub, and our codebase can be accessed at our github repository.
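As a rough illustration of the kind of HTR baseline evaluation the abstract mentions, the sketch below runs a generic TrOCR-style handwritten text recognition model from Hugging Face transformers on a single sample image. The checkpoint name and image path are placeholders; the paper's actual baseline model and dataset repository are not specified here.

```python
# Hedged sketch of a handwritten text recognition (HTR) baseline, not the paper's setup.
# Assumes a TrOCR-style encoder-decoder checkpoint; the Suzhou-numeral image path is a placeholder.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("suzhou_numeral_sample.png").convert("RGB")   # placeholder path
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values)
prediction = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("Predicted transcription:", prediction)
```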
2022
AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks
Chin-Lun Fu | Zih-Ching Chen | Yun-Ru Lee | Hung-yi Lee
Findings of the Association for Computational Linguistics: NAACL 2022
Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed. AdapterBias adds a token-dependent shift to the hidden output of transformer layers to adapt to downstream tasks with only a vector and a linear layer. Extensive experiments are conducted to demonstrate the effectiveness of AdapterBias. The experiments show that our proposed method dramatically reduces the number of trainable parameters compared to previous works, with only a minimal decrease in task performance relative to fine-tuned pre-trained models. We further find that AdapterBias automatically learns to assign more significant representation shifts to the tokens related to the task in consideration.
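A minimal PyTorch sketch of the idea described above, assuming the shift is formed from one task-specific vector scaled by a token-dependent weight produced by a linear layer; module and parameter names are illustrative rather than taken from the released code.

```python
# Sketch of a token-dependent representation shift in the spirit of AdapterBias.
# Only the vector v and the linear layer alpha are trainable per task.
import torch
import torch.nn as nn


class AdapterBiasShift(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(d_model))   # shared shift direction (the "vector")
        self.alpha = nn.Linear(d_model, 1)            # token-dependent scale (the "linear layer")

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); each token gets its own scaled copy of v.
        shift = self.alpha(hidden) * self.v           # (batch, seq_len, 1) * (d_model,) broadcasts
        return hidden + shift
```

Inserted after a transformer layer's output, this adds only d_model + (d_model + 1) parameters per layer, which is where the parameter savings described in the abstract come from.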