Ho-Lam Chung


2026

Neural audio codecs are widely used as tokenizers for spoken language models, but they are optimized for waveform reconstruction rather than autoregressive prediction.This mismatch injects acoustically driven uncertainty into the discrete token space and increases language-model perplexity.We propose , which augments codec training with language-model-facing objectives while keeping both codec and LLM architectures unchanged.introduces (i) future token prediction with Medusa-style multi-step heads to encourage multi-step predictability, and (ii) semantic alignment that matches audio and text representations via a memory-bank contrastive loss.A differentiable Gumbel bridge enables end-to-end gradients from these objectives to the codec encoder.On SALMon speech coherence, token LMs trained on reach 61.6% accuracy (+12.1 points over AUV) while reducing perplexity 35×.On Codec-SUPERB-tiny, improves speech Mel distance by 5.0% over AUV while simultaneously achieving the learnability gains, demonstrating that reconstruction fidelity and token predictability can be improved together.

2024

The sound codec’s dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance.Recent years have witnessed significant developments in codec models.The ideal sound codec should preserve content, paralinguistics, speakers, and audio information.However, the question of which codec achieves optimal sound information preservation remains unanswered, as in different papers, models are evaluated on their selected experimental settings.This study introduces Codec-SUPERB, an acronym for Codec sound processing Universal PERformance Benchmark.It is an ecosystem designed to assess codec models across representative sound applications and signal-level metrics rooted in sound domain knowledge.Codec-SUPERB simplifies result sharing through an online leaderboard, promoting collaboration within a community-driven benchmark database, thereby stimulating new development cycles for codecs.Furthermore, we undertake an in-depth analysis to offer insights into codec models from both application and signal perspectives, diverging from previous codec papers mainly concentrating on signal-level comparisons.Finally, we will release codes, the leaderboard, and data to accelerate progress within the community.

2022

2020

In this paper, we investigate the following two limitations for the existing distractor generation (DG) methods. First, the quality of the existing DG methods are still far from practical use. There are still room for DG quality improvement. Second, the existing DG designs are mainly for single distractor generation. However, for practical MCQ preparation, multiple distractors are desired. Aiming at these goals, in this paper, we present a new distractor generation scheme with multi-tasking and negative answer training strategies for effectively generating multiple distractors. The experimental results show that (1) our model advances the state-of-the-art result from 28.65 to 39.81 (BLEU 1 score) and (2) the generated multiple distractors are diverse and shows strong distracting power for multiple choice question.