Kuan-Yu Chen

Also published as: Kuan-yu Chen


2025

Large language model (LLM)-driven multi-agent systems (MAS) are transforming how humans and AIs collaboratively generate ideas and artifacts. While existing surveys provide comprehensive overviews of MAS infrastructures, they largely overlook the dimension of creativity, including how novel outputs are generated and evaluated, how creativity informs agent personas, and how creative workflows are coordinated. This is the first survey dedicated to creativity in MAS. We focus on text and image generation tasks, and present: (1) a taxonomy of agent proactivity and persona design; (2) an overview of generation techniques, including divergent exploration, iterative refinement, and collaborative synthesis, as well as relevant datasets and evaluation metrics; and (3) a discussion of key challenges, such as inconsistent evaluation standards, insufficient bias mitigation, coordination conflicts, and the lack of unified benchmarks. This survey offers a structured framework and roadmap for advancing the development, evaluation, and standardization of creative MAS.
With the proliferation of digital learning, an increasing number of learners are engaging with audio-visual materials. For preschool and lower elementary students, whose literacy skills are still limited, knowledge acquisition relies more heavily on spoken and visual content. Traditional readability models were primarily developed for written texts, and their applicability to spoken materials remains uncertain. To address this issue, this study investigates the impact of different word segmentation tools and language models on the performance of automatic grade classification models for Chinese spoken materials. Support Vector Machines were employed for grade prediction, aiming to automatically determine the appropriate grade level of learning resources and assist learners in selecting suitable materials. The results show that language models with higher-dimensional word embeddings achieved better classification performance, with an accuracy of up to 61% and an adjacent accuracy of 76%. These findings may contribute to future digital learning platforms or educational resource recommendation systems by automatically providing students with appropriate listening materials to enhance learning outcomes.
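As a rough illustration of this pipeline, the sketch below trains an SVM grade classifier on pre-computed transcript embeddings and reports both plain accuracy and adjacent accuracy (a prediction counted as correct when it falls within one grade level of the truth). The data, embedding dimensionality, and grade range are hypothetical placeholders, not the study's actual setup.

```python
# Minimal sketch (not the paper's code): grade classification from
# pre-computed text embeddings with an SVM, plus "adjacent accuracy",
# which counts a prediction as correct if it is within one grade level.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def adjacent_accuracy(y_true, y_pred, tolerance=1):
    """Fraction of predictions within `tolerance` grade levels of the truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred) <= tolerance))

# X: one embedding vector per spoken transcript (random stand-in data);
# y: integer grade labels, e.g. 1-6.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 300))          # e.g. 300-dim word-embedding averages
y = rng.integers(1, 7, size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy:", np.mean(pred == y_te))
print("adjacent accuracy:", adjacent_accuracy(y_te, pred))
```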
This study employs several state-of-the-art techniques, including RoPE and Flash Attention, and leverages large-scale Chinese web corpora and encyclopedic data to pre-train an encoder model specifically designed for long text in Traditional Chinese. We evaluate the model on tasks such as reading comprehension and text classification, and the results show that its overall performance lags behind existing Chinese baselines. Through pseudo-perplexity analysis, we infer that the pre-training phase did not sufficiently capture the data distribution, potentially due to factors such as hyperparameter choices, incomplete convergence, and data quality. Although the results are suboptimal, this study still offers valuable experimental insights and directions for improving Chinese language model development.
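For reference, the following is a minimal sketch of pseudo-perplexity for a masked-LM encoder, following the standard mask-each-token-in-turn definition; the checkpoint name is a generic stand-in, not the model trained in this study.

```python
# Hedged sketch: pseudo-perplexity of a masked language model, computed
# by masking each token in turn and scoring the original token.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "bert-base-chinese"  # placeholder checkpoint, not the paper's model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

def pseudo_perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt")["input_ids"][0]
    nll, n = 0.0, 0
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        logp = torch.log_softmax(logits, dim=-1)[ids[i]]
        nll -= logp.item()
        n += 1
    return float(torch.exp(torch.tensor(nll / n)))

print(pseudo_perplexity("今天天氣很好"))
```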
Multi-behavior recommendation leverages auxiliary behaviors to effectively alleviate the sparsity of target behaviors. Existing approaches can be broadly categorized into two paradigms: sequential models that capture individual temporal dynamics but often omit cross-user information, and graph-based models that mine collaborative patterns yet lack temporal dependency modeling. To address these limitations, this paper proposes an integrated approach that combines sequential and graph modeling: the former focuses on learning temporal dependencies within user behavior sequences, while the latter captures cross-user behavior paths. By fusing the predictions from both components, the method achieves more accurate recommendations. Experiments on two e-commerce datasets, Taobao and RetailRocket, show that the integrated model outperforms the strong baseline MB-STR by about 1% in both HR@10 and NDCG@10. These results indicate that incorporating cross-user collaborative information consistently improves performance, even on top of strong sequential models.
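A schematic sketch of the fusion step and the two reported metrics, under the simplifying assumption that the two branches each produce per-item scores that are combined by a weighted sum (the scores, item count, and fusion weight below are illustrative placeholders, not the paper's method in detail):

```python
# Illustrative sketch: fuse item scores from a sequential model and a
# graph model by a weighted sum, then compute HR@10 and NDCG@10 for a
# single held-out target item.
import numpy as np

def fuse(seq_scores, graph_scores, alpha=0.5):
    return alpha * seq_scores + (1 - alpha) * graph_scores

def hr_ndcg_at_k(scores, target, k=10):
    topk = np.argsort(-scores)[:k]
    if target not in topk:
        return 0.0, 0.0
    rank = int(np.where(topk == target)[0][0])   # 0-based rank in top-k
    return 1.0, 1.0 / np.log2(rank + 2)

rng = np.random.default_rng(0)
seq = rng.normal(size=1000)     # stand-in scores from the sequential branch
graph = rng.normal(size=1000)   # stand-in scores from the graph branch
scores = fuse(seq, graph, alpha=0.6)
print(hr_ndcg_at_k(scores, target=42))
```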
This is a technical report on our participation in the FSR-2025 Hakka speech recognition challenge (Hakka ASR II), which aims to advance automatic speech recognition for Hakka. Because Hakka is a low-resource language with multiple accents, speech recognition for it is highly challenging. We adopt Whisper large-v2 as the backbone model and design a two-stage training pipeline: the model is first adapted on the "Hakka Across Taiwan (HAT)" corpus to capture the general acoustic characteristics of Hakka, and then fine-tuned on the 60 hours of accent-specific data provided by the organizers to improve adaptation to the target data. Experiments show that directly outputting Hakka characters achieves a good character error rate (CER), but performance on the pinyin task drops markedly because of accent differences and variability in the romanization rules. To address this, we initialize the pinyin model with the encoder of the character model, and propose a post-processing module that combines RoBERTa-based character-to-pinyin conversion, accent identification, and dictionary-based correction, in the hope of improving recognition performance in the competition.
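The encoder-reuse step described above might look roughly like the following with Hugging Face Transformers; both checkpoint paths are placeholders rather than the authors' released artifacts.

```python
# Rough sketch of the encoder-reuse idea: load a character-level Whisper
# model that has already been fine-tuned and copy its encoder weights
# into a fresh pinyin model before fine-tuning the latter.
from transformers import WhisperForConditionalGeneration

char_model = WhisperForConditionalGeneration.from_pretrained(
    "path/to/hakka-char-whisper")     # placeholder: character-level model
pinyin_model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2")        # fresh backbone for the pinyin task

# Transfer the acoustic encoder; the decoder is then trained for pinyin.
pinyin_model.model.encoder.load_state_dict(
    char_model.model.encoder.state_dict())
```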

2022

This paper constructs CMDQA, a Chinese dialogue-based information-seeking question answering dataset, mainly targeting the scenario of obtaining Chinese movie-related information. It contains 10K QA dialogues (40K turns in total). All questions and background documents are compiled from Wikipedia via a web crawler, and the answers are obtained by extracting the corresponding answer spans from the related text passages. In addition to requiring the retrieval of related documents, CMDQA introduces pronouns into the questions to better mimic real dialogue scenarios. The dataset can therefore test the individual performance of the information retrieval, question answering, and question rewriting modules. This paper also provides a baseline system and reports its performance on the dataset. The experiments show that a large gap remains between the baseline and human performance, so the dataset offers ample challenge for further research.
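For readers unfamiliar with the span-extraction setting the answers follow, the snippet below illustrates it with a publicly available Chinese extractive-QA checkpoint; it is purely illustrative and is not the paper's baseline system, and the example question and passage are invented.

```python
# Minimal illustration of extractive (span-based) question answering:
# given a question and a background passage, the model returns a span.
from transformers import pipeline

qa = pipeline("question-answering",
              model="uer/roberta-base-chinese-extractive-qa")  # generic checkpoint

result = qa(question="這部電影的導演是誰?",
            context="《臥虎藏龍》是李安執導的武俠電影,於2000年上映。")
print(result["answer"], result["score"])
```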

2021

This paper presents a framework to answer questions that require various kinds of inference mechanisms (such as Extraction, Entailment-Judgement, and Summarization). Most previous approaches adopt a rigid framework that handles only one inference mechanism. Only a few adopt multiple answer generation modules to provide different mechanisms; however, they either lack an aggregation mechanism to merge the answers from the various modules, or are too complicated to be implemented with neural networks. To alleviate these problems, we propose a divide-and-conquer framework, which consists of a set of answer generation modules, a dispatch module, and an aggregation module. The answer generation modules are designed to provide different inference mechanisms, the dispatch module selects a few appropriate answer generation modules to generate answer candidates, and the aggregation module selects the final answer. We test our framework on the 2020 Formosa Grand Challenge Contest dataset. Experiments show that the proposed framework outperforms the state-of-the-art RoBERTa-large model by about 11.4%.
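A schematic sketch of the proposed flow, with every learned component stubbed out by a trivial rule (the module implementations, routing rule, and scores below are placeholders, not the paper's models):

```python
# Divide-and-conquer skeleton: dispatch routes a question to a subset of
# answer generation modules; aggregation picks the final answer.
def extraction_module(question, passage):    return ("span answer", 0.7)
def entailment_module(question, passage):    return ("yes", 0.6)
def summarization_module(question, passage): return ("summary answer", 0.5)

MODULES = {
    "extraction": extraction_module,
    "entailment": entailment_module,
    "summarization": summarization_module,
}

def dispatch(question):
    # A trivial keyword rule stands in for the learned dispatch module.
    if "嗎" in question:                       # yes/no question
        return ["entailment", "extraction"]
    return ["extraction", "summarization"]

def aggregate(candidates):
    # Highest-scoring candidate stands in for the learned aggregation module.
    return max(candidates, key=lambda c: c[1])[0]

def answer(question, passage):
    candidates = [MODULES[m](question, passage) for m in dispatch(question)]
    return aggregate(candidates)

print(answer("台北在台灣嗎?", "台北是台灣的首都。"))
```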
Thanks to the development of deep learning, natural language processing tasks have made great progress by leveraging Bidirectional Encoder Representations from Transformers (BERT). The goal of information retrieval is to find the most relevant results for a user's query in a large set of documents. Although BERT-based retrieval models have shown excellent results in many studies, these models usually require large amounts of computation and/or additional storage. In view of these drawbacks, a BERT-based Siamese-structured retrieval model (BESS) is proposed in this paper. BESS not only inherits the merits of pre-trained language models but can also automatically generate extra information to enrich the original query. In addition, a reinforcement learning strategy is introduced to make the model more robust. We evaluate BESS on three publicly available corpora, and the experimental results demonstrate the efficiency of the proposed retrieval model.
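As context, the sketch below shows the generic Siamese (dual-encoder) retrieval recipe that BESS builds on: one shared encoder embeds both queries and documents, and relevance is scored by vector similarity. This is the common dual-encoder pattern rather than the BESS implementation, and the checkpoint and documents are placeholders.

```python
# Generic Siamese retrieval sketch: the same BERT encoder embeds queries
# and documents; relevance is the cosine similarity of the two vectors.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-chinese")   # placeholder
enc = AutoModel.from_pretrained("bert-base-chinese").eval()

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch).last_hidden_state[:, 0]   # [CLS] vector
    return torch.nn.functional.normalize(out, dim=-1)

docs = ["深度學習簡介", "今日天氣預報", "資訊檢索模型"]
q = embed(["什麼是檢索模型?"])
d = embed(docs)
scores = (q @ d.T).squeeze(0)                # cosine similarities
print(docs[int(scores.argmax())])
```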
This technical report addresses the ROCLING 2021 Shared Task: Dimensional Sentiment Analysis for Educational Texts. To predict the affective states of Chinese educational texts, we present a practical framework that employs pre-trained language models such as BERT and MacBERT. Several valuable observations and analyses are drawn from a series of experiments. From the results, we find that MacBERT-based methods deliver better results than BERT-based methods on the validation set. We therefore average the prediction results of several models obtained under different settings as the final output.
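The final averaging step might look like the tiny sketch below, where each model's valence-arousal predictions are stand-in numbers rather than the report's actual outputs:

```python
# Ensemble averaging: element-wise mean of per-model (valence, arousal)
# predictions over the same set of texts.
import numpy as np

# Each row: one model's (valence, arousal) predictions for 4 texts.
model_preds = np.array([
    [[6.1, 4.2], [3.0, 5.5], [7.2, 6.0], [4.8, 3.9]],   # e.g. BERT run
    [[6.3, 4.0], [2.8, 5.7], [7.0, 6.2], [5.0, 4.1]],   # e.g. MacBERT run 1
    [[6.0, 4.3], [3.1, 5.4], [7.3, 5.9], [4.7, 4.0]],   # e.g. MacBERT run 2
])
final = model_preds.mean(axis=0)   # average over models
print(final)
```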
In this paper, we propose a BERT-based dimensional sentiment analyzer that incorporates word-level information. Our model achieved the best results on three of the four metrics in the "ROCLING 2021 Shared Task: Dimensional Sentiment Analysis for Educational Texts". We conducted a series of experiments to compare the effectiveness of different pre-training methods, and the results also showed that our method significantly outperforms classic methods. Based on these experiments, we further discuss the impact of model architectures and datasets.

2016

In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, little research has focused on unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, stop or function words that occur frequently may mislead the embedding learning process and produce an indistinct paragraph representation. Motivated by these observations, our major contributions are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims at not only distilling the most representative information from a paragraph but also excluding the general background information, so as to produce a more informative low-dimensional vector representation for the paragraph. We evaluate the proposed EV model on benchmark sentiment classification and multi-document summarization tasks. The experimental results demonstrate the effectiveness and applicability of the proposed embedding method. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (D-EV) model, is proposed. The D-EV model not only inherits the advantages of the EV model but can also infer a representation for a given spoken paragraph that is more robust against imperfect speech recognition. The utility of the D-EV model is evaluated on a spoken document summarization task, confirming the effectiveness of the proposed embedding method in relation to several well-practiced and state-of-the-art summarization methods.
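To make the background-removal intuition concrete, the toy sketch below subtracts a shared "background" direction from averaged word-vector paragraph representations. This is a crude common-component-removal illustration of the idea, not the EV or D-EV model itself, and the vocabulary and vectors are random placeholders.

```python
# Toy illustration: build paragraph vectors by averaging word vectors,
# then project out a shared "background" direction so that what remains
# emphasizes paragraph-specific (representative) information.
import numpy as np

rng = np.random.default_rng(0)
vocab_vecs = {w: rng.normal(size=50) for w in
              ["the", "movie", "was", "wonderful", "plot", "boring"]}

def paragraph_vec(words):
    return np.mean([vocab_vecs[w] for w in words if w in vocab_vecs], axis=0)

paragraphs = [["the", "movie", "was", "wonderful"],
              ["the", "plot", "was", "boring"]]
P = np.stack([paragraph_vec(p) for p in paragraphs])

# Estimate the shared "background" direction and remove it.
background = P.mean(axis=0)
background /= np.linalg.norm(background)
P_essence = P - np.outer(P @ background, background)
print(P_essence.shape)
```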
