Xiaoxue Gao
2025
SingaKids: A Multilingual Multimodal Dialogic Tutor for Language Learning
Zhengyuan Liu | Geyu Lin | Hui Li Tan | Huayun Zhang | Yanfeng Lu | Xiaoxue Gao | Stella Xin Yin | Sun He | Hock Huan Goh | Lung Hsiang Wong | Nancy F. Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
The integration of generative artificial intelligence into educational applications has enhanced personalized and interactive learning experiences, and it shows strong potential to promote young learners' language acquisition. However, it is still challenging to ensure consistent and robust performance across different languages and cultural contexts, and kid-friendly design requires simplified instructions, engaging interactions, and age-appropriate scaffolding to maintain motivation and optimize learning outcomes. In this work, we introduce SingaKids, a dialogic tutor designed to facilitate language learning through picture description tasks. Our system integrates dense image captioning, multilingual dialogic interaction, speech understanding, and engaging speech generation to create an immersive learning environment in four languages: English, Mandarin, Malay, and Tamil. We further improve the system through multilingual pre-training, task-specific tuning, and scaffolding optimization. Empirical studies with elementary school students demonstrate that SingaKids provides effective dialogic teaching, benefiting learners at different performance levels.
2024
Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models
Yiming Chen | Xianghu Yue | Xiaoxue Gao | Chen Zhang | Luis Fernando D’Haro | Robby T. Tan | Haizhou Li
Findings of the Association for Computational Linguistics: EMNLP 2024
Various audio-LLMs (ALLMs) have been explored recently for tackling different audio tasks simultaneously using a single, unified model. While existing evaluations of ALLMs primarily focus on single-audio tasks, real-world applications often involve processing multiple audio streams simultaneously. To bridge this gap, we propose the first multi-audio evaluation (MAE) benchmark, which consists of 20 datasets from 11 multi-audio tasks encompassing both speech and sound scenarios. Comprehensive experiments on MAE demonstrate that existing ALLMs, while powerful in comprehending primary audio elements in individual audio inputs, struggle to handle multi-audio scenarios. To this end, we propose a novel multi-audio-LLM (MALLM) to capture audio context among multiple similar audios using discriminative learning on our proposed synthetic data. The results demonstrate that the proposed MALLM outperforms all baselines and achieves high data efficiency using synthetic data without requiring human annotations. The proposed MALLM opens the door for ALLMs toward the multi-audio processing era and brings us closer to replicating human auditory capabilities in machines.