Maria Myung-Hee Kim

Also published as: Maria Myung Hee Kim


2025

Understanding Multilingual ASR Systems: The Role of Language Families and Typological Features in Seamless and Whisper
Simon Gonzalez | Tao Hoang | Maria Myung-Hee Kim | Bradley Donnelly | Jennifer Biggs | Tim Cawley
Proceedings of The 23rd Annual Workshop of the Australasian Language Technology Association

This study investigates the extent to which linguistic typology influences the performance of two automatic speech recognition (ASR) systems across diverse language families. Using the FLEURS corpus and typological features from the World Atlas of Language Structures (WALS), we analysed 40 languages grouped by phonological, morphological, syntactic, and semantic domains. We evaluated two state-of-the-art multilingual ASR systems, Whisper and Seamless, to examine how their performance, measured by word error rate (WER), correlates with linguistic structures. Random Forests and Mixed Effects Models were used to quantify feature impact and statistical significance. Results reveal that while both systems leverage typological patterns, they differ in their sensitivity to specific domains. Our findings highlight how structural and functional linguistic features shape ASR performance, offering insights into model generalisability and typology-aware system development.

2021

Robustness Analysis of Grover for Machine-Generated News Detection
Rinaldo Gagiano | Maria Myung-Hee Kim | Xiuzhen Zhang | Jennifer Biggs
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

Advancements in Natural Language Generation have raised concerns about its potential misuse for deep fake news. Grover is a model for both generation and detection of neural fake news. While its performance on automatically discriminating neural fake news surpassed GPT-2 and BERT, Grover could face a variety of adversarial attacks designed to evade detection. In this work, we present an investigation of Grover’s susceptibility to adversarial attacks such as character-level and word-level perturbations. The experimental results show that even a single character alteration can cause Grover to fail, affecting up to 97% of target articles with unlimited attack attempts, exposing a lack of robustness. We further analyse these misclassified cases to highlight affected words, identify vulnerability within Grover’s encoder, and perform a novel visualisation of cumulative classification scores to assist in interpreting model behaviour.

2017

Incremental Knowledge Acquisition Approach for Information Extraction on both Semi-Structured and Unstructured Text from the Open Domain Web
Maria Myung Hee Kim
Proceedings of the Australasian Language Technology Association Workshop 2017