Tsz Fung Yau
2026
SiniticMTError: A Machine Translation Dataset with Error Annotations for Sinitic Languages
Hannah Liu | Junghyun Min | Annie En-Shiun Lee | Ethan Yue Heng Cheung | Shou-Yi Hung | Elsie Chan | Shiyao Qian | Runtong Liang | Kimlan Huynh | Wing Yu Yip | York Hay Ng | Tsz Fung Yau | Ka Ieng Charlotte Lo | You-Wei Wu | Richard Tzong-Han Tsai
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Despite major advances in machine translation (MT) in recent years, progress remains limited for many low-resource languages that lack large-scale training data and linguistic resources. In this paper, we introduce SiniticMTError, a novel fine-grained dataset that builds on existing parallel corpora to provide error span, error type, and error severity annotations for machine-translated examples from English to Mandarin, Cantonese, and Wu Chinese, along with a Mandarin-Hokkien component derived from a non-parallel source. Our dataset serves as a resource for the MT community to fine-tune models with error detection capabilities, supporting research on translation quality estimation, error-aware generation, and low-resource language evaluation. We also establish baseline results using language models to benchmark translation error detection performance. Specifically, we evaluate multiple open-source and closed-source LLMs using span-level and correlation-based MQM metrics, revealing their limited precision and underscoring the need for our dataset. Finally, we describe our rigorous annotation process by native speakers, with analyses of pilot studies, iterative feedback, insights, and patterns in error type and severity.
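The abstract refers to span-level error annotations and MQM-based evaluation. As a rough, non-authoritative sketch of what such scoring can look like, the snippet below computes a severity-weighted MQM-style penalty and a character-level span F1 over error-span annotations; the record fields, error-type labels, and severity weights are illustrative assumptions, not the dataset's actual schema or the paper's scoring scheme.

```python
# Illustrative sketch only: severity-weighted MQM-style scoring and span-level
# F1 over error-span annotations. Field names, labels, and weights are assumed.
from dataclasses import dataclass

@dataclass
class ErrorSpan:
    start: int        # character offset where the error span begins
    end: int          # character offset where the error span ends (exclusive)
    error_type: str   # e.g. "mistranslation", "omission" (hypothetical labels)
    severity: str     # e.g. "minor", "major", "critical"

# One common MQM weighting convention; the paper's weights may differ.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 10.0}

def mqm_penalty(spans: list[ErrorSpan]) -> float:
    """Sum severity weights over all annotated error spans (lower is better)."""
    return sum(SEVERITY_WEIGHTS.get(s.severity, 0.0) for s in spans)

def span_f1(pred: list[ErrorSpan], gold: list[ErrorSpan]) -> float:
    """Character-level F1 between predicted and gold error spans."""
    pred_chars = {i for s in pred for i in range(s.start, s.end)}
    gold_chars = {i for s in gold for i in range(s.start, s.end)}
    if not pred_chars or not gold_chars:
        return 0.0
    tp = len(pred_chars & gold_chars)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_chars)
    recall = tp / len(gold_chars)
    return 2 * precision * recall / (precision + recall)
```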
2024
Can Machine Unlearning Reduce Social Bias in Language Models?
Omkar Dige | Diljot Arneja | Tsz Fung Yau | Qixuan Zhang | Mohammad Bolandraftar | Xiaodan Zhu | Faiza Khan Khattak
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Mitigating bias in language models (LMs) has become a critical problem due to the widespread deployment of LMs in industry and customer-facing applications. Numerous approaches revolve around data pre-processing and subsequent fine-tuning of language models, tasks that can be both time-consuming and computationally demanding. As alternatives, machine unlearning techniques are being explored, yet there is a notable lack of comparative studies evaluating the effectiveness of these methods. In this work, we explore the effectiveness of two machine unlearning methods, Partitioned Contrastive Gradient Unlearning (PCGU) applied to decoder models and Negation via Task Vector, and compare them with Direct Preference Optimization (DPO) for reducing social biases in open-source LMs such as LLaMA-2 and OPT. We also implement distributed PCGU for large models. We show empirically, through quantitative and qualitative analyses, that the Negation via Task Vector method outperforms PCGU and is comparable to DPO in debiasing models with minimal deterioration in model performance and perplexity. Negation via Task Vector reduces the bias score by 25.5% for LLaMA-2 and achieves bias reduction of up to 40% for OPT models. Moreover, it can be easily tuned to balance the trade-off between bias reduction and generation quality, unlike DPO.
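For context on the task-vector approach the abstract mentions: negation via task vector, in the spirit of task arithmetic, subtracts a scaled difference between a fine-tuned checkpoint and the base checkpoint from the base weights. The sketch below is a minimal illustration of that general idea only; the helper name, the checkpoints referenced in the comments, and the scaling coefficient alpha are assumptions, not the authors' implementation.

```python
# Minimal sketch of "negation via task vector" (task arithmetic): subtract a
# scaled task vector from the base model's weights. Names and alpha are
# illustrative assumptions, not the paper's exact configuration.
import torch

def negate_task_vector(base_state, finetuned_state, alpha=1.0):
    """Return new weights: base - alpha * (finetuned - base)."""
    new_state = {}
    for name, base_param in base_state.items():
        task_vector = finetuned_state[name] - base_param    # the "task vector"
        new_state[name] = base_param - alpha * task_vector  # negate and apply
    return new_state

# Usage sketch (hypothetical checkpoints): a model fine-tuned on biased data
# yields a task vector whose negation steers the base model away from that
# behavior; alpha trades off bias reduction against generation quality.
# base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# biased = AutoModelForCausalLM.from_pretrained("path/to/finetuned-on-biased-data")
# base.load_state_dict(negate_task_vector(base.state_dict(), biased.state_dict(), alpha=0.5))
```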