Zheng Weihua


2025

CCL-XCoT: An Efficient Cross-Lingual Knowledge Transfer Method for Mitigating Hallucination Generation
Zheng Weihua | Roy Ka-Wei Lee | Zhengyuan Liu | Wu Kui | AiTi Aw | Bowei Zou
Findings of the Association for Computational Linguistics: EMNLP 2025

Multilingual Large Language Models (MLLMs) demonstrate strong generalization across languages, yet they remain prone to hallucinations, especially in low-resource languages, due to training data imbalances. These hallucinations, which include inaccurate or fabricated outputs, are particularly problematic in domain-specific generation tasks (Chataigner et al., 2024). To address this challenge, we propose CCL-XCoT (Curriculum-based Contrastive Learning-based Cross-lingual Chain-of-Thought), a two-stage fine-tuning framework for mitigating hallucination in MLLMs. Our approach first enhances cross-lingual semantic alignment through curriculum-based contrastive learning combined with next-token prediction during continued pre-training. Building on this foundation, we then introduce a cross-lingual Chain-of-Thought (XCoT) prompting strategy during instruction fine-tuning, which guides the model to reason in a high-resource language before generating answers in the target low-resource language. Experimental results show that CCL-XCoT reduces hallucination rates by up to 62% and substantially improves factual knowledge transfer across language pairs, without relying on external retrieval or multi-model ensembles.
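The cross-lingual Chain-of-Thought prompting described above can be illustrated with a minimal sketch. This is not the paper's actual implementation; the function name and prompt template below are hypothetical, and only illustrate the core idea of reasoning in a high-resource pivot language before answering in the low-resource target language.

```python
# Illustrative sketch of XCoT-style prompting (hypothetical template, not the
# authors' code): the model is instructed to reason step by step in a
# high-resource pivot language, then produce its final answer in the target
# low-resource language.

def build_xcot_prompt(question: str, target_lang: str, pivot_lang: str = "English") -> str:
    """Compose a two-step prompt: reason in the pivot language, answer in the target."""
    return (
        f"Question ({target_lang}): {question}\n"
        f"Step 1: Reason about the question step by step in {pivot_lang}.\n"
        f"Step 2: Based on that reasoning, give the final answer in {target_lang}.\n"
        "Answer:"
    )

# Example: a Malay question, with English as the pivot reasoning language.
prompt = build_xcot_prompt("Apakah ibu kota Perancis?", target_lang="Malay")
print(prompt)
```

The resulting string would be passed to the fine-tuned MLLM as its input; the key design choice is that the reasoning trace is elicited in the pivot language, so factual knowledge learned mostly from high-resource data can transfer to the target-language answer.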

2022

SG Translate Together - Uplifting Singapore’s translation standards with the community through technology
Lee Siew Li | Adeline Sim | Gowri Kanagarajah | Siti Amirah | Foo Yong Xiang | Gayathri Ayathorai | Sarina Mohamed Rasol | Aw Ai Ti | Wu Kui | Zheng Weihua | Ding Yang | Tarun Kumar Vangani | Nabilah Binte Md Johan
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)

Singapore’s Ministry of Communications and Information (MCI) officially launched the SG Translate Together (SGTT) web portal on 27 June 2022, with the aim of partnering with citizens to improve translation standards in Singapore. The web portal houses the Singapore Government’s first neural machine translation (MT) engine, known as SG Translate, which was jointly developed by MCI and the Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR). Adapted using localised translation data, SG Translate generates translations attuned to Singapore’s context and supports Singapore’s four official languages – English (Singapore), Chinese (Singapore), Bahasa Melayu (Singapore) and Tamil (Singapore). Upon completion of development, MCI made SG Translate available to all Government agencies for their daily operations. This presentation will briefly cover the methodologies adopted and showcase SG Translate’s capability to translate content involving local culture, everyday life, and government policies and schemes. It will also showcase MCI’s sustainable approach to the continual training of the SG Translate MT engine through citizenry participation.