Paisarn Charoenpornsawat
2026
HAT: Hallucination Annotation for Translation
Rajen Chatterjee | Xintong Li | Paisarn Charoenpornsawat | Allen Lee
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Hallucinations in machine translation (MT)—outputs that may be fluent yet unfaithful to the source content—remain a critical obstacle. They hinder the reliable deployment of MT systems in real-world applications. Despite growing attention to this phenomenon, progress has been constrained by the lack of large-scale, high-quality benchmarks dedicated to hallucination detection. We introduce HAT (Hallucination Annotation for Translation), a novel dataset designed to advance research on this problem. HAT comprises 350,959 span-level annotated samples across 38 language pairs, with approximately 8,000–10,000 samples per pair partitioned into training, development, and test sets. Annotations were produced by professional translators under rigorous quality control protocols to ensure reliability. We provide a detailed analysis of hallucination distributions and establish benchmark performance using a diverse set of baselines, including automatic MT evaluation metrics as well as large language models. By providing the first large-scale, systematically annotated resource for hallucination detection in MT, HAT enables the development of more faithful translation models and lays the groundwork for future research on building trustworthy machine translation systems.
2009
Incremental Adaptation of Speech-to-Speech Translation
Nguyen Bach | Roger Hsiao | Matthias Eck | Paisarn Charoenpornsawat | Stephan Vogel | Tanja Schultz | Ian Lane | Alex Waibel | Alan Black
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
2007
The CMU TransTac 2007 eyes-free two-way speech-to-speech translation system
Nguyen Bach | Matthias Eck | Paisarn Charoenpornsawat | Thilo Köhler | Sebastian Stüker | ThuyLinh Nguyen | Roger Hsiao | Alex Waibel | Stephan Vogel | Tanja Schultz | Alan W. Black
Proceedings of the Fourth International Workshop on Spoken Language Translation
The paper describes our portable two-way speech-to-speech translation system using a completely eyes-free/hands-free user interface. This system translates between the language pair English and Iraqi Arabic as well as between English and Farsi, and was built within the framework of the DARPA TransTac program. The Farsi language support was developed within a 90-day period, testing our ability to rapidly support new languages. The paper gives an overview of the system’s components along with the individual component objective measures and a discussion of issues relevant for the overall usage of the system. We found that usability, flexibility, and robustness serve as severe constraints on system architecture and design.
2006
Thai Grapheme-Based Speech Recognition
Paisarn Charoenpornsawat | Sanjika Hewavitharana | Tanja Schultz
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers
2003
A Context-Sensitive Homograph Disambiguation in Thai Text-to-Speech Synthesis
Virongrong Tesprasit | Paisarn Charoenpornsawat | Virach Sornlertlamvanich
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers