Pitchaya Chairuengjitjaras




2024

On Creating an English-Thai Code-switched Machine Translation in Medical Domain
Parinthapat Pengpun | Krittamate Tiankanon | Amrest Chinkamol | Jiramet Kinchagawat | Pitchaya Chairuengjitjaras | Pasit Supholkhan | Pubordee Aussavavirojekul | Chiraphat Boonnag | Kanyakorn Veerakanjana | Hirunkul Phimsiri | Boonthicha Sae-jia | Nattawach Sataudom | Piyalitt Ittichaiwong | Peerat Limkonchotiwat
Findings of the Association for Computational Linguistics: EMNLP 2024

Machine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field due to their inability to precisely translate medical terminology. Our research prioritizes not merely improving translation accuracy but also maintaining medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model with this data, and evaluated its performance against strong baselines, such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model demonstrated competitive performance in automatic metrics and was highly favored in human preference evaluations. Our evaluation results also show that medical professionals significantly prefer CS translations that maintain critical English terms accurately, even if this slightly compromises fluency. Our code and test set are publicly available at https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.