Junyoung Jang




2025

DaCoM: Strategies to Construct Domain-specific Low-resource Language Machine Translation Dataset
Junghoon Kang | Keunjoo Tak | Joungsu Choi | Myunghyun Kim | Junyoung Jang | Youjin Kang
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track

Translation of low-resource languages in industrial domains is essential for improving market productivity and ensuring that foreign workers have better access to information. However, existing translators struggle with domain-specific terms, and there is a lack of expert annotators for dataset creation. In this work, we propose DaCoM, a methodology for collecting low-resource language pairs from industrial domains to address these challenges. DaCoM is a hybrid translation framework that enables effective data collection; it combines a large language model with a neural machine translation model. Evaluation verifies that existing models perform inadequately on DaCoM-created datasets, with differences of up to 53.7 BLEURT points depending on domain inclusion. DaCoM is expected to help address the lack of datasets for domain-specific low-resource languages, since it can easily be plugged into future state-of-the-art models and remains agnostic to the industrial domain.