Dongyang Hu




2024

E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness
Linqing Chen | Weilei Wang | Dongyang Hu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

“In the field of Natural Language Processing (NLP), Large-scale Language Models (LLMs) have demonstrated exceptional capabilities across a variety of tasks, including question answering, classification, and particularly, natural language understanding. The integration of neural machine translation with LLMs presents significant potential, transforming the paradigms of cross-lingual communication and information exchange. This study investigates the foundational aspects of LLMs’ translation abilities and identifies effective training methodologies to equip them with multilingual capacities. We specifically explore the optimal timing for introducing translation capabilities to LLMs via supervised tasks, considering the inherent bilingual nature of machine translation. Key questions explored include whether it is more beneficial to integrate multiple languages during the pre-training or supervised fine-tuning (SFT) stages, how variations in language ratios influence LLMs’ translation abilities, and whether longer or shorter texts are more effective for training these models. This research conducts a thorough investigation by training multiple LLMs from scratch with parameter scales in the billions and enhances the robustness of our findings by upgrading the language capabilities of pre-trained open-source models with parameter scales reaching tens of billions. The aim is to provide a detailed analysis that elucidates the complexities of augmenting machine translation capabilities within LLMs.”