Kazuki Yano
2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano | Takumi Ito | Jun Suzuki
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Pre-training large language models (LLMs) faces significant memory challenges due to the large size of model weights. We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques with model growth. We conduct experiments on pre-training LLMs of various sizes and demonstrate that STEP achieves up to a 53.9% reduction in maximum memory requirements compared to vanilla pre-training while maintaining equivalent performance. Furthermore, we show that models trained with STEP perform comparably to vanilla pre-trained models on downstream tasks after instruction tuning.
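For illustration, the following is a minimal PyTorch sketch of the general idea described in the abstract, not the authors' implementation: after a growth stage, previously trained layers are frozen and updated only through small LoRA-style adapters, so the optimizer holds full state only for the newly added layers and the adapters. The layer width, adapter rank, and growth schedule below are assumptions made purely for the example.

```python
# Minimal sketch (assumed details, not the STEP codebase): staged model growth
# combined with parameter-efficient updates to the previously trained layers.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank (LoRA-style) update."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # no optimizer state kept for the full weight
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # adapter starts as a zero update

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))


def grow(stack: nn.Sequential, new_layers: int, width: int) -> nn.Sequential:
    """Stage transition: wrap trained layers with adapters, append fresh layers."""
    wrapped = [LoRALinear(m) if isinstance(m, nn.Linear) else m for m in stack]
    fresh = [nn.Linear(width, width) for _ in range(new_layers)]  # fully trainable
    return nn.Sequential(*wrapped, *fresh)


width = 64
model = nn.Sequential(*[nn.Linear(width, width) for _ in range(2)])  # stage 1
# ... pre-train stage 1 here ...
model = grow(model, new_layers=2, width=width)                        # stage 2

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)  # optimizer state only for these
print(sum(p.numel() for p in trainable), "trainable parameters in stage 2")
```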
2024
Document-level Translation with LLM Reranking: Team-J at WMT 2024 General Translation Task
Keito Kudo | Hiroyuki Deguchi | Makoto Morishita | Ryo Fujii | Takumi Ito | Shintaro Ozaki | Koki Natsumi | Kai Sato | Kazuki Yano | Ryosuke Takahashi | Subaru Kimura | Tomomasa Hara | Yusuke Sakai | Jun Suzuki
Proceedings of the Ninth Conference on Machine Translation
We participated in the constrained track for English-Japanese and Japanese-Chinese translations at the WMT 2024 General Machine Translation Task. Our approach was to generate a large number of sentence-level translation candidates and select the most probable translation using minimum Bayes risk (MBR) decoding and document-level large language model (LLM) re-ranking. We first generated hundreds of translation candidates from multiple translation models and retained the top 30 candidates using MBR decoding. In addition, we continually pre-trained LLMs on the target-language corpora to leverage document-level information. We then used these LLMs to sequentially select the most probable translation for each sentence in context, starting from the beginning of the document.
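As an illustration of the MBR candidate-selection step described above, here is a minimal Python sketch, not the team's system: each candidate is scored by its average utility against every other candidate in the pool, and the top-k are retained. The toy token-overlap utility below stands in for whatever MT metric the system actually used, which the abstract does not specify; the subsequent document-level LLM re-ranking step is not shown.

```python
# Minimal sketch of sentence-level MBR candidate selection (assumed utility
# metric; the actual metric and candidate pool come from the real system).
from collections import Counter


def utility(hyp: str, ref: str) -> float:
    """Toy utility: token-level F1 overlap between hypothesis and pseudo-reference."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_top_k(candidates: list[str], k: int = 30) -> list[str]:
    """Keep the k candidates with the highest expected utility over the pool."""
    n = len(candidates)
    scores = [
        sum(utility(candidates[i], candidates[j]) for j in range(n) if j != i)
        / max(n - 1, 1)
        for i in range(n)
    ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[:k]]


candidates = ["the cat sat on the mat", "a cat sat on the mat", "the dog ran away"]
print(mbr_top_k(candidates, k=2))
```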