How far can we get with one GPU in 100 hours? CoAStaL at MultiIndicMT Shared Task
Rahul Aralikatte, Héctor Ricardo Murrieta Bello, Miryam de Lhoneux, Daniel Hershcovich, Marcel Bollmann, Anders Søgaard
Abstract
This work shows that competitive translation results can be obtained in a constrained setting by incorporating the latest advances in memory and compute optimization. We train and evaluate large multilingual translation models using a single GPU for a maximum of 100 hours and get within 4-5 BLEU points of the top submission on the leaderboard. We also benchmark standard baselines on the PMI corpus and re-discover well-known shortcomings of translation systems and metrics.
- Anthology ID: 2021.wat-1.24
- Volume: Proceedings of the 8th Workshop on Asian Translation (WAT2021)
- Month: August
- Year: 2021
- Address: Online
- Editors: Toshiaki Nakazawa, Hideki Nakayama, Isao Goto, Hideya Mino, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Shohei Higashiyama, Hiroshi Manabe, Win Pa Pa, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya
- Venue: WAT
- Publisher: Association for Computational Linguistics
- Pages: 205–211
- URL: https://aclanthology.org/2021.wat-1.24
- DOI: 10.18653/v1/2021.wat-1.24
- Cite (ACL): Rahul Aralikatte, Héctor Ricardo Murrieta Bello, Miryam de Lhoneux, Daniel Hershcovich, Marcel Bollmann, and Anders Søgaard. 2021. How far can we get with one GPU in 100 hours? CoAStaL at MultiIndicMT Shared Task. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 205–211, Online. Association for Computational Linguistics.
- Cite (Informal): How far can we get with one GPU in 100 hours? CoAStaL at MultiIndicMT Shared Task (Aralikatte et al., WAT 2021)
- PDF: https://preview.aclanthology.org/proper-vol2-ingestion/2021.wat-1.24.pdf
- Data: PMIndia, mC4