Jelmer Van der Linde

Also published as: Jelmer van der Linde

2021

pdf bib abs
TranslateLocally: Blazing-fast translation running on the local CPU
Nikolay Bogoychev | Jelmer Van der Linde | Kenneth Heafield
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Every day, millions of people sacrifice their privacy and browsing habits in exchange for online machine translation. Companies and governments with confidentiality requirements often ban online translation or pay a premium to disable logging. To bring control back to the end user and demonstrate speed, we developed translateLocally. Running locally on a desktop or laptop CPU, translateLocally delivers cloud-like translation speed and quality even on 10 year old hardware. The open-source software is based on Marian and runs on Linux, Windows, and macOS.

We participated in all tracks of the WMT 2021 efficient machine translation task: single-core CPU, multi-core CPU, and GPU hardware with throughput and latency conditions. Our submissions combine several efficiency strategies: knowledge distillation, a simpler simple recurrent unit (SSRU) decoder with one or two layers, lexical shortlists, smaller numerical formats, and pruning. For the CPU track, we used quantized 8-bit models. For the GPU track, we experimented with FP16 and 8-bit integers in tensorcores. Some of our submissions optimize for size via 4-bit log quantization and omitting a lexical shortlist. We have extended pruning to more parts of the network, emphasizing component- and block-level pruning that actually improves speed unlike coefficient-wise pruning.

Co-authors

Qianqian Zhu 1

Svetlana Tchistiakova 1

Pinzhen Chen 1

Sidharth Kashyap 1

Roman Grundkiewicz 1

Venues

EMNLP1
WMT1