Lost in Quantization: Activation Outliers Explain Language-Specific FP8 Sensitivity in Llama-3

Guilherme Silva, Pedro Silva, Matheus Peixoto, Gladston Moreira, Eduardo Luz


Abstract
Quantization is key for efficient LLM inference, but its language-specific effects are understudied. We compare INT8 and FP8 (E4M3) quantization for Meta-Llama-3-8B on English and Brazilian Portuguese (PT-BR). INT8 with outlier handling preserves perplexity in both languages, while naive FP8 casting degrades English perplexity far more than PT-BR (+18% vs. +3.9%). Activation analysis shows that English exhibits rarer but larger activation spikes (magnitudes > 35), which are more prone to saturation under unscaled E4M3, whereas PT-BR activations are more concentrated. Our FP8 results reflect a naive casting stress test (no calibration or scaling), not an optimized FP8 recipe.
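The sketch below is a minimal illustration (not the paper's code) of the "naive FP8 casting" stress test the abstract describes: activations are round-tripped through PyTorch's torch.float8_e4m3fn with no calibration or per-tensor scaling. The tensor size, spike count, and spike magnitudes (pushed just above the paper's >35 threshold) are illustrative assumptions; the point is only that rare, large outliers incur larger round-trip error under unscaled E4M3 than a more concentrated distribution.

```python
# Minimal sketch, assuming torch.float8_e4m3fn (PyTorch >= 2.1) as the E4M3 stand-in.
# Shapes and spike magnitudes are hypothetical, chosen to mimic the abstract's setup.
import torch

def naive_fp8_roundtrip(x: torch.Tensor) -> torch.Tensor:
    """Cast to FP8 E4M3 and back with no scaling factor (the naive stress test)."""
    return x.to(torch.float8_e4m3fn).to(x.dtype)

torch.manual_seed(0)

# "Concentrated" activations (PT-BR-like in the paper's description): small magnitudes.
concentrated = torch.randn(4096) * 0.5

# Same bulk, plus a handful of rare spikes above 35 (English-like in the paper's description).
spiky = concentrated.clone()
idx = torch.randint(0, 4096, (8,))
spiky[idx] = 35.0 + torch.rand(8) * 10.0

for name, x in [("concentrated", concentrated), ("spiky", spiky)]:
    x_hat = naive_fp8_roundtrip(x)
    rel_err = ((x - x_hat).abs().sum() / x.abs().sum()).item()
    print(f"{name}: relative L1 error after naive E4M3 round-trip = {rel_err:.4f}")
```

Because E4M3 quantization steps grow with magnitude, the outlier channels contribute disproportionately to the round-trip error, which is the intuition behind the language-specific sensitivity the paper reports.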
Anthology ID:
2026.propor-1.108
Volume:
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month:
April
Year:
2026
Address:
Salvador, Brazil
Editors:
Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
1044–1048
URL:
https://preview.aclanthology.org/ingest-dnd/2026.propor-1.108/
Cite (ACL):
Guilherme Silva, Pedro Silva, Matheus Peixoto, Gladston Moreira, and Eduardo Luz. 2026. Lost in Quantization: Activation Outliers Explain Language-Specific FP8 Sensitivity in Llama-3. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 1044–1048, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal):
Lost in Quantization: Activation Outliers Explain Language-Specific FP8 Sensitivity in Llama-3 (Silva et al., PROPOR 2026)
PDF:
https://preview.aclanthology.org/ingest-dnd/2026.propor-1.108.pdf