ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance

Wissam Antoun, Benoît Sagot, Djamé Seddah


Abstract
Pretrained transformer encoder models such as DeBERTaV3 and ModernBERT introduce architectural advancements aimed at improving efficiency and performance. Although the authors of ModernBERT report improved performance over DeBERTaV3 on several benchmarks, the lack of disclosed training data and the absence of comparisons on a shared dataset make it difficult to determine whether these gains are due to architectural improvements or to differences in training data. In this work, we conduct a controlled study by pretraining ModernBERT on the same dataset as CamemBERTaV2, a DeBERTaV3-based French model, isolating the effect of model design. Our results show that the previous model generation remains superior in sample efficiency and overall benchmark performance, with ModernBERT’s main advantages being its support for long contexts and its faster training and inference speed. Nevertheless, the newer model still offers meaningful architectural improvements over earlier models such as BERT and RoBERTa. We also observe that high-quality pretraining data accelerates convergence but does not substantially improve final performance, suggesting potential benchmark saturation. These findings underscore the importance of disentangling pretraining data from architectural innovations when evaluating transformer models.
Anthology ID:
2025.ijcnlp-long.164
Volume:
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venues:
IJCNLP | AACL
Publisher:
The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Pages:
3061–3074
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.164/
Cite (ACL):
Wissam Antoun, Benoît Sagot, and Djamé Seddah. 2025. ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 3061–3074, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal):
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance (Antoun et al., IJCNLP-AACL 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.164.pdf