Munir Georges

2026

Entropy-aware Masking for Masked Language Modeling
Gokul Srinivasagan | Kai Hartung | Munir Georges
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)

Masked language modeling has become a standard pretraining objective for training encoder-based language models. In this approach, certain tokens in the input are masked, and the model learns to predict them using the surrounding context. This process enables the model to capture both syntactic and semantic properties of language. Conventionally, the tokens selected for masking are chosen at random, which may not always yield the most effective learning signals. In this work, we examine a token masking strategy based on entropy distribution. We use the model’s entropy over token predictions to identify which tokens should be masked. This method aims to target tokens that are more informative and uncertain to improve the training efficacy. We also propose a novel self-masking approach that enhances training efficiency without relying on an external reference model. Experimental results demonstrate that our method achieves an average performance improvement of 5% in GLUE scores compared to the baseline. Further, we experiment with combining knowledge distillation with entropy masking, resulting in the best overall results.
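
As a rough illustration of the entropy-based selection described in the abstract above, the sketch below ranks token positions by the Shannon entropy of a per-position predictive distribution and masks the most uncertain ones. It is not the authors' implementation. The function names, the top-k selection rule, the 15% default ratio, and the toy distributions are all assumptions made for illustration; the abstract does not say how the entropies are turned into a masking decision.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_mask_positions(pred_dists, mask_ratio=0.15):
    """Return the sorted indices of the highest-entropy positions.

    pred_dists: one probability distribution over the vocabulary per token.
    mask_ratio: fraction of positions to mask (assumed value, not from the paper).
    """
    n_mask = max(1, round(mask_ratio * len(pred_dists)))
    entropies = [token_entropy(d) for d in pred_dists]
    # Assumption: plain top-k by entropy; sampling proportional to entropy
    # would be another plausible reading of the abstract.
    ranked = sorted(range(len(pred_dists)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:n_mask])

def apply_mask(tokens, positions, mask_token="[MASK]"):
    """Replace the tokens at the given positions with the mask token."""
    chosen = set(positions)
    return [mask_token if i in chosen else t for i, t in enumerate(tokens)]

# Toy example with a 4-word vocabulary (hypothetical numbers).
tokens = ["the", "cat", "sat", "on", "mat"]
pred_dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform, maximal entropy
    [0.70, 0.10, 0.10, 0.10],
    [0.97, 0.01, 0.01, 0.01],
    [0.40, 0.30, 0.20, 0.10],
]
positions = select_mask_positions(pred_dists, mask_ratio=0.4)
masked = apply_mask(tokens, positions)
```

In this toy setup the distributions could come from an external reference model or from the model being trained (the self-masking variant mentioned in the abstract); the selection step itself would be the same under the assumptions stated here.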

2023

Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)
Munir Georges | Aaricia Herygers | Annemarie Friedrich | Benjamin Roth
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)

2022

Typological Word Order Correlations with Logistic Brownian Motion
Kai Hartung | Gerhard Jäger | Sören Gröttrup | Munir Georges
Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

In this study, we address the question of to what extent syntactic word-order traits of different languages have evolved under correlation, and whether such dependencies can be found universally across all languages or are restricted to specific language families. To do so, we use logistic Brownian Motion under a Bayesian framework to model the trait evolution for 768 languages from 34 language families. We test for trait correlations both in single families and universally over all families. Separate models reveal no universal correlation patterns, and Bayes Factor analysis of models over all covered families also strongly indicates lineage-specific correlation patterns instead of universal dependencies.

Hierarchical Multi-Task Transformers for Crosslingual Low Resource Phoneme Recognition
Kevin Glocker | Munir Georges
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)
