Belati Jagad Bintang Syuhada


2025

Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations
Patrick Amadeus Irawan | Ryandito Diandaru | Belati Jagad Bintang Syuhada | Randy Zakya Suchrady | Alham Fikri Aji | Genta Indra Winata | Fajri Koto | Samuel Cahyawijaya
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)

We introduce Entropy2Vec, a novel framework for deriving cross-lingual language representations by leveraging the entropy of monolingual language models. Unlike traditional typological inventories that suffer from feature sparsity and static snapshots, Entropy2Vec uses the inherent uncertainty in language models to capture typological relationships between languages. By training a language model on a single language, we hypothesize that the entropy of its predictions reflects its structural similarity to other languages: low entropy indicates high similarity, while high entropy suggests greater divergence. This approach yields dense, non-sparse language embeddings that are adaptable to different timeframes and free from missing values. Empirical evaluations demonstrate that Entropy2Vec embeddings align with established typological categories and achieve competitive performance in downstream multilingual NLP tasks, such as those addressed by the LinguAlchemy framework.
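The sketch below illustrates the core idea in the abstract under stated assumptions; it is not the authors' implementation. It measures the average next-token prediction entropy of a monolingual causal language model over text in several target languages, which is one way such per-language scores could be assembled into a dense representation. The model name ("gpt2") and the sample sentences are placeholders; real use would require a model trained on a single source language and a larger corpus per target language.

```python
# Illustrative sketch (assumption: not the paper's actual pipeline).
# Idea: a monolingual LM's prediction entropy on another language's text
# serves as a proxy for structural (dis)similarity between the two languages.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder for a model trained on a single language
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def mean_prediction_entropy(text: str) -> float:
    """Average Shannon entropy (in nats) of the model's next-token distributions."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits            # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (1, seq_len)
    return entropy.mean().item()

# Hypothetical per-language snippets; a real estimate would average over
# a sizeable corpus for each target language.
samples = {
    "eng": "The cat sat on the mat.",
    "deu": "Die Katze saß auf der Matte.",
    "ind": "Kucing itu duduk di atas tikar.",
}

# One entropy score per target language. Stacking the scores produced by many
# different monolingual models would give each language a dense embedding
# vector with no missing entries.
scores = {lang: mean_prediction_entropy(text) for lang, text in samples.items()}
print(scores)
```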