Ryo Kishino
2026
Establishing a Scale for Kullback-Leibler Divergence in Language Models Across Various Settings
Ryo Kishino | Yusuke Takase | Momose Oyama | Hiroaki Yamagiwa | Hidetoshi Shimodaira
Findings of the Association for Computational Linguistics: ACL 2026
Log-likelihood vectors define a common space for comparing language models as probability distributions, enabling unified comparisons across heterogeneous settings. We extend this framework to training checkpoints and intermediate layers, and establish a consistent scale for KL divergence across pretraining, model size, random seeds, quantization, fine-tuning, and layers. Analysis of Pythia pretraining trajectories further shows that changes in log-likelihood space, as measured by the scaling behavior of KL divergence, are much smaller than in weight space, resulting in subdiffusive learning trajectories and early stabilization of language-model behavior despite weight drift.
2025
Likelihood Variance as Text Importance for Resampling Texts to Map Language Models
Momose Oyama | Ryo Kishino | Hiroaki Yamagiwa | Hidetoshi Shimodaira
Findings of the Association for Computational Linguistics: EMNLP 2025
We address the computational cost of constructing a model map, which embeds diverse language models into a common space for comparison via KL divergence. The map relies on log-likelihoods over a large text set, making the cost proportional to the number of texts. To reduce this cost, we propose a resampling method that selects important texts with weights proportional to the variance of log-likelihoods across models for each text. Our method significantly reduces the number of required texts while preserving the accuracy of KL divergence estimates. Experiments show that it achieves comparable performance to uniform sampling with about half as many texts, and also facilitates efficient incorporation of new models into an existing map. These results enable scalable and efficient construction of language model maps.
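As a rough illustration of the resampling idea in this abstract, the sketch below draws texts with probability proportional to the variance of their log-likelihoods across models, then uses importance weights so that a sum over the sampled texts estimates the sum over all texts. The function names, the weighting formula, and the use of a plain squared difference between two models' log-likelihoods are assumptions made for this example; the abstract does not specify the estimator, and the paper's actual procedure may differ.

```python
import numpy as np

def variance_weighted_resample(loglik, n_samples, rng):
    """Sample text indices with probability proportional to variance.

    loglik: array of shape (n_models, n_texts) holding log-likelihoods.
    Returns sampled indices and importance weights (hypothetical helper).
    """
    var = loglik.var(axis=0)          # variance across models, per text
    probs = var / var.sum()           # sampling probabilities
    idx = rng.choice(loglik.shape[1], size=n_samples, replace=True, p=probs)
    # Standard importance weights: each draw stands in for 1 / (n * p) texts.
    weights = 1.0 / (n_samples * probs[idx])
    return idx, weights

def weighted_sq_dist(loglik, i, j, idx, weights):
    """Estimate the full-set sum of squared log-likelihood differences
    between models i and j from the resampled texts (assumed stand-in
    for the KL-related quantity, not the paper's exact formula)."""
    diff = loglik[i, idx] - loglik[j, idx]
    return float(np.sum(weights * diff ** 2))
```

With synthetic data, the weighted estimate can be compared against the sum over all texts, for example by calling `weighted_sq_dist(loglik, 0, 1, idx, weights)` and `np.sum((loglik[0] - loglik[1]) ** 2)` on the same array.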
Quantifying Lexical Semantic Shift via Unbalanced Optimal Transport
Ryo Kishino | Hiroaki Yamagiwa | Ryo Nagata | Sho Yokoi | Hidetoshi Shimodaira
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Lexical semantic change detection aims to identify shifts in word meanings over time. While existing methods using embeddings from a diachronic corpus pair estimate the degree of change for target words, they offer limited insight into changes at the level of individual usage instances. To address this, we apply Unbalanced Optimal Transport (UOT) to sets of contextualized word embeddings, capturing semantic change through the excess and deficit in the alignment between usage instances. In particular, we propose Sense Usage Shift (SUS), a measure that quantifies changes in the usage frequency of a word sense at each usage instance. By leveraging SUS, we demonstrate that several challenges in semantic change detection can be addressed in a unified manner, including quantifying instance-level semantic change and word-level tasks such as measuring the magnitude of semantic change and the broadening or narrowing of meaning.