Hayden Helm


2025

Statistical inference on black-box generative models in the data kernel perspective space
Hayden Helm | Aranyak Acharyya | Youngser Park | Brandon Duderstadt | Carey Priebe
Findings of the Association for Computational Linguistics: ACL 2025

Generative models are capable of producing human-expert-level content across a variety of topics and domains. As the impact of generative models grows, it is necessary to develop statistical methods to understand collections of available models. These methods are particularly important in settings where the user may not have access to information related to a model’s pre-training data, weights, or other relevant model-level covariates. In this paper we extend recent results on representations of black-box generative models to model-level statistical inference tasks. We demonstrate that the model-level representations are effective for multiple inference tasks.

2024

Tracking the perspectives of interacting language models
Hayden Helm | Brandon Duderstadt | Youngser Park | Carey Priebe
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) are capable of producing high-quality information at unprecedented rates. As these models continue to entrench themselves in society, the content they produce will become increasingly pervasive in databases that are, in turn, incorporated into the pre-training data, fine-tuning data, retrieval data, etc. of other language models. In this paper we formalize the idea of a communication network of LLMs and introduce a method for representing the perspective of individual models within a collection of LLMs. With these tools, we systematically study information diffusion in the communication network of LLMs in various simulated settings.