Qijun Miao




2025

User-side Model Consistency Monitoring for Open Source Large Language Models Inference Services
Qijun Miao | Zhixuan Fang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With the continuous advancement in the performance of open-source large language models (LLMs), their inference services have attracted a substantial user base by offering quality comparable to closed-source models at a significantly lower cost. However, this has also given rise to trust issues regarding model consistency between users and third-party service providers. Specifically, service providers can effortlessly degrade a model's parameter scale or precision for higher profit margins, and although users may perceive differences in text quality, they often lack a reliable method for concrete monitoring. To address this problem, we propose a paradigm for model consistency monitoring on the user side. It constructs metrics based on the logits produced by LLMs to differentiate sequences generated by degraded models. Furthermore, by leveraging model offloading techniques, we demonstrate that the proposed method is implementable on consumer-grade devices. Metric evaluations conducted on three widely used LLM series (OPT, Llama 3.1, and Qwen 2.5), along with system prototype efficiency tests on a consumer device (RTX 3080 Ti), confirm both the effectiveness and feasibility of the proposed approach.
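To illustrate the general idea of a logit-based consistency metric described in the abstract, the sketch below scores a returned token sequence by its mean per-token log-likelihood under a locally held reference model's logits. This is a simplified illustration of the monitoring paradigm, not the paper's actual metric: the helper names, the toy vocabulary, and the `SUSPECT_THRESHOLD` value are all assumptions for the example.

```python
import math

def token_logprobs(logits, token_ids):
    """Per-token log-probabilities of the chosen tokens, computed with a
    numerically stable log-softmax over each step's reference logits.
    (Hypothetical helper; in practice logits come from a local copy of
    the full-precision model, e.g. via model offloading.)"""
    out = []
    for step_logits, tok in zip(logits, token_ids):
        m = max(step_logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in step_logits))
        out.append(step_logits[tok] - log_z)
    return out

def consistency_score(logits, token_ids):
    """Mean per-token log-likelihood of the served sequence. The intuition
    is that a degraded (smaller or lower-precision) model tends to produce
    tokens the reference model assigns lower likelihood to."""
    lps = token_logprobs(logits, token_ids)
    return sum(lps) / len(lps)

# Toy example: two decoding steps over a 3-token vocabulary.
ref_logits = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
served_tokens = [0, 1]  # token ids returned by the service

score = consistency_score(ref_logits, served_tokens)
SUSPECT_THRESHOLD = -1.0  # assumed value; would be calibrated offline
flagged = score < SUSPECT_THRESHOLD
```

Here both served tokens are the reference model's top choices, so the score stays well above the (assumed) threshold and no degradation is flagged; consistently low scores over many sequences would suggest a degraded model.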