User-side Model Consistency Monitoring for Open Source Large Language Models Inference Services

Qijun Miao, Zhixuan Fang


Abstract
With the continuous advancement in the performance of open-source large language models (LLMs), their inference services have attracted a substantial user base by offering quality comparable to closed-source models at a significantly lower cost. However, this has also given rise to trust issues regarding model consistency between users and third-party service providers. Specifically, a service provider can effortlessly degrade a model’s parameter scale or precision to increase profit margins, and although users may perceive a difference in text quality, they often lack a reliable method for concrete monitoring. To address this problem, we propose a paradigm for model consistency monitoring on the user side. It constructs metrics based on the logits produced by LLMs to distinguish sequences generated by degraded models. Furthermore, by leveraging model offloading techniques, we demonstrate that the proposed method is implementable on consumer-grade devices. Metric evaluations conducted on three widely used LLM series (OPT, Llama 3.1, and Qwen 2.5), along with system prototype efficiency tests on a consumer-grade device (RTX 3080 Ti), confirm both the effectiveness and feasibility of the proposed approach.
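The abstract does not specify how the logit-based metric is constructed; as a minimal illustrative sketch only, one might assume the metric is the average log-probability that a locally loaded (and possibly offloaded) reference copy of the open-source model assigns to the tokens returned by the service. The model name, threshold logic, and prompt below are assumptions for illustration, not the authors' implementation.

    # Illustrative sketch (assumption, not the paper's method): score a provider-returned
    # completion by recomputing per-token log-probabilities under a local reference model.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "meta-llama/Llama-3.1-8B"  # assumed reference model; any open model works

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float16,
        device_map="auto",  # lets accelerate offload layers to CPU/disk on consumer GPUs
    )
    model.eval()

    @torch.no_grad()
    def mean_token_logprob(prompt: str, completion: str) -> float:
        """Average log-probability the reference model assigns to the completion tokens.

        Note: assumes tokenizing prompt + completion preserves the prompt's token boundary,
        which holds for typical whitespace-separated completions.
        """
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids.to(model.device)
        logits = model(full_ids).logits                      # (1, seq_len, vocab)
        logprobs = torch.log_softmax(logits[:, :-1, :], -1)  # position i predicts token i+1
        targets = full_ids[:, 1:]
        token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        start = prompt_ids.shape[1] - 1                      # first completion-token prediction
        return token_lp[:, start:].mean().item()

    # Usage: persistently low scores relative to a calibration baseline may indicate
    # a degraded (smaller or lower-precision) model behind the service.
    score = mean_token_logprob(
        "Explain beam search in one sentence.",
        " Beam search keeps the top-k partial hypotheses at each decoding step.",
    )
    print(f"mean token log-prob: {score:.3f}")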
Anthology ID:
2025.acl-long.569
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
11610–11622
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.569/
Cite (ACL):
Qijun Miao and Zhixuan Fang. 2025. User-side Model Consistency Monitoring for Open Source Large Language Models Inference Services. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11610–11622, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
User-side Model Consistency Monitoring for Open Source Large Language Models Inference Services (Miao & Fang, ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.569.pdf