Ju Ren
2026
CoTrust: Privacy-Preserving Collaboration Between Large and Small Language Models in Trusted Execution Environments
Zhenya Ma | Tingyi Wang | Yongheng Deng | Ziqing Qiao | Yinggui Wang | Tao Wei | Lei Wang | Ju Ren
Findings of the Association for Computational Linguistics: ACL 2026
Services powered by large language models (LLMs) provide powerful text generation capabilities, but their access to sensitive user inputs raises serious privacy concerns. Trusted Execution Environments (TEEs) provide a secure computation environment in which sensitive inputs can be processed safely. However, directly deploying high-capacity LLMs in TEEs is often prohibitively expensive due to computation and memory constraints. To reconcile privacy, efficiency, and generation quality, we propose CoTrust, a privacy-preserving collaborative inference framework that combines LLMs with small language models (SLMs) running inside a TEE. CoTrust gives the LLM multiple de-identified views of the input, from which it produces a consensus scaffold that captures the reasoning behind the answer without exposing private information; the SLM then grounds this scaffold in the full input to generate the final response. Experiments on multiple question answering and summarization benchmarks show that CoTrust approaches the performance of unconstrained LLMs, outperforms existing privacy-preserving baselines, and maintains strong privacy protection, while remaining efficient in a TDX-based TEE implementation.
2025
ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
Ziqing Qiao | Yongheng Deng | Jiali Zeng | Dong Wang | Lai Wei | Guanbo Wang | Fandong Meng | Jie Zhou | Ju Ren | Yaoxue Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Reasoning Models (LRMs) perform strongly in complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often suffer from verbose outputs, which increase computational overhead. Existing fine-tuning-based compression methods either apply post-hoc pruning, which risks disrupting reasoning coherence, or rely on sampling-based selection, which fails to remove redundant content thoroughly. To address these limitations, this work frames, from a confidence-guided perspective, two key patterns of redundant reflection in LRMs: Confidence Deficit, wherein the model reflects on correct intermediate steps, and Termination Delay, wherein reflection continues after a verified, confident answer. Based on this, we introduce ConCISE (Confidence-guided Compression In Step-by-step Efficient Reasoning), a framework designed to generate concise reasoning chains that integrates Confidence Injection, to boost reasoning confidence, and Early Stopping, to terminate reasoning when confidence is sufficient. Extensive experiments demonstrate that, compared to baseline methods, fine-tuning LRMs on ConCISE-generated data yields a better balance between compression and task performance, reducing length by up to approximately 50% under SimPO while maintaining high task accuracy.