Zijie Liu
2026
Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategic Conversations
Zijie Liu | Xinyu Zhao | Jie Peng | Jinhao Duan | Zhuangdi Zhu | Qingyu Chen | Kaidi Xu | Xia Hu | Tianlong Chen
Findings of the Association for Computational Linguistics: EACL 2026
In real clinical practice, clinicians must sift through noisy and often conflicting information, progressively gathering and sequencing evidence before reaching conclusions. However, existing tuning methods for medical AI models are typically monologue-based: models are fine-tuned on static question answering (QA) tasks or medical articles, which fail to reflect the interactive and iterative nature of clinical reasoning. To bridge this gap, we introduce MuddyMaze, a benchmark designed to expose the limitations of current monologue-based tuning, and construct a large dialogue dataset of 22.2k doctor–patient interactions that capture stepwise diagnostic reasoning validated by medical experts. Building on these, we propose dialogue-tuning, a new fine-tuning paradigm that captures the internal reasoning dynamics unfolding across interactions. To assess the effectiveness of our approach, we evaluate dialogue-tuned models on MuddyMaze, where they outperform monologue-tuned baselines (e.g., MedQA) by +16.1% in one-round and +4.1% in multi-round evidence ranking, while maintaining or even improving accuracy on standard medical QA benchmarks (e.g., PubMedQA). These results indicate that dialogue-tuning not only enhances reasoning robustness and evidence integration but also preserves the factual precision of traditional QA performance.
2025
FIER: Fine-Grained and Efficient KV Cache Retrieval for Long-context LLM Inference
Dongwei Wang | Zijie Liu | Song Wang | Yuxin Ren | Jianing Deng | Jingtong Hu | Tianlong Chen | Huanrui Yang
Findings of the Association for Computational Linguistics: EMNLP 2025
The Key-Value (KV) cache reading latency increases significantly with context length, hindering the efficiency of long-context LLM inference. To address this, previous work proposes retaining a small fraction of the KV cache based on token importance. For example, KV eviction uses static heuristics to retain tokens, while KV retrieval dynamically selects query-relevant tokens for more adaptive cache management. However, we observe that important tokens are often sparsely distributed across the long context. This sparsity makes existing page-level KV retrieval inaccurate, as each page may include irrelevant tokens and miss critical ones. In this work, we propose Fier, a **Fi**ne-Grained and **E**fficient KV cache **R**etrieval method. Fier uses 1-bit quantized keys to estimate the importance of each token, resulting in efficient and precise retrieval. Experiments show that Fier matches full KV performance using only 11% of the cache budget across various long-context tasks, reducing decoding latency by 1.2× to 1.5×.