Shilei Tan


2026

Large Multimodal Models (LMMs) have demonstrated significant potential in the medical domain, achieving impressive performance on tasks ranging from report generation to visual question answering. However, existing benchmarks predominantly focus on static evaluation, assessing models on isolated data points. This approach neglects a critical aspect of clinical practice: longitudinal analysis, where physicians interpret patient data as a dynamic trajectory to track disease progression and treatment response. To address this gap, we introduce ELTLM, the first benchmark specifically tailored to assess the temporal perception and reasoning capabilities of medical LMMs. Constructed from temporal chest X-rays, ELTLM features a hierarchical task taxonomy comprising Temporal Perception QA and Temporal Reasoning QA, requiring models to detect fine-grained visual changes and infer high-level clinical trends. Our evaluation of state-of-the-art models reveals that while they excel in static scenarios, they struggle significantly with temporal grounding and consistency. ELTLM serves as a vital resource to identify these limitations and guide the development of future time-aware medical AI systems. Our data is available at [ELTLM](https://github.com/ChengFeng233/ELTLM-Bench).
Detecting machine-revised text, which differs only subtly in wording from the original human-generated text, remains a challenge. Recent detection methods, including watermarking-based, logit-based, and training-based models, struggle to capture fine-grained semantic differences, especially in short texts. To address this issue, we propose Length-aware Momentum Contrastive Learning (LAMCL), a novel framework for multiscale machine-revised text detection that integrates two core modules. To enhance discriminative semantic features, the Enhance Before Detection (EBD) module first fuses the original input text with its counterpart processed by a Large Language Model (LLM), and then measures semantic consistency to distinguish machine-revised from human-generated text. Meanwhile, building on the Momentum Contrastive Learning (MCL) framework, the Length-aware Weighting (LW) module leverages text length and label information for hard negative sampling, mitigating the ambiguity of short-text attribution and improving the robustness of representation learning. Experimental results demonstrate that our method outperforms existing detectors in identifying multiscale machine-revised text across diverse practical scenarios, tasks, and LLMs. The code is available at https://github.com/hangtze/LAMCL.
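
To make the interplay of momentum contrastive learning and length-aware weighting concrete, the following is a minimal sketch and not the authors' implementation. The momentum (exponential moving average) update of the key encoder is the standard ingredient of momentum contrastive learning; everything else is an assumption for illustration: the function names, the exponential form of `length_weight`, the `scale` and `temperature` values, and the choice to apply the weight to negative terms of an InfoNCE-style loss. Embeddings are assumed to be unit-normalized.

```python
import math


def momentum_update(key_params, query_params, m=0.999):
    """EMA update of key-encoder parameters from query-encoder parameters."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def length_weight(num_tokens, scale=50.0):
    """Hypothetical weight in (1, 2]: shorter texts receive a larger weight."""
    return 1.0 + math.exp(-num_tokens / scale)


def weighted_info_nce(query, positive, negatives, neg_lengths, temperature=0.07):
    """InfoNCE-style loss with each negative scaled by its length weight.

    `negatives` are embeddings of samples whose label differs from the
    query's label; `neg_lengths` are their token counts (assumed inputs).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    pos = math.exp(dot(query, positive) / temperature)
    neg = sum(
        length_weight(n) * math.exp(dot(query, k) / temperature)
        for k, n in zip(negatives, neg_lengths)
    )
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    q = [1.0, 0.0]
    pos = [0.8, 0.6]
    negs = [[0.6, 0.8]]
    # The same negative contributes more to the loss when it is short.
    print(weighted_info_nce(q, pos, negs, neg_lengths=[10]))
    print(weighted_info_nce(q, pos, negs, neg_lengths=[200]))
```

Under these assumptions, a short negative raises the loss more than an otherwise identical long one, which is one plausible way to push the encoder to separate short texts more aggressively. The actual weighting scheme used by LAMCL is defined in the paper and its repository.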