Yuntao Du

2026

Sequential diagnosis requires balancing diagnostic accuracy against resource costs through iterative information gathering. Existing Large Language Model (LLM) approaches exhibit a critical knowledge-reasoning gap: despite encoding extensive medical knowledge, they struggle to reason systematically under cost constraints, often resorting to excessive testing. We propose GraphDx, a knowledge-enhanced framework with two core innovations. First, we design an automated pipeline that leverages LLMs to construct Medical Diagnosis Knowledge Graphs (MDKGs) with quantized typicality, action-centric topology, and dual-objective attributes for both diagnostic relevance and cost-sensitivity. Second, we introduce three collaborative agents (Perception, Reasoning, and Decision) where the Perception and Decision Agents handle language understanding and generation, while the Reasoning Agent performs deterministic evidence scoring and cost-aware planning on the MDKG. Experiments on MedQA and MIMIC-IV across three LLM backbones (DeepSeek-V3, Kimi-k2, Llama-3.3) show that GraphDx improves diagnostic success rates from 50–68% to 79–93% while reducing test costs by 20–54%, providing a robust, economical, and interpretable solution for automated clinical diagnosis.


Large Multimodal Models (LMMs) encode rich factual knowledge via cross-modal pre-training, yet their static representations struggle to maintain an accurate understanding of time-sensitive knowledge. Existing benchmarks remain constrained by static designs and inadequately evaluate LMMs’ ability to understand time-sensitive knowledge. To address this gap, we propose MINED, a comprehensive benchmark containing 2,104 time-sensitive knowledge samples spanning six knowledge types, which evaluates temporal awareness along 6 key dimensions (cognition, awareness, trustworthiness, understanding, reasoning, and robustness) and 11 challenging tasks. Evaluating 15 widely used LMMs on MINED shows that Gemini-2.5-Pro achieves the highest average CEM score of 63.07, while most open-source LMMs still lack time understanding ability. Meanwhile, LMMs perform best on organization knowledge, whereas their performance is weakest on sport. To address these challenges, we investigate the feasibility of updating time-sensitive knowledge in LMMs through knowledge editing methods and observe that these methods can effectively update knowledge in single editing scenarios.


Impressive progress has been made in automated problem-solving by the collaboration of large language model (LLM) based agents. However, these automated capabilities also open avenues for malicious applications. In this paper, we study a new threat that LLMs pose to online pseudonymity, called automated profile inference, where an adversary can instruct LLMs to automatically collect and extract sensitive personal attributes from publicly available user activities on pseudonymous platforms. We also introduce an automated profiling framework called AutoProfiler to demonstrate and assess the feasibility of such attacks in real-world scenarios. AutoProfiler consists of four specialized LLM agents that work collaboratively to retrieve and process user online activities and generate a profile with extracted personal information. Experimental results on two real-world datasets and one synthetic dataset show that AutoProfiler is highly effective and efficient, and the inferred attributes are both identifiable and sensitive, posing significant privacy risks. We explore mitigation strategies from different perspectives and advocate for increased public awareness of this emerging privacy threat.

2025


Learning SQL Like a Human: Structure-Aware Curriculum Learning for Text-to-SQL Generation
Xiaohu Zhu | Qian Li | Lizhen Cui | Yuntao Du
Findings of the Association for Computational Linguistics: EMNLP 2025

The Text-to-SQL capabilities of large language models allow users to interact with databases using natural language. However, current models struggle with complex queries, especially those involving multi-table joins and reasoning. To address this gap, we propose SAC-SQL, a model built with synthetic training samples followed by a structure-aware curriculum learning framework for enhancing SQL generation. Our approach begins with a supervised fine-tuning (SFT) stage, where we train open-source models on a synthetically constructed, cross-domain SQL dataset with diverse structural patterns. Moreover, we introduce a unified structure difficulty scoring function to partition the training samples into non-overlapping curriculum phases, guiding the model to learn progressively from simpler to more complex SQL structures. Extensive experiments show that SAC-SQL outperforms the baselines and significantly narrows the performance gap between open-source and closed-source models on the Spider and Bird benchmarks.
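
As a rough illustration of the curriculum idea in this abstract, the sketch below scores SQL queries by structural features and splits them into non-overlapping phases ordered from simple to complex. The abstract does not specify the scoring function, so the keyword patterns, the weights, and the names `structure_difficulty` and `curriculum_phases` are all assumptions made for illustration, not the SAC-SQL implementation.

```python
import re

# Hypothetical feature weights: the abstract does not give the real ones.
WEIGHTS = {
    r"\bSELECT\b": 1.0,                      # each extra SELECT suggests a subquery
    r"\bJOIN\b": 2.0,                        # multi-table joins
    r"\bGROUP BY\b": 1.0,                    # aggregation structure
    r"\b(?:UNION|INTERSECT|EXCEPT)\b": 2.0,  # set operations
}


def structure_difficulty(sql: str) -> float:
    """Score a query by counting weighted structural keywords (an assumed proxy)."""
    text = sql.upper()
    return sum(weight * len(re.findall(pattern, text))
               for pattern, weight in WEIGHTS.items())


def curriculum_phases(samples: list[str], num_phases: int = 3) -> list[list[str]]:
    """Sort samples by difficulty and cut them into non-overlapping phases."""
    ranked = sorted(samples, key=structure_difficulty)
    size, extra = divmod(len(ranked), num_phases)
    phases, start = [], 0
    for i in range(num_phases):
        # The first `extra` phases take one additional sample each.
        end = start + size + (1 if i < extra else 0)
        phases.append(ranked[start:end])
        start = end
    return phases
```

Training would then iterate over the phases in order, so the model sees the structurally simplest queries first. A real scoring function would more plausibly work on a parsed SQL tree than on keyword counts.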