Dongjie Wang

2026

Research in cross-lingual modeling for historical and extremely low-resource languages is hindered by the absence of standardized evaluation benchmarks. To address this, we present ManCC, the first task-anchored benchmark for Manchu–Classical Chinese translation. ManCC consists of a high-quality parallel corpus of 16,627 sentence pairs, derived from the Qing-dynasty historical text Manwen Laodang-Taizong, and a reproducible evaluation protocol that combines automatic metrics (BLEU and chrF) with a three-dimensional human assessment (fidelity, fluency, and linguistic normativity). Through systematic evaluation across three model families (non-pretrained, multilingual pretrained, and large language models), we find that linguistic differences significantly influence performance, that broader language coverage in multilingual pretraining facilitates low-resource transfer, and that automatic metrics often fail to capture essential errors in historical translation, underscoring the necessity of human evaluation. ManCC not only provides foundational resources for Manchu–Classical Chinese translation but also establishes a diagnosable, reproducible platform for cross-lingual modeling of historical low-resource languages.

Large language models have shown strong reasoning capabilities through chain-structured methods such as Chain-of-Thought. Recent studies optimize thought structures by generating parallel or tree-like structures, switching between long and short reasoning modes, or aligning reasoning steps with task performance. However, these approaches rely mainly on the logical directions of previously generated chains and thus ignore unexplored regions of the solution space. We refer to these unexplored regions as blind spots; they limit the diversity and effectiveness of the reasoning process. To this end, we propose the “Thought Space Explorer” (TSE), a framework for navigating and expanding thought structures to overcome blind spots in LLM reasoning. TSE first identifies key nodes with high impact, then generates new nodes by integrating information from multiple chains. Finally, it extends new branches through connection strategies. We conduct a series of experiments on math and QA benchmarks. Compared to existing baseline methods, TSE improves the accuracy of both the final answer and intermediate reasoning steps, while maintaining a better effectiveness-efficiency trade-off for practical deployment.

Text-Attributed Knowledge Graph Enrichment with Large Language Models for Medical Concept Representation
Mohsen Nayebi Kerdabadi | Arya Hadizadeh Moghaddam | Chen Chen | Dongjie Wang | Zijun Yao
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In electronic health record (EHR) mining, learning high-quality representations of medical concepts (e.g., standardized diagnosis, medication, and procedure codes) is fundamental for downstream clinical prediction. However, robust concept representation learning is hindered by two key challenges: (i) clinically important cross-type dependencies (e.g., diagnosis-medication and medication-procedure relations) are often missing or incomplete in existing ontology resources, limiting the ability to model complex EHR patterns; and (ii) rich clinical semantics are often missing from structured resources, and even when available as text, are difficult to integrate with KG structure for representation learning. To address these challenges, we present MedCo, an LLM-empowered graph learning framework for medical concept representation. MedCo first builds a global knowledge graph (KG) over medical codes by combining statistically reliable associations mined from EHRs with type-constrained LLM prompting to infer semantic relations. It then utilizes LLMs to enrich the KG into a text-attributed graph by generating node descriptions and edge rationales, providing semantic signals for both concepts and their relationships. Finally, MedCo jointly trains a LoRA-tuned LLaMA text encoder with a heterogeneous GNN, fusing text semantics and graph structure into unified concept embeddings. Extensive experiments on MIMIC-III and MIMIC-IV show that MedCo consistently improves prediction performance and serves as an effective plug-in concept encoder for standard EHR pipelines.

RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models
Arya Hadizadeh Moghaddam | Drew Ross | Mohsen Nayebi Kerdabadi | Dongjie Wang | Zijun Yao
Findings of the Association for Computational Linguistics: ACL 2026

Large Language Models (LLMs) have shown strong promise for mining Electronic Health Records (EHRs) by reasoning over longitudinal clinical information to capture context-rich patient trajectories. However, leveraging LLMs for structured EHRs (e.g., standardized diagnosis and medication codes) presents two key challenges. First, translating time-stamped EHR sequences into plain text can obscure both temporal structure and code identities, weakening the ability to capture code co-occurrence and longitudinal regularities. Second, unlike cohort-trained predictive models that learn a shared, task-aligned representation space across patients, LLMs are often applied in a case-isolated inference setting where each patient is processed independently without leveraging population-level patterns. To address these challenges, we introduce RePrompT, a time-aware LLM framework that integrates structured EHR encoders through prompt tuning, without modifying underlying architectures. Specifically, RePrompT recurrently incorporates latent states from prior visits to preserve longitudinal information, and injects population-level information through trainable prompt tokens derived from a cohort-trained, task-aligned EHR encoder. Experiments on MIMIC-III and MIMIC-IV demonstrate that RePrompT consistently outperforms both EHR-based and LLM-based baselines across multiple clinical prediction tasks.

Venues

Findings: 3
ACL: 1