Minkyu Kim

Also published as: MinKyu Kim

2026

Continual pre-training (CPT) has been widely adopted as a method for domain expansion in large language models. However, CPT has consistently been accompanied by challenges, such as the difficulty of acquiring large-scale domain-specific datasets and high computational costs. In this study, we propose a novel method called Test-Enhanced Learning for Language Model Enrichment (TELLME) to alleviate these issues. TELLME leverages the Test-Enhanced Learning (TEL) principle, whereby the model’s learning efficiency is improved using quizzes during training. It integrates this principle with CPT, thereby promoting efficient domain-specific knowledge acquisition and long-term memory retention. Experimental results demonstrate that TELLME outperforms existing methods by up to 23.6% in the financial domain and achieves a 9.8% improvement in long-term memory retention.


Clinical dialogue-to-note generation is challenging because clinically salient evidence is noisy, distributed across turns, and often revised later in the encounter. Direct transcript-only prompting and coarse intermediate scaffolds can therefore suffer from omissions, section leakage, unsupported fill-in, and brittle final-state tracking. We propose Clinical Atomic Propositions (CAPs), a dialogue-aware intermediate representation for faithful clinical note generation. CAPs extract source-grounded clinical assertions while preserving modifiers such as verification status, temporality, speaker/source, and action type. We also study an optional event consolidation layer that groups CAPs into problem-oriented care bundles before note rendering. We evaluate five methods on a 197-case ACI-Bench cohort: a transcript-only baseline, prompt-based reimplementations of Cluster2Sent and MEDSUM-ENT, CAP, and CAP+Event. The main task uses a sectioned-note template, with SOAP-template rendering and transcript-free rendering reported as ablations. We use MEDSUM-ENT-style GPT-R/P/F1 metrics and a proposition-grounded semCAP-R/P/F1 audit to measure concept-level and source-grounded faithfulness, complemented by case-level win/tie/loss analysis and clinician deep review. Results show that CAP improves preservation of transcript-grounded clinical propositions while remaining competitive on concept-level GPT metrics. CAP+Event is not uniformly better than CAP, but qualitative and boundary analyses show when problem-oriented consolidation can improve organization and when compression can introduce omissions. We release code, prompts, intermediate representations, generated notes, and evaluation artifacts at a public repository.
