Sungwoo Han

2026

ConRAS: Contrastive In-context Learning Framework for Retrieval-Augmented Summarization
Juseon Do | Sungwoo Han | Jingun Kwon | Hidetaka Kamigaito | Manabu Okumura
Findings of the Association for Computational Linguistics: EACL 2026

Contrastive learning (CL) has achieved remarkable progress in natural language processing (NLP), primarily as a paradigm for pre-training and fine-tuning. However, its potential during the generation phase, particularly in in-context learning (ICL)-based retrieval-augmented summarization, remains largely unexplored. While previous studies have attempted to incorporate negative samples into ICL prompts, these methods do not enforce a true contrastive objective that encourages separation of positive and negative samples in the representation space. In this paper, we first demonstrate through preliminary experiments that small language models (SLMs) can interpret contrastive prompts and effectively distinguish between positive and negative samples during inference, without any parameter updates. Building on these findings, we propose ConRAS, a novel framework that injects contrastive objectives into ICL-based retrieval-augmented summarization. Extensive experiments and in-depth analysis on three summarization benchmarks using four SLMs show that ConRAS consistently outperforms state-of-the-art retrieval-augmented methods, achieving significant improvements in summary quality.

Ranking is a fundamental component in a wide range of AI applications. However, large language models (LLMs) remain unstable on long-context ranking: sliding-window processing is costly, and listwise prompting over the full candidate set still yields inconsistent orders. We show that sampling alone, even with selection-based methods, cannot stabilize ranking because LLM consistency decomposes into within-list order and cross-list preference, which a single stochastic process cannot align. To address this, we introduce Self-Sorting (SS), which generates m candidate lists and performs n selection-time re-rankings over those lists. SS fuses explicit within-list positions with implicit cross-list preferences to score entities and return a top-k set. Experimental results on five widely used ranking benchmarks show significant improvements in nDCG@1, nDCG@5, and nDCG@10, highlighting the critical role of implicit consistency.

Measuring Watermarking under Jailbreaking: ASR Inflation and Goal-Compliance Mismatch
Sungwoo Han | Sangjun Moon | Jingun Kwon | Hidetaka Kamigaito | Manabu Okumura
Findings of the Association for Computational Linguistics: ACL 2026

Recently, watermarking has attracted growing attention as a practical technique for source attribution of machine-generated text. However, most prior work studies watermarking under benign prompts, while its behavior under jailbreaking prompts remains underexplored. This gap matters because jailbreaking can bypass safety policies and shift the generation regime, raising concerns that watermarking may interact with model alignment under attack. To address this gap, we evaluate six watermarking methods on four LLMs across two jailbreak benchmarks and three settings: Static, AutoDAN, and DSN. We find that watermarking can inflate judge-based attack success rate, denoted ASR, under jailbreaking, with the largest effects appearing in biased schemes that perturb logits. At the same time, these ASR increases often do not reflect higher harmful-goal compliance when measured by StrongREJECT or by human judgments. This suggests that ASR-only evaluations can be brittle to decoding perturbations and may overestimate harmful-goal compliance, motivating complementary goal-compliance metrics (e.g., StrongREJECT) and human evaluations.

Co-authors

Sangjun Moon 1

Taro Watanabe 1

Venues

Findings 3
