Kejia Chen

2026

Trident: Self-Supervised Preference Alignment via Triplet Regularization
Yingnan Guo | Kejia Chen | Xiaofeng Zhang | Zifei Wu | Yu Zhang
Findings of the Association for Computational Linguistics: ACL 2026

Aligning Large Vision-Language Models (LVLMs) to mitigate hallucinations typically relies on high-quality preference data. However, in self-supervised settings, standard binary preference optimization (e.g., DPO) suffers from noisy supervision and semantic ambiguity, as automatically generated chosen responses are not guaranteed to be superior to rejected ones. In this work, we propose Trident, a fully self-supervised framework that ensures robust alignment via a structured triplet paradigm. Trident autonomously constructs reliable preference triplets—comprising semantically enriched (chosen), degraded (rejected), and neutral (anchor) responses—through automated visual perturbations and self-summarization. We further introduce Trident Preference Regularization (TPR), a novel objective that utilizes an adaptive margin to enforce semantic separation between the triplet components while preventing deviation from the pretrained distribution. Despite requiring no human annotations or external reward models, Trident consistently outperforms state-of-the-art RLHF and RLAIF baselines. For instance, on LLaVA-1.5-7B, it reduces the hallucination rate on AMBER to 11.3% and achieves 95.70% precision on POPE using only 4k self-generated triplets and a single epoch. This validates structured triplet supervision as a scalable paradigm for robust self-supervised alignment.


Vision-Language Models (VLMs) often prioritize linguistic fluency over visual fidelity, leading to hallucinations where generated text contradicts the image. Countering this bias typically requires resource-heavy fine-tuning or high-latency verification methods that provide feedback only after the full response is generated. To overcome these limitations, we present a framework for Token-level Inference-Time Alignment (TITA) that steers the decoding process without updating the base model parameters. By training a lightweight reward model to capture visual preferences, TITA extracts implicit guidance through log-probability ratios. This approach functions as an inference-time adaptation of Direct Preference Optimization (DPO), injecting dense feedback to correct the output distribution at every generation step. Across diverse architectures including LLaVA-1.5, Qwen3-VL, and InternVL3.5, TITA consistently improves performance on 13 benchmarks. For example, TITA boosts LLaVA-1.5-7B by 8.6% on MMVet and achieves a 74.0 MMStar score with Qwen3-VL-8B. Notably, these gains incur negligible overhead (~0.2s per query), offering a superior trade-off between alignment effectiveness and efficiency. Our code is available at: https://github.com/Thecommonirin/TITA.

2022


Comparative Graph-based Summarization of Scientific Papers Guided by Comparative Citations
Jingqiang Chen | Chaoxiang Cai | Xiaorui Jiang | Kejia Chen
Proceedings of the 29th International Conference on Computational Linguistics

With the rapid growth of scientific papers, understanding the changes and trends in a research area is time-consuming. The first challenge is to find related and comparable articles for the research. Comparative citations compare co-cited papers in a citation sentence and can serve as good guidance for researchers to track a research area. We therefore examine comparative citations to find comparable objects and build a comparative scientific summarization corpus (CSSC). We then propose the comparative graph-based summarization (CGSUM) method to create comparative summaries using citations as guidance. The comparative graph is constructed using sentences as nodes and three different relationships between sentences as edges. The relationship between sentences that occur in the same paper is used to calculate the salience of sentences, the relationship between sentences that occur in two different papers is used to calculate the difference between sentences, and the relationship between sentences that are related to citations is used to calculate the commonality of sentences. Experiments show that CGSUM outperforms comparative baselines on CSSC and performs well on DUC2006 and DUC2007.

Venues

Findings: 2
COLING: 1
