Jie Zhao
2025
Controllable Style Arithmetic with Language Models
Weiqi Wang | Wengang Zhou | Zongmeng Zhang | Jie Zhao | Houqiang Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language models have shown remarkable capabilities in text generation, but precisely controlling their linguistic style remains challenging. Existing methods either lack fine-grained control, require extensive computation, or introduce significant latency. We propose Style Arithmetic (SA), a novel parameter-space approach that first extracts style-specific representations by analyzing parameter differences between models trained on contrasting styles, then incorporates these representations into a base model with precise control over style intensity. Our experiments show that SA achieves three key capabilities: controllability for precise adjustment of styles, transferability for effective style transfer across tasks, and composability for simultaneous control of multiple style dimensions. Compared to alternative methods, SA offers superior effectiveness while achieving optimal computational efficiency. Our approach opens new possibilities for flexible and efficient style control in language models.
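The abstract does not include code; the following is a minimal sketch of the kind of parameter-space arithmetic it describes (in the spirit of task-vector merging). The function names, the dictionary-of-tensors weight format, and the scalar interpolation scheme are illustrative assumptions, not the authors' implementation.

```python
import torch

def extract_style_vector(base_state: dict, style_state: dict) -> dict:
    """Style vector = per-parameter difference between a model fine-tuned
    on a target style and the base model it started from."""
    return {k: style_state[k] - base_state[k] for k in base_state}

def apply_styles(base_state: dict, style_vectors: list, alphas: list) -> dict:
    """Add scaled style vectors to the base weights. Each scalar alpha
    controls one style's intensity; summing several vectors composes styles."""
    merged = {k: v.clone() for k, v in base_state.items()}
    for vec, alpha in zip(style_vectors, alphas):
        for k in merged:
            merged[k] = merged[k] + alpha * vec[k]
    return merged

# Toy usage with a single "parameter":
base  = {"w": torch.zeros(3)}
style = {"w": torch.ones(3)}
vec = extract_style_vector(base, style)
blended = apply_styles(base, [vec], [0.5])  # half-intensity style
print(blended["w"])  # a tensor of 0.5s: the style applied at alpha = 0.5
```

Because the arithmetic happens once in weight space, a merged model adds no inference-time latency, which is consistent with the efficiency claim in the abstract.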
Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption Generation
Yu Zeng | Yukun Qi | Yiming Zhao | Xikun Bao | Lin Chen | Zehui Chen | Shiting Huang | Jie Zhao | Feng Zhao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
High-quality image captions are essential for improving modality alignment and visual understanding in Large Vision-Language Models (LVLMs). However, the scarcity of ultra-detailed image caption data limits further advancements. This paper presents a systematic pipeline for generating high-quality, ultra-detailed image captions, encompassing both pre-processing and post-processing stages. In the pre-processing stage, we classify and deduplicate images, extract visual information using expert tools, and leverage GPT-4o with structured prompts to generate initial captions. To enhance comprehensiveness, we introduce an expansion strategy based on Large Language Models (LLMs), defining eight descriptive dimensions to refine and extend captions, which serve as seed data for training a proprietary captioner model. In the post-processing stage, we incorporate human error-correction annotations and an active learning-inspired approach to refine low-quality samples. Using high-quality corrected data, we apply Direct Preference Optimization (DPO) and develop a critic-rewrite pipeline, training a sentence-level critic model to mitigate hallucinations. Experimental results demonstrate that our ultra-detailed captions significantly enhance LVLMs’ perception and cognitive abilities across multiple vision-language benchmarks. The code and dataset are available at https://github.com/yuzeng0-0/UltraCaption.
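As a rough illustration of the post-processing step, here is a minimal sketch of turning human-corrected captions into DPO preference pairs, where the corrected caption is preferred over the raw captioner output. The field names and the simple inequality filter are assumptions for illustration; the actual critic-rewrite pipeline and data are in the linked repository.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CaptionSample:
    prompt: str     # image-grounded captioning prompt
    original: str   # raw captioner output (may contain hallucinations)
    corrected: str  # human-corrected caption

def build_dpo_pairs(samples: List[CaptionSample]) -> List[dict]:
    """Corrected captions become 'chosen', raw outputs 'rejected' --
    the preference-triple format standard DPO trainers consume."""
    return [
        {"prompt": s.prompt, "chosen": s.corrected, "rejected": s.original}
        for s in samples
        if s.corrected != s.original  # keep only samples a human actually fixed
    ]
```

Pairing each rejected caption with its own corrected version keeps the preference signal focused on the error that was fixed (e.g. a hallucinated object) rather than on unrelated stylistic differences.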
Co-authors
- Xikun Bao 1
- Lin Chen (陈霖) 1
- Zehui Chen 1
- Shiting Huang 1
- Houqiang Li 1