Yu Zhang
2025
CrossQG: Improving Difficulty-Controllable Question Generation through Consistency Enhancement
Kunze Li | Yu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Automatically generating questions with controlled difficulty has great application value, especially in education. Although large language models can generate questions at various difficulty levels, the generated questions often fail to align with the given target difficulty. To mitigate this issue, we propose CrossQG, a novel question generation method that requires no tuning of generator parameters yet significantly improves difficulty consistency. Specifically, CrossQG consists of two steps: (1) contrast enhancement, which leverages questions from different difficulty levels to enhance the base model’s understanding of the target difficulty, and (2) cross filtering, which compares generated questions across difficulty levels and filters out those that do not meet the target difficulty. We evaluate CrossQG on three high-quality question answering datasets. Experimental results demonstrate that across multiple models, CrossQG significantly outperforms several mainstream methods, achieving superior consistency with the target difficulty and improving question quality. Notably, without any generator training, CrossQG surpasses supervised fine-tuning in many cases.
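The two steps described in the abstract can be sketched in code. This is a hypothetical illustration, not the paper's implementation: the level values, prompt format, and `difficulty_score` judge are all assumptions; the paper's actual cross-filtering compares generations across levels via its own criteria.

```python
# Hypothetical numeric values for difficulty levels (assumption, not from the paper).
LEVEL_VALUE = {"easy": 0.2, "medium": 0.5, "hard": 0.8}

def contrast_enhance(prompt, target, examples_by_level):
    """Step 1 (sketch): augment the prompt with example questions from the
    *other* difficulty levels so the model can contrast them with the target."""
    contrast = "\n".join(
        f"[{lvl}] {q}"
        for lvl, qs in examples_by_level.items() if lvl != target
        for q in qs
    )
    return f"{prompt}\nContrast examples:\n{contrast}\nTarget difficulty: {target}"

def cross_filter(candidates_by_level, target, difficulty_score):
    """Step 2 (sketch): keep a target-level candidate only if its estimated
    difficulty is closer to the target level than to any other level."""
    kept = []
    for q in candidates_by_level[target]:
        s = difficulty_score(q)  # hypothetical judge returning a score in [0, 1]
        closest = min(LEVEL_VALUE, key=lambda lvl: abs(s - LEVEL_VALUE[lvl]))
        if closest == target:
            kept.append(q)
    return kept
```

Note that neither step updates the generator's parameters, which is consistent with the abstract's claim that CrossQG requires no generator tuning.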
DocAssistant: Integrating Key-region Reading and Step-wise Reasoning for Robust Document Visual Question Answering
Jinxu Zhang | Qiyuan Fan | Yu Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Understanding multimodal documents is essential for accurately extracting relevant evidence and using it for reasoning. Existing document understanding models struggle to focus on key information and tend to generate answers directly, ignoring evidence from the source documents and lacking interpretability. In this work, dubbed DocAssistant, we improve the visual encoder to focus on key information relevant to the question and address the shortcomings of existing document visual question-answering datasets to equip the model with step-wise answering ability. Specifically, on the visual side, we propose an effective vision-language adaptation that fuses text into visual encoders without compromising the performance of the original model. On the language side, we use Multimodal Large Language Models (MLLMs) as data generators and checkers to produce high-quality step-wise question-and-answer pairs for document images. We then use the generated high-quality data to train our enhanced model, specifically designed to solve complex questions that require reasoning or multi-hop question answering. Experimental results demonstrate the effectiveness of the model.
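The generator-checker data pipeline on the language side can be sketched as follows. This is a minimal sketch under assumptions: the QA-pair representation, and the `generator` and `checker` callables standing in for MLLM calls, are all hypothetical; the paper's actual prompting and filtering criteria may differ.

```python
def build_stepwise_dataset(doc_images, generator, checker):
    """Sketch: use one MLLM call as a generator of step-wise QA pairs for
    each document image, and a second as a checker that accepts or rejects
    each pair; only accepted pairs enter the training set."""
    dataset = []
    for img in doc_images:
        # Hypothetical format: each qa is (question, reasoning_steps, answer).
        for qa in generator(img):
            if checker(img, qa):  # checker vets quality/faithfulness
                dataset.append((img, qa))
    return dataset
```

The filtered dataset would then be used to train the enhanced model described in the abstract, so that it learns to answer with intermediate reasoning steps rather than producing answers directly.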