Xinyu Zhang
2026
Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving
Xinyu Zhang | Yuchen Wan | Boxuan Zhang | Zesheng Yang | Lingling Zhang | Bifan Wei | Jun Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) often struggle with structural ambiguity in optimization problems, where a single problem admits multiple related but conflicting modeling paradigms, hindering effective solution generation. To address this, we propose the Dual-Cluster Memory Agent (DCM-Agent), which enhances performance by leveraging historical solutions in a training-free manner. Central to this is Dual-Cluster Memory Construction: the agent assigns historical solutions to modeling and coding clusters, then distills each cluster’s content into three structured types (Approach, Checklist, and Pitfall), deriving generalizable guidance knowledge. Furthermore, the agent introduces Memory-augmented Inference to dynamically navigate solution paths, detect and repair errors, and adaptively switch reasoning paths using this structured knowledge. Experiments across seven optimization benchmarks demonstrate that DCM-Agent achieves an average performance improvement of 11%-21%. Notably, our analysis reveals a “knowledge inheritance” phenomenon: memory constructed by larger models can guide smaller models toward superior performance, highlighting the framework’s scalability and efficiency.
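As a rough illustration of the memory layout the abstract describes, the sketch below stores distilled entries per cluster under the three named types. It is a minimal sketch under stated assumptions, not the authors' implementation: the class names `ClusterMemory` and `DualClusterMemory`, the methods `add` and `guidance`, and the cluster identifier are all hypothetical, and the clustering and distillation steps themselves are not modeled.

```python
from dataclasses import dataclass, field

# The three structured types named in the abstract.
ENTRY_TYPES = ("approach", "checklist", "pitfall")
# The two cluster families named in the abstract.
CLUSTER_KINDS = ("modeling", "coding")


@dataclass
class ClusterMemory:
    """Distilled guidance for one cluster (hypothetical structure)."""
    approach: list[str] = field(default_factory=list)
    checklist: list[str] = field(default_factory=list)
    pitfall: list[str] = field(default_factory=list)


@dataclass
class DualClusterMemory:
    """Two independent stores: modeling clusters and coding clusters."""
    modeling: dict[str, ClusterMemory] = field(default_factory=dict)
    coding: dict[str, ClusterMemory] = field(default_factory=dict)

    def add(self, kind: str, cluster_id: str, entry_type: str, text: str) -> None:
        """Record one distilled entry under the given cluster and type."""
        if kind not in CLUSTER_KINDS:
            raise ValueError(f"unknown cluster kind: {kind}")
        if entry_type not in ENTRY_TYPES:
            raise ValueError(f"unknown entry type: {entry_type}")
        store = getattr(self, kind)
        memory = store.setdefault(cluster_id, ClusterMemory())
        getattr(memory, entry_type).append(text)

    def guidance(self, kind: str, cluster_id: str) -> str:
        """Render a cluster's entries as plain text for use in a prompt."""
        memory = getattr(self, kind).get(cluster_id, ClusterMemory())
        lines = []
        for entry_type in ENTRY_TYPES:
            for text in getattr(memory, entry_type):
                lines.append(f"[{entry_type.capitalize()}] {text}")
        return "\n".join(lines)


memory = DualClusterMemory()
memory.add("modeling", "cluster-0", "approach", "Formulate as a linear program.")
memory.add("modeling", "cluster-0", "pitfall", "Do not drop integrality constraints.")
print(memory.guidance("modeling", "cluster-0"))
```

Keeping the modeling and coding stores separate means guidance for one can be retrieved without the other; whether the paper's inference procedure works this way is not stated in the abstract.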
2025
Diagram-Driven Course Questions Generation
Xinyu Zhang | Lingling Zhang | Yanrui Wu | Muye Huang | Wenjun Wu | Bo Li | Shaowei Wang | Basura Fernando | Jun Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Visual Question Generation (VQG) research focuses predominantly on natural images while neglecting diagrams, which are a critical component of educational materials. To meet the needs of pedagogical assessment, we propose the Diagram-Driven Course Questions Generation (DDCQG) task and construct DiagramQG, a comprehensive dataset with 15,720 diagrams and 25,798 questions across 37 subjects and 371 courses. Our approach employs course and input text constraints to generate course-relevant questions about specific diagram elements. We reveal three challenges of DDCQG: domain-specific knowledge requirements across courses, long-tail distribution in course coverage, and high information density in diagrams. To address these, we propose the Hierarchical Knowledge Integration framework (HKI-DDCQG), which utilizes trainable CLIP for identifying relevant diagram patches, leverages frozen vision-language models for knowledge extraction, and generates questions with trainable T5. Experiments demonstrate that HKI-DDCQG outperforms existing models on DiagramQG while maintaining strong generalizability across natural image datasets, establishing a strong baseline for DDCQG.
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Xinyu Zhang | Yuxuan Dong | Yanrui Wu | Jiaxing Huang | Chengyou Jia | Basura Fernando | Mike Zheng Shou | Lingling Zhang | Jun Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models demonstrate remarkable capabilities across various domains, especially mathematical and logical reasoning. However, current evaluations overlook physics-based reasoning, a complex task requiring physics theorems and constraints. We present PhysReason, a 1,200-problem benchmark comprising knowledge-based (25%) and reasoning-based (75%) problems, where the latter are divided into three difficulty levels (easy, medium, hard). Notably, problems require an average of 8.1 solution steps, with hard problems requiring 15.6, reflecting the complexity of physics-based reasoning. We propose the Physics Solution Auto Scoring Framework, incorporating efficient answer-level and comprehensive step-level evaluations. Top-performing models like Deepseek-R1, Gemini-2.0-Flash-Thinking, and o3-mini-high achieve less than 60% on answer-level evaluation, with performance dropping from knowledge questions (75.11%) to hard problems (31.95%). Through step-level evaluation, we identify four key bottlenecks: Physics Theorem Application, Physics Process Understanding, Calculation, and Physics Condition Analysis. These findings position PhysReason as a novel and comprehensive benchmark for evaluating physics-based reasoning capabilities in large language models.
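The stated composition can be turned into problem counts with a short calculation. This is a sketch that assumes the 25% and 75% shares are exact; the abstract does not give the split among easy, medium, and hard problems, so that is left out. The variable names are illustrative.

```python
# Figures taken from the abstract.
TOTAL_PROBLEMS = 1200
KNOWLEDGE_SHARE_PCT = 25
REASONING_SHARE_PCT = 75

# Integer arithmetic avoids floating-point rounding.
knowledge_problems = TOTAL_PROBLEMS * KNOWLEDGE_SHARE_PCT // 100
reasoning_problems = TOTAL_PROBLEMS * REASONING_SHARE_PCT // 100

print(knowledge_problems, reasoning_problems)  # prints "300 900"
```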