Training LLMs for Optimization Modeling via Iterative Data Synthesis and Structured Validation
Yang Wu | Yifan Zhang | Yurong Wu | Yuran Wang | Junkai Zhang | Jian Cheng
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models (LLMs) have revolutionized various domains but encounter substantial challenges in tackling optimization modeling tasks for Operations Research (OR), particularly when dealing with complex problems. In this work, we propose Step-Opt-Instruct, a framework that augments existing datasets and generates high-quality fine-tuning data tailored to optimization modeling. Step-Opt-Instruct employs iterative problem generation to systematically increase problem complexity and stepwise validation to rigorously verify data, preventing error propagation and ensuring the quality of the generated dataset. Leveraging this framework, we fine-tune open-source LLMs, including LLaMA-3-8B and Mistral-7B, to develop Step-Opt, a model that achieves state-of-the-art performance on benchmarks such as NL4OPT, MAMO, and IndustryOR. Extensive experiments demonstrate the superior performance of Step-Opt, especially in addressing complex OR tasks, with a notable 17.01% improvement in micro average accuracy on difficult problems. These findings highlight the effectiveness of combining structured validation with gradual problem refinement to advance the automation of decision-making processes using LLMs. The code and dataset are available at https://github.com/samwu-learn/Step.
Protein Large Language Models: A Comprehensive Survey
Yijia Xiao | Wanjia Zhao | Junkai Zhang | Yiqiao Jin | Han Zhang | Zhicheng Ren | Renliang Sun | Haixin Wang | Guancheng Wan | Pan Lu | Xiao Luo | Yu Zhang | James Zou | Yizhou Sun | Wei Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Protein-specific large language models (ProteinLLMs) are revolutionizing protein science by enabling more efficient protein structure prediction, function annotation, and design. While existing surveys focus on specific aspects or applications, this work provides the first comprehensive overview of ProteinLLMs, covering their architectures, training datasets, evaluation metrics, and diverse applications. Through a systematic analysis of over 100 articles, we propose a structured taxonomy of state-of-the-art ProteinLLMs, analyze how they leverage large-scale protein sequence data for improved accuracy, and explore their potential in advancing protein engineering and biomedical research. Additionally, we discuss key challenges and future directions, positioning ProteinLLMs as essential tools for scientific discovery in protein science. Resources are maintained at https://github.com/Yijia-Xiao/Protein-LLM-Survey.
MetaScientist: A Human-AI Synergistic Framework for Automated Mechanical Metamaterial Design
Jingyuan Qi | Zian Jia | Minqian Liu | Wangzhi Zhan | Junkai Zhang | Xiaofei Wen | Jingru Gan | Jianpeng Chen | Qin Liu | Mingyu Derek Ma | Bangzheng Li | Haohui Wang | Adithya Kulkarni | Muhao Chen | Dawei Zhou | Ling Li | Wei Wang | Lifu Huang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
The discovery of novel mechanical metamaterials, whose properties are dominated by their engineered structures rather than chemical composition, is a knowledge-intensive and resource-demanding process. To accelerate the design of novel metamaterials, we present MetaScientist, a human-in-the-loop system that integrates advanced AI capabilities with expert oversight across two primary phases: (1) hypothesis generation, where the system performs complex reasoning to generate novel and scientifically sound hypotheses, supported by domain-specific foundation models and inductive biases retrieved from existing literature; (2) 3D structure synthesis, where a 3D structure is synthesized with a novel 3D diffusion model based on the textual hypothesis and refined with an LLM-based refinement model to achieve better structural properties. At each phase, domain experts iteratively validate the system outputs and provide feedback and supplementary materials to ensure the alignment of the outputs with scientific principles and human preferences. Through extensive evaluation by human scientists, MetaScientist is able to deliver novel and valid mechanical metamaterial designs that have the potential to be highly impactful in the metamaterial field.