Bing Xue

2026

Solver-Independent Automated Problem Formulation via LLMs for High-Cost Simulation-Driven Design
Yuchen Li | Handing Wang | Bing Xue | Mengjie Zhang | Yaochu Jin
Findings of the Association for Computational Linguistics: ACL 2026

In high-cost simulation-driven design, translating ambiguous design requirements into a mathematical optimization formulation is a bottleneck for optimizing product performance. This process is time-consuming and heavily reliant on expert knowledge. Large language models (LLMs) offer potential for automating this task, but existing approaches either suffer from poor formalization that fails to align accurately with the design intent, or rely on solver feedback for data filtering, which is unavailable because of the high simulation costs. To address this challenge, we propose automated problem formulation (APF), a solver-independent framework that uses LLMs to convert engineers’ natural language requirements into executable optimization models. At the core of this framework is an innovative pipeline for automatically generating high-quality data: through data generation and test instance annotation, it overcomes the difficulty of constructing suitable fine-tuning datasets without access to costly solver feedback. The resulting high-quality dataset is used for supervised fine-tuning of LLMs, significantly enhancing their ability to generate accurate and executable optimization problem formulations. Experimental results on antenna design demonstrate that APF significantly outperforms existing methods both in the accuracy of requirement formalization and in how well the resulting radiation efficiency curves meet the design goals.

2020


In Data We Trust: A Critical Analysis of Hate Speech Detection Datasets
Kosisochukwu Madukwe | Xiaoying Gao | Bing Xue
Proceedings of the Fourth Workshop on Online Abuse and Harms

Recently, a few studies have discussed, from different viewpoints, the limitations of datasets collected for the task of hate speech detection. We contribute to this conversation by providing a consolidated overview of the data-related issues that hinder research in this area. Specifically, we discuss how varying pre-processing steps and formats for making data publicly available result in highly divergent datasets, which make objective comparison between studies difficult and unfair. To the best of our knowledge, no existing study compares the attributes of existing hate speech detection datasets, outlines their limitations, and recommends approaches for future research. This work aims to fill that gap and serve as a one-stop shop for information regarding hate speech datasets.

Co-authors

Mengjie Zhang 1

Venues

ALW 1
Findings 1
