Xiao Hu

2025

pdf bib abs
GeLLM³O: Generalizing Large Language Models for Multi-property Molecule Optimization
Vishal Dey | Xiao Hu | Xia Ning
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite recent advancements, most computational methods for molecule optimization are constrained to single- or double-property optimization tasks and suffer from poor scalability and generalizability to novel optimization tasks. Meanwhile, Large Language Models (LLMs) demonstrate remarkable out-of-domain generalizability to novel tasks. To demonstrate LLMs’ potential for molecule optimization, we introduce \mathtt{MuMOInstruct}, the first high-quality instruction-tuning dataset specifically focused on multi-property molecule optimization tasks. Leveraging \mathtt{MuMOInstruct}, we develop \mathtt{GeLLM^3O}s, a series of instruction-tuned LLMs for molecule optimization. Extensive evaluations across 5 in-domain and 5 out-of-domain tasks demonstrate that \mathtt{GeLLM^3O}s consistently outperform state-of-the-art baselines. \mathtt{GeLLM^3O}s also exhibit outstanding zero-shot generalization to unseen tasks, significantly outperforming powerful closed-source LLMs. Such strong generalizability demonstrates the tremendous potential of \mathtt{GeLLM^3O}s as foundational models for molecule optimization, thereby tackling novel optimization tasks without resource-intensive retraining. \mathtt{MuMOInstruct} and code are accessible through https://github.com/ninglab/GeLLMO.

pdf bib abs
Large Language Models for Controllable Multi-property Multi-objective Molecule Optimization
Vishal Dey | Xiao Hu | Xia Ning
Findings of the Association for Computational Linguistics: EMNLP 2025

In real-world drug design, molecule optimization requires selectively improving multiple molecular properties up to pharmaceutically relevant levels, while maintaining others that already meet such criteria. However, existing computational approaches and instruction-tuned LLMs fail to capture such nuanced property-specific objectives, limiting their practical applicability. To address this, we introduce C-MuMOInstruct, the first instruction-tuning dataset focused on multi-property optimization with explicit, property-specific objectives. Leveraging C-MuMOInstruct, we develop \mathtt{GeLLM^4O\text{-}C}s, a series of instruction-tuned LLMs that can perform targeted property-specific optimization. Our experiments across 5 in-distribution and 5 out-of-distribution tasks show that \mathtt{GeLLM^4O\text{-}C}s consistently outperform strong baselines, achieving up to 126% higher success rate. Notably, \mathtt{GeLLM^4O\text{-}C}s exhibit impressive 0-shot generalization to novel optimization tasks and unseen instructions. This offers a step toward a foundational LLM to support realistic, diverse optimizations with property-specific objectives. C-MuMOInstruct and code are accessible through https://github.com/ninglab/GeLLMO-C.

2021

pdf bib abs
IFlyEA: A Chinese Essay Assessment System with Automated Rating, Review Generation, and Recommendation
Jiefu Gong | Xiao Hu | Wei Song | Ruiji Fu | Zhichao Sheng | Bo Zhu | Shijin Wang | Ting Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

Automated Essay Assessment (AEA) aims to judge students’ writing proficiency in an automatic way. This paper presents a Chinese AEA system IFlyEssayAssess (IFlyEA), targeting on evaluating essays written by native Chinese students from primary and junior schools. IFlyEA provides multi-level and multi-dimension analytical modules for essay assessment. It has state-of-the-art grammar level analysis techniques, and also integrates components for rhetoric and discourse level analysis, which are important for evaluating native speakers’ writing ability, but still challenging and less studied in previous work. Based on the comprehensive analysis, IFlyEA provides application services for essay scoring, review generation, recommendation, and explainable analytical visualization. These services can benefit both teachers and students during the process of writing teaching and learning.

2020

Grammatical error diagnosis is an important task in natural language processing. This paper introduces our system at NLPTEA-2020 Task: Chinese Grammatical Error Diagnosis (CGED). CGED aims to diagnose four types of grammatical errors which are missing words (M), redundant words (R), bad word selection (S) and disordered words (W). Our system is built on the model of multi-layer bidirectional transformer encoder and ResNet is integrated into the encoder to improve the performance. We also explore two ensemble strategies including weighted averaging and stepwise ensemble selection from libraries of models to improve the performance of single model. In official evaluation, our system obtains the highest F1 scores at identification level and position level. We also recommend error corrections for specific error types and achieve the second highest F1 score at correction level.

2018

Simile is a special type of metaphor, where comparators such as like and as are used to compare two objects. Simile recognition is to recognize simile sentences and extract simile components, i.e., the tenor and the vehicle. This paper presents a study of simile recognition in Chinese. We construct an annotated corpus for this research, which consists of 11.3k sentences that contain a comparator. We propose a neural network framework for jointly optimizing three tasks: simile sentence classification, simile component extraction and language modeling. The experimental results show that the neural network based approaches can outperform all rule-based and feature-based baselines. Both simile sentence classification and simile component extraction can benefit from multitask learning. The former can be solved very well, while the latter is more difficult.