Hanmeng Liu


2023

GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-Distribution Generalization Perspective
Linyi Yang | Shuibai Zhang | Libo Qin | Yafu Li | Yidong Wang | Hanmeng Liu | Jindong Wang | Xing Xie | Yue Zhang
Findings of the Association for Computational Linguistics: ACL 2023

Pre-trained language models (PLMs) are known to improve the generalization performance of natural language understanding models by leveraging large amounts of data during the pre-training phase. However, the out-of-distribution (OOD) generalization problem remains a challenge in many NLP tasks, limiting the real-world deployment of these methods. This paper presents the first attempt at creating a unified benchmark, named GLUE-X, for evaluating OOD robustness in NLP models, highlighting the importance of OOD robustness and providing insights on how to measure and improve the robustness of a model. The benchmark includes 13 publicly available datasets for OOD testing, and evaluations are conducted on 8 classic NLP tasks across 21 widely used PLMs. Our findings confirm the need for improved OOD accuracy in NLP tasks, as significant performance degradation was observed in all settings compared to in-distribution (ID) accuracy.

LogiCoT: Logical Chain-of-Thought Instruction Tuning
Hanmeng Liu | Zhiyang Teng | Leyang Cui | Chaoli Zhang | Qiji Zhou | Yue Zhang
Findings of the Association for Computational Linguistics: EMNLP 2023

Generative Pre-trained Transformer 4 (GPT-4) demonstrates impressive chain-of-thought reasoning ability. Recent work on self-instruction tuning, such as Alpaca, has focused on enhancing the general proficiency of models. These instructions enable the model to achieve performance comparable to GPT-3.5 on general tasks like open-domain text generation and paraphrasing. However, they fall short of helping the model handle complex reasoning tasks. To bridge the gap, this paper presents LogiCoT, a new instruction-tuning dataset for logical chain-of-thought reasoning with GPT-4. We elaborate on the process of harvesting instructions for prompting GPT-4 to generate chain-of-thought rationales. LogiCoT serves as an instruction set for teaching models logical reasoning and elicits general reasoning skills.

2021

Solving Aspect Category Sentiment Analysis as a Text Generation Task
Jian Liu | Zhiyang Teng | Leyang Cui | Hanmeng Liu | Yue Zhang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Aspect category sentiment analysis (ACSA) has attracted increasing research attention. The dominant methods make use of pre-trained language models by learning effective aspect category-specific representations and adding task-specific output layers on top of the pre-trained representation. We consider a more direct way of using pre-trained language models, casting ACSA tasks as natural language generation tasks and using natural language sentences to represent the output. Our method allows more direct use of the pre-trained knowledge in seq2seq language models by directly following the task setting used during pre-training. Experiments on several benchmarks show that our method achieves the best reported results, with large advantages in few-shot and zero-shot settings.