Xinmeng Hou
2026
Learn Like Humans: Use Meta-cognitive Reflection for Efficient Self-Improvement
Xinmeng Hou | Bohao Qu | Wuqi Wang | Peiliang Gong | Qing Guo | Yang Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While Large Language Models (LLMs) enable complex autonomous behavior, current agents remain constrained by static, human-designed prompts that limit adaptability. Existing self-improving frameworks attempt to bridge this gap but typically rely on inefficient, multi-turn recursive loops that incur high computational costs. To address this, we propose Metacognitive Agent with Reflective Self-improvement (MARS), a framework that achieves efficient self-evolution within a single recurrence cycle. Inspired by educational psychology, MARS mimics human learning by integrating principle-based reflection (abstracting normative rules to avoid errors) and procedural reflection (deriving step-by-step strategies for success). By synthesizing these insights into optimized instructions, MARS allows agents to systematically refine their reasoning logic without continuous online feedback. Extensive experiments on six benchmarks demonstrate that MARS outperforms state-of-the-art self-evolving systems while significantly reducing computational overhead. Code is available at https://github.com/Paparare/MARS/tree/main
2025
Train Once for All: A Transitional Approach for Efficient Aspect Sentiment Triplet Extraction
Xinmeng Hou | Lingyue Fu | Chenhao Meng | Kounianhua Du | Hai Hu
Findings of the Association for Computational Linguistics: EMNLP 2025
Aspect-Opinion Pair Extraction (AOPE) and Aspect Sentiment Triplet Extraction (ASTE) have drawn growing attention in NLP. However, most existing approaches extract aspects and opinions independently, optionally adding pairwise relations, which often leads to error propagation and high time complexity. To address these challenges, and inspired by transition-based dependency parsing, we propose the first transition-based model for AOPE and ASTE that performs aspect and opinion extraction jointly; it also better captures position-aware aspect-opinion relations and mitigates entity-level bias. By integrating contrastive-augmented optimization, our model delivers more accurate action predictions and jointly optimizes separate subtasks in linear time. Extensive experiments on four commonly used ASTE/AOPE datasets show that our proposed transition-based model outperforms previous models on two of the four datasets when trained on a single dataset. When multiple training sets are used, our proposed method achieves new state-of-the-art results on all datasets. We show that this is partly due to our model’s ability to benefit from transition actions learned from multiple datasets and domains. Our code is available at https://github.com/Paparare/trans_aste.
2024
Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language
Xinmeng Hou
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
This study introduces a prescriptive annotation benchmark grounded in humanities research to ensure consistent, unbiased labeling of offensive language, particularly for casual and non-mainstream language uses. We contribute two newly annotated datasets that achieve higher inter-annotator agreement between human and large language model (LLM) annotations than the original datasets, which were annotated under descriptive instructions. Our experiments show that LLMs can serve as effective alternatives when professional annotators are unavailable. Moreover, smaller models fine-tuned on multi-source LLM-annotated data outperform models trained on larger, single-source human-annotated datasets. These findings highlight the value of structured guidelines in reducing subjective variability, maintaining performance with limited data, and embracing language diversity. Content Warning: This article only analyzes offensive language for academic purposes. Discretion is advised.