Meng Li (李梦) - ACL Anthology

Meng Li

Also published as: 梦李

2026

Agentic reinforcement learning enables large language models to solve long-horizon tasks by interacting with the environment and internalizing tool-use behavior into their reasoning. Prior work assigns supervision primarily based on outcome rewards or external reward models, but largely ignores environment observations, a critical source of learning. Consequently, agents may identify successful actions without understanding how the environment responds, producing suboptimal policies. To address this, we propose SOAR (Supervision from Observation for Agentic Reinforcement Learning), which assigns positive advantages to observation tokens proportional to the negative entropy of preceding actions. This encourages the agent to learn from outcomes of confident actions, grounding policy updates in environment dynamics and improving anticipation of tool-call consequences. Empirical results across three domains and 14 benchmarks show that SOAR improves performance, yielding gains of up to 7.0% on general reasoning tasks and 16.9% on deep research tasks, while reducing erroneous and inefficient tool usage.
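
As a rough illustration of the weighting idea in this abstract, the sketch below derives one advantage value for the observation tokens that follow an action from the entropy of that action's token distributions. The function names, the `scale` parameter, and the shift by the maximum entropy (used here only to keep the value non-negative) are assumptions for illustration; the abstract does not specify the exact formula.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def observation_advantage(action_token_dists, scale=1.0):
    """Advantage shared by the observation tokens that follow an action.

    Assumption (not from the abstract): "proportional to the negative
    entropy" is made non-negative by measuring confidence as
    (maximum entropy - mean entropy) over the action's tokens.
    """
    vocab_size = len(action_token_dists[0])
    max_entropy = math.log(vocab_size)
    mean_entropy = sum(entropy(d) for d in action_token_dists) / len(action_token_dists)
    return scale * (max_entropy - mean_entropy)

# Toy per-token distributions over a 4-symbol vocabulary (hypothetical data).
confident = [[0.97, 0.01, 0.01, 0.01], [0.90, 0.05, 0.03, 0.02]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]

print(observation_advantage(confident))
print(observation_advantage(uncertain))
```

Under these assumptions, observations that follow a confident (low-entropy) action receive a larger advantage than those that follow an uncertain one.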

PanoramaRAG: Enabling Consistent Global Topic Awareness in Graph-Based RAG
Ding Deng | Xiang Li | Yaqing Zhang | Meng Li | Xiting Wang
Findings of the Association for Computational Linguistics: ACL 2026

Graph-based Retrieval-Augmented Generation (RAG), which models relationships between fine-grained semantic units as a graph, effectively facilitates multi-hop reasoning to enhance large language model generation. However, its design focuses on local relationships, resulting in suboptimal performance for tasks that require global context, and the separation of query refinement from indexing limits the system’s ability to capture high-level implicit relationships within the graph. This paper proposes a **Panorama**-guided **RAG** paradigm (PanoramaRAG) that integrates a light yet comprehensive “panorama” of the corpus to guide all stages of the retrieval process. This hub bridges the knowledge graph, language models, and queries in a computationally efficient manner, applicable to both open-source and closed-source models. Experimental results demonstrate that our method exhibits strong performance across five datasets and a variety of tasks.

Parameter-efficient fine-tuning (PEFT) has become a prevalent approach for adapting large language models (LLMs). However, low-rank adaptation methods face an inherent trade-off: improving target task performance can compromise pre-trained world knowledge, while aggressively constraining updates to preserve world knowledge may hinder improvements in the target task. Furthermore, most current methods fail to account for layer-wise differences in adaptation sensitivity, resulting in suboptimal preservation of world knowledge and task adaptation. To address these challenges, we propose Fisher-Optimized Adaptive Low Rank and Singular-Vector Selection (FARSS), an effective framework for knowledge-preserving fine-tuning. This framework introduces two key innovations. First, we propose a Fisher-guided adaptive rank allocation strategy, which assigns smaller ranks to shallow layers that are critical for preserving world knowledge, and larger ranks to deep layers that are essential for task adaptation. Second, we introduce a task-aware initialization method that integrates singular value information with layer-specific second-order statistics estimated from activation and gradient covariances, enabling efficient and task-sensitive low-rank updates. We evaluated several models across various tasks, and the experimental results show that our approach outperforms existing PEFT methods, including LoRA, Corda, and KaSA, achieving a balance between preserving world knowledge and enhancing target task performance. The code is available at https://github.com/chenyehuang/FARSS.

2025

Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes
Meng Li | Michael Vrazitulis | David Schlangen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Rational speakers are supposed to know what they know and what they do not know, and to generate expressions matching the strength of evidence. In contrast, it is still a challenge for current large language models to generate corresponding utterances based on the assessment of facts and confidence in an uncertain real-world environment. While it has recently become popular to estimate and calibrate confidence of LLMs with verbalized uncertainty, what is lacking is a careful examination of the linguistic knowledge of uncertainty encoded in the latent space of LLMs. In this paper, we draw on typological frameworks of epistemic expressions to evaluate LLMs’ knowledge of epistemic modality, using controlled stories. Our experiments show that the performance of LLMs in generating epistemic expressions is limited and not robust, and hence the expressions of uncertainty generated by LLMs are not always reliable. To build uncertainty-aware LLMs, it is necessary to enrich semantic knowledge of epistemic modality in LLMs.

Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
Meng Li | Guangda Huzhang | Haibo Zhang | Xiting Wang | Anxiang Zeng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Direct Preference Optimization (DPO) has emerged as a promising framework for aligning Large Language Models (LLMs) with human preferences by directly optimizing the log-likelihood difference between chosen and rejected responses. However, existing methods assign equal importance to all tokens in the response, while humans focus on more meaningful parts. This leads to suboptimal preference optimization, as irrelevant or noisy tokens disproportionately influence DPO loss. To address this limitation, we propose Optimal Transport-based token weighting scheme for enhancing direct Preference Optimization (OTPO). By emphasizing semantically meaningful token pairs and de-emphasizing less relevant ones, our method introduces a context-aware token weighting scheme that yields a more contrastive reward difference estimate. This adaptive weighting enhances reward stability, improves interpretability, and ensures that preference optimization focuses on meaningful differences between responses. Extensive experiments have validated OTPO’s effectiveness in improving instruction-following ability across various settings.
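
To make the token-weighting idea concrete, here is a minimal sketch of a DPO-style loss in which each token's policy-versus-reference log-ratio is scaled by a weight before summing. The weights are taken as given inputs; how OTPO derives them via optimal transport is not reproduced here. The function names and the toy numbers are assumptions for illustration only.

```python
import math

def weighted_logratio(policy_logps, ref_logps, weights):
    """Weighted sum of per-token log-ratios between policy and reference model."""
    return sum(w * (p - r) for w, p, r in zip(weights, policy_logps, ref_logps))

def token_weighted_dpo_loss(chosen, rejected, beta=0.1):
    """DPO-style loss with per-token weights.

    `chosen` and `rejected` are (policy_logps, ref_logps, weights) tuples.
    With all weights equal to 1 this reduces to the standard DPO loss.
    """
    margin = beta * (weighted_logratio(*chosen) - weighted_logratio(*rejected))
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical per-token log-probabilities for two-token responses.
chosen_policy, chosen_ref = [-1.0, -2.0], [-1.5, -2.0]
rejected_policy, rejected_ref = [-2.0, -1.0], [-1.5, -1.0]

uniform = token_weighted_dpo_loss(
    (chosen_policy, chosen_ref, [1.0, 1.0]),
    (rejected_policy, rejected_ref, [1.0, 1.0]),
)
# Assumed weights that emphasize the first token, where the responses differ.
weighted = token_weighted_dpo_loss(
    (chosen_policy, chosen_ref, [1.5, 0.5]),
    (rejected_policy, rejected_ref, [1.5, 0.5]),
)
print(uniform, weighted)
```

In this toy case only the first token carries a preference signal, so shifting weight onto it widens the reward margin and lowers the loss relative to uniform weighting.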

Aligning Large Language Models (LLMs) with human values has attracted increasing attention since it provides clarity, transparency, and the ability to adapt to evolving scenarios. In this paper, we introduce a Controlled Value Vector Activation (ConVA) method that directly aligns the internal values of LLMs by interpreting how a value is encoded in their latent representations and modifying the relevant activations to ensure consistent values. To ensure an accurate and unbiased interpretation, we propose a context-controlled value vector identification method. To consistently control values without sacrificing model performance, we introduce a gated value vector activation method for effective value control with a minimal degree of intervention. Experiments show that our method achieves the highest control success rate across 10 basic values without hurting LLM performance and fluency, and ensures target values even with opposite and potentially malicious input prompts. Source code and data are available at https://github.com/hr-jin/ConVA.
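
One way to picture gated, minimal activation steering is sketched below. It assumes a linear probe (`probe_w`, `probe_b`) that scores how strongly a hidden state expresses a value and uses the probe's normalized weight vector as the value direction. The probe, the target probability, and all names are assumptions for illustration and are not taken from the ConVA code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def probe_probability(hidden, probe_w, probe_b):
    """Probability, under the assumed linear probe, that the value is expressed."""
    return 1.0 / (1.0 + math.exp(-(dot(probe_w, hidden) + probe_b)))

def gated_steer(hidden, probe_w, probe_b, target_prob=0.9):
    """Shift `hidden` along the value direction by the smallest step that
    brings the probe probability up to `target_prob`.

    The gate is the max(0, ...) term: states that already meet the target
    are returned unchanged.
    """
    logit = dot(probe_w, hidden) + probe_b
    target_logit = math.log(target_prob / (1.0 - target_prob))
    norm = math.sqrt(dot(probe_w, probe_w))
    epsilon = max(0.0, (target_logit - logit) / norm)
    direction = [w / norm for w in probe_w]
    steered = [h + epsilon * d for h, d in zip(hidden, direction)]
    return steered, epsilon

# Hypothetical two-dimensional hidden state and probe.
steered, epsilon = gated_steer([0.0, 0.0], [3.0, 4.0], 0.0, target_prob=0.9)
print(steered, epsilon, probe_probability(steered, [3.0, 4.0], 0.0))
```

Because the step is taken along the probe's own weight direction, the closed-form `epsilon` is the smallest Euclidean shift that reaches the target under this linear assumption.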

2024

基于意合图语义理论的结构标注体系与资源建设 (System and Resource Construction Based on the Semantic Theory of Chinese-Parataxis-Graph)
Mengxi Guo (郭梦溪) | Meng Li (李梦) | Endong Xun (荀恩东) | Gaoqi Rao (饶高琦) | Zhongyang Yu (于钟洋)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

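The Parataxis Graph (意合图) is an event-centered, multi-level semantic representation method composed of event structures and entity structures; its multi-level semantic system design enables multi-level analysis of events. This paper refines and establishes annotation guidelines for the Parataxis Graph, adopts a layered and graded annotation strategy, and carries out Parataxis Graph QNP annotation on news corpora and on reading corpora for international Chinese language education, using a self-developed online annotation system. The annotation work validates the soundness and annotatability of the Parataxis Graph system and yields a Parataxis Graph semantic resource base.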

意合图:中文多层次语义表示方法 (Parataxis Graph: Multi-level Semantic Representation Method for Chinese)
Mengxi Guo (郭梦溪) | Endong Xun (荀恩东) | Meng Li (李梦) | Gaoqi Rao (饶高琦)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)

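Although parameter-based semantic representations have achieved considerable success, symbolic semantic representations remain too significant to ignore. Building on semantic theory, and taking full account of what is needed to deploy symbolic semantic representations in NLP applications, we propose a multi-level semantic representation method that is both general and extensible: the Parataxis Graph (意合图). The Parataxis Graph is event-centered and consists of event structures and entity structures. Its multi-level semantic system design improves its ability to integrate with application scenarios, and it aims to represent language units at different levels in a consistent way. The method achieves good results in resource construction and in related analysis experiments. This paper focuses on the design philosophy of the Parataxis Graph and its multi-level semantic system.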

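The Chinese Parataxis Graph (中文意合图) is a semantic representation method for Chinese proposed in recent years. This is the first semantic analysis evaluation based on Parataxis Graph theory; it aims to explore semantic computation methods oriented to that theory and to assess the semantic analysis ability of machines. A total of 14 teams registered, 7 teams ultimately submitted results, and 5 of those also submitted technical reports and models, all of which were successfully reproduced. Within the evaluation deadline, the best-performing team achieved an F1 score of 72.06% using LoRA fine-tuning of a large language model. Of the 5 teams that submitted technical reports, 4 used large language model fine-tuning, which to some extent indicates the current direction of technical development.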

With the growing popularity of general-purpose Large Language Models (LLMs) comes a need for more global explanations of model behaviors. Concept-based explanations arise as a promising avenue for explaining high-level patterns learned by LLMs. Yet their evaluation poses unique challenges, especially due to their non-local nature and high-dimensional representation in a model’s hidden space. Current methods approach concepts from different perspectives, lacking a unified formalization. This makes evaluating the core measures of concepts, namely faithfulness and readability, challenging. To bridge the gap, we introduce a formal definition of concepts that generalizes to diverse concept-based explanation settings. Based on this, we quantify the faithfulness of a concept explanation via perturbation. We ensure adequate perturbation in the high-dimensional space for different concepts via an optimization problem. Readability is approximated via an automatic and deterministic measure, quantifying the coherence of patterns that maximally activate a concept while aligning with human understanding. Finally, based on measurement theory, we apply a meta-evaluation method for evaluating these measures, generalizable to other types of explanations or tasks as well. Extensive experimental analysis has been conducted to inform the selection of explanation evaluation measures.

Weight-sharing supernets are crucial for performance estimation in cutting-edge neural architecture search (NAS) frameworks. Despite their ability to generate diverse subnetworks without retraining, the quality of these subnetworks is not guaranteed due to weight sharing. In NLP tasks like machine translation and pre-trained language modeling, there is a significant performance gap between supernet and training from scratch for the same model architecture, necessitating retraining after the optimal architecture has been identified. This study introduces a solution called mixture-of-supernets, a generalized supernet formulation leveraging mixture-of-experts (MoE) to enhance supernet model expressiveness with minimal training overhead. Unlike conventional supernets, this method employs an architecture-based routing mechanism, enabling indirect sharing of model weights among subnetworks. This customization of weights for specific architectures, learned through gradient descent, minimizes retraining time, significantly enhancing training efficiency in NLP. The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models, exhibiting a superior latency-BLEU tradeoff compared to HAT, the SoTA NAS framework for machine translation. Furthermore, it excels in NAS for building memory-efficient task-agnostic BERT models, surpassing NAS-BERT and AutoDistil across various model sizes. The code can be found at: https://github.com/UBC-NLP/MoS.

2023

Aspect Sentiment Triplet Extraction (ASTE) is an important task in sentiment analysis, aiming to extract aspect-level opinions and sentiments from user-generated reviews. The fine-grained nature of ASTE incurs a high annotation cost, while the scarcity of annotated data limits the performance of existing methods. This paper exploits data augmentation to address this issue. Traditional augmentation methods typically modify the input sentences of existing samples via heuristic rules or language models, which have shown success in text classification tasks. However, applying these methods to fine-grained tasks like ASTE poses challenges in generating diverse augmented samples while maintaining alignment between modified sentences and the original labels. Therefore, this paper proposes a target-to-source augmentation approach for ASTE. Our approach focuses on learning a generator that can directly generate new sentences based on labels and syntactic templates. With this generator, we can generate a substantial number of diverse augmented samples by mixing labels and syntactic templates from different samples. In addition, to ensure the quality of the generated sentences, we introduce fluency and alignment discriminators that provide feedback on each generated sentence, and we use this feedback to optimize the generator via a reinforcement learning framework. Experiments demonstrate that our approach significantly enhances the performance of existing ASTE models.

2018

This paper presents UIR-Miner, a system for the SemEval 2018 evaluation of emotion and sentiment analysis on Twitter. Our system consists of four modules: a preprocessing module, a stacking module for predicting the intensity of emotion and sentiment, an LSTM network module for multi-label classification, and a hierarchical attention network module for emotion and sentiment classification. According to the SemEval 2018 metrics, our system achieves final scores of 0.636, 0.531, 0.731, 0.708, and 0.408 on the five subtasks, respectively.