Ehsan Moradi Pari
Also published as: Ehsan Moradi Pari
2026
Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction
Xu Shen | Qi Zhang | Song Wang | Zhen Tan | Xinyu Zhao | Laura Yao | Vaishnav Tadiparthi | Hossein Nourkhiz Mahjoub | Ehsan Moradi Pari | Kwonjoon Lee | Tianlong Chen
Findings of the Association for Computational Linguistics: ACL 2026
Large language model (LLM)-based multi-agent systems (MAS) excel at collaborative problem solving but remain brittle to cascading errors: a single faulty step can propagate across agents and disrupt the entire trajectory. In this paper, we present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction. MASC reframes detection as history-conditioned anomaly scoring via two complementary designs: (1) Next-Execution Reconstruction, which predicts the embedding of the next step from the query and interaction history to capture causal consistency, and (2) Prototype-Guided Enhancement, which learns a prototype prior over normal-step embeddings and uses it to stabilize reconstruction and anomaly scoring under sparse context (e.g., early steps). When a step is flagged as anomalous, MASC triggers a correction agent to revise the acting agent's output before the information flows downstream. On the Who&When benchmark, MASC consistently outperforms all baselines, achieving up to a 7.8% AUC-ROC improvement in the challenging w/o GT setting, and it also yields consistent gains on AgentErrorBench. When plugged into diverse MAS frameworks, it improves end-to-end performance across architectures, confirming that metacognitive monitoring and targeted correction can mitigate error propagation with minimal overhead.
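To make "history-conditioned anomaly scoring" concrete, here is a minimal sketch, assuming step embeddings are already available as plain float lists. It is not MASC's implementation: the paper learns both its next-step predictor and its prototype, whereas here the predictor is a hypothetical blend of the history mean and a fixed prototype, and the names `predict_next`, `anomaly_score`, `monitor`, the weight `k`, and the `threshold` are all illustrative assumptions. The blend shows the intuition behind prototype guidance: with little context the prototype dominates the prediction.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_next(history: Sequence[Vector], prototype: Vector, k: float = 2.0) -> Vector:
    """Hypothetical stand-in for a learned next-step predictor.

    Blends the mean of past step embeddings with a prototype of normal steps.
    The history weight w = n / (n + k) is small when context is sparse
    (early steps), so the prototype dominates there.
    """
    if not history:
        return list(prototype)
    w = len(history) / (len(history) + k)
    h = mean(history)
    return [w * hv + (1 - w) * pv for hv, pv in zip(h, prototype)]


def anomaly_score(history: Sequence[Vector], actual: Vector,
                  prototype: Vector, k: float = 2.0) -> float:
    """Reconstruction-style score: 1 - cosine(predicted, actual)."""
    return 1.0 - cosine(predict_next(history, prototype, k), actual)


def monitor(steps: Sequence[Vector], prototype: Vector,
            threshold: float = 0.5) -> List[int]:
    """Score each step against its history and return flagged indices.

    Flagged steps are kept out of the history, a crude stand-in for
    correcting them before they flow downstream.
    """
    history: List[Vector] = []
    flagged: List[int] = []
    for i, step in enumerate(steps):
        if anomaly_score(history, step, prototype) > threshold:
            flagged.append(i)
        else:
            history.append(step)
    return flagged


prototype = [1.0, 0.0]
steps = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(monitor(steps, prototype))  # [2]
```

Only the third step, whose embedding points away from both the history and the prototype, is flagged; in MASC such a step would be handed to the correction agent instead of simply being dropped from the context.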
2024
Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models
Muhan Lin | Shuyang Shi | Yue Guo | Behdad Chalaki | Vaishnav Tadiparthi | Ehsan Moradi Pari | Simon Stepputtis | Joseph Campbell | Katia P. Sycara
Findings of the Association for Computational Linguistics: EMNLP 2024
The correct specification of reward models is a well-known challenge in reinforcement learning. Hand-crafted reward functions often lead to inefficient or suboptimal policies and may not be aligned with user values. Reinforcement learning from human feedback is a successful technique that can mitigate such issues; however, collecting human feedback can be laborious. Recent works have solicited feedback from pre-trained large language models rather than humans to reduce or eliminate human effort, but these approaches perform poorly in the presence of hallucinations and other errors. This paper studies the advantages and limitations of reinforcement learning from large language model feedback and proposes a simple yet effective method for soliciting and applying feedback as a potential-based shaping function. We theoretically show that inconsistent rankings, which approximate ranking errors, lead to uninformative rewards with our approach. Our method empirically improves convergence speed and policy returns over commonly used baselines even with significant ranking errors, and it eliminates the need for complex post-processing of reward functions.
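The link between inconsistent rankings and uninformative rewards can be illustrated with standard potential-based shaping, F(s, s') = γΦ(s') − Φ(s). The sketch below assumes a hypothetical potential built by averaging normalized rank positions over repeated ranking queries (for example, repeated LLM calls); the abstract does not specify how the paper derives Φ, so `potential_from_rankings` and its scoring rule are illustrative assumptions, not the authors' method.

```python
from typing import Dict, Hashable, Sequence

State = Hashable


def potential_from_rankings(rankings: Sequence[Sequence[State]]) -> Dict[State, float]:
    """Hypothetical potential: mean normalized rank position across queries.

    Each ranking lists states from best to worst. Within one ranking the
    best state scores 1.0 and the worst scores 0.0.
    """
    totals: Dict[State, float] = {}
    counts: Dict[State, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, s in enumerate(ranking):
            score = 1.0 - pos / (n - 1) if n > 1 else 1.0
            totals[s] = totals.get(s, 0.0) + score
            counts[s] = counts.get(s, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}


def shaped_reward(r: float, s: State, s_next: State, phi: Dict[State, float],
                  gamma: float = 0.99, done: bool = False) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s).

    The potential of a terminal next state is taken to be 0.
    """
    phi_next = 0.0 if done else phi.get(s_next, 0.0)
    return r + gamma * phi_next - phi.get(s, 0.0)


# Consistent rankings separate the states; contradictory ones collapse them.
phi_consistent = potential_from_rankings([["c", "b", "a"]] * 3)
phi_inconsistent = potential_from_rankings([["a", "b"], ["b", "a"]])

print(shaped_reward(0.0, "a", "b", phi_consistent, gamma=1.0))    # 0.5
print(shaped_reward(0.0, "a", "b", phi_inconsistent, gamma=1.0))  # 0.0
```

When the rankings contradict each other, every state receives the same potential, so the shaping term reduces to a constant (zero when γ = 1). The agent then receives an uninformative signal rather than a misleading one, which is the qualitative behavior the abstract describes.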