Jiangtong Li

2026

Do LLM Agents Really Mimic Humans? Diagnosing and Aligning Microeconomic Behaviors in Macro-ABMs
Guangya Liu | Cheng Wang | Jiangtong Li | Huafei Wu | Changjun Jiang
Findings of the Association for Computational Linguistics: ACL 2026

Large Language Models (LLMs) are increasingly adopted in macroeconomic agent-based modeling (ABM). However, existing research focuses on replicating macro-level stylized facts while often neglecting verification of micro-level decision-making. We investigate this gap by comparing LLM agents to human responses from the Survey of Consumer Expectations (SCE) dataset. Our empirical analysis identifies specific limitations: weak trend responsiveness, mode collapse, and potential data leakage. We propose the Heterogeneous Shock-Response Causal Transmission Framework to tackle these issues. To ensure theoretical consistency, we use LLMs to build a literature-verified causal graph in which macroeconomic shocks influence decisions via generated mediator nodes, while agent profiles serve as edge moderators. Building on this graph, we perform a path search at inference time to retrieve relevant causal chains and inject them as an explicit Chain-of-Thought (CoT), prioritizing mechanistic logic over statistical pattern matching. We validate this inference approach via a two-stage process that combines micro-level dataset testing and macro-level simulation in the EconAgent system. The results indicate that our framework improves alignment with human trends and effectively captures behavioral heterogeneity. Overall, this work contributes to the development of reliable and grounded economic simulations.
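The path-search step can be pictured as enumerating directed paths from a shock node to a decision node and rendering each path as a reasoning chain. The sketch below is not the authors' implementation: it assumes a tiny hand-written graph (all node names and moderator labels are hypothetical stand-ins for the literature-verified graph) and uses plain depth-first search with a length cap.

```python
from typing import Dict, List, Tuple

# Hypothetical causal graph: node -> list of (child, moderator).
# The moderator names the agent-profile attribute assumed to condition that edge.
Graph = Dict[str, List[Tuple[str, str]]]

GRAPH: Graph = {
    "interest_rate_hike": [("borrowing_cost", "has_debt"), ("savings_return", "wealth")],
    "borrowing_cost": [("consumption", "income")],
    "savings_return": [("consumption", "age")],
}


def find_causal_chains(graph: Graph, shock: str, decision: str, max_nodes: int = 4) -> List[List[str]]:
    """Enumerate simple shock-to-decision paths (at most max_nodes nodes) by DFS."""
    chains: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        node = path[-1]
        if node == decision:
            chains.append(list(path))
            return
        if len(path) >= max_nodes:
            return
        for child, _moderator in graph.get(node, []):
            if child not in path:  # skip cycles so every path stays simple
                path.append(child)
                dfs(path)
                path.pop()

    dfs([shock])
    return chains


def chain_to_cot(graph: Graph, chain: List[str]) -> str:
    """Render one chain as an explicit reasoning string that names each moderator."""
    steps = []
    for src, dst in zip(chain, chain[1:]):
        moderator = dict(graph[src])[dst]
        steps.append(f"{src} -> {dst} (moderated by {moderator})")
    return "; ".join(steps)


if __name__ == "__main__":
    for chain in find_causal_chains(GRAPH, "interest_rate_hike", "consumption"):
        print(chain_to_cot(GRAPH, chain))
```

In a prompt, the rendered strings would be supplied as explicit reasoning steps; how the actual framework selects chains for a given agent profile is not specified here.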

Can LLMs Really Judge? A Progressive Argumentation-Mining Framework for Distinguishing Understanding from Aggregation
Fuyu Wang | Jiangtong Li | Kun Zhu | Changjun Jiang
Findings of the Association for Computational Linguistics: ACL 2026

Current evaluations of large language models (LLMs) mainly rely on dataset-based generation accuracy. However, generative correctness does not guarantee the discriminative capability required to verify solutions, and it frequently masks an inability to distinguish valid reasoning from plausible errors. While multi-agent debate inherently entails judgment, we show that uncontrolled context growth and convergence to majority voting introduce significant noise, obscuring intrinsic model judgment. To address these limitations, we propose a progressive argumentation-mining diagnostic framework designed to explicitly control context and isolate discriminative behaviors. Instead of indiscriminate aggregation, our approach distills and retains only the single most well-supported rationale per answer, preventing context dilution while enforcing strict quality-based selection. Applying this framework reveals a fundamental cognitive divergence: models exhibit structural susceptibility to plausible misinformation in knowledge tasks, whereas in reasoning tasks they demonstrate latent discriminative potential that remains fragile under pressure. These findings underscore the fragility of discriminative capabilities and argue for diagnostic methodologies that prioritize judgment stability over simple generation performance.
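The selection rule described above (keep one best-supported rationale per answer) reduces to a group-by-max, which bounds the context size by the number of distinct answers. A minimal sketch, assuming each candidate rationale already carries a numeric support score; how the framework actually measures support is not stated in the abstract, so the `support` field and all helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rationale:
    answer: str     # the final answer this rationale argues for
    text: str       # the argument itself
    support: float  # hypothetical support score (e.g. number of endorsing agents)


def distill_best_rationales(candidates: List[Rationale]) -> Dict[str, Rationale]:
    """Keep only the single best-supported rationale for each distinct answer.

    Ties are broken in favor of the rationale seen first.
    """
    best: Dict[str, Rationale] = {}
    for r in candidates:
        current = best.get(r.answer)
        if current is None or r.support > current.support:
            best[r.answer] = r
    return best


def build_context(best: Dict[str, Rationale]) -> str:
    """Assemble a compact context with one line per answer, sorted by answer label."""
    return "\n".join(f"[{answer}] {best[answer].text}" for answer in sorted(best))
```

Because older, weaker rationales are discarded rather than appended, the context passed to the next round cannot grow with the number of debate turns.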

2025

InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating
Fuyu Wang | Jiangtong Li | Kun Zhu | Changjun Jiang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With rapid advances in large language models (LLMs), debating tasks such as argument quality assessment and debate process simulation have made significant progress. However, existing LLM-based debating systems focus on responding to specific arguments while neglecting objective assessments such as authenticity and logical validity. Furthermore, these systems lack a structured approach to optimization across several dimensions, including evaluation metrics, chain-of-thought (CoT) reasoning, and multi-turn debate refinement, which limits their effectiveness. To address these interconnected challenges, we propose a dual-component framework: (1) InspireScore, a novel evaluation system that establishes a multi-dimensional assessment architecture incorporating four subjective criteria (emotional appeal, argument clarity, argument arrangement, and topic relevance) alongside two objective metrics (fact authenticity and logical validity); and (2) InspireDebate, an optimized debating framework employing a phased optimization approach through CoT reasoning enhancement, multi-dimensional Direct Preference Optimization (DPO), and real-time knowledge grounding via web-based Retrieval-Augmented Generation (Web-RAG). Empirical evaluations demonstrate that InspireScore achieves 44% higher correlation with expert judgments than existing methods, while InspireDebate shows significant improvements, outperforming baseline models by 57%. Source code is available at https://github.com/fywang12/InspireDebate.
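The six InspireScore dimensions can be held in one record per argument. The sketch below is an illustrative assumption only: the scales, the equal weighting within each group, and the `weight_objective` blend are hypothetical, since the abstract does not say how the dimensions are combined.

```python
from dataclasses import dataclass

SUBJECTIVE = ("emotional_appeal", "argument_clarity", "argument_arrangement", "topic_relevance")
OBJECTIVE = ("fact_authenticity", "logical_validity")


@dataclass
class DebateScores:
    # Subjective criteria, assumed to be rated on a 0-10 scale.
    emotional_appeal: float
    argument_clarity: float
    argument_arrangement: float
    topic_relevance: float
    # Objective metrics, assumed to lie in [0, 1] (e.g. share of claims verified).
    fact_authenticity: float
    logical_validity: float


def combined_score(s: DebateScores, weight_objective: float = 0.5, subjective_max: float = 10.0) -> float:
    """Blend the normalized subjective and objective averages into one score in [0, 1]."""
    if not 0.0 <= weight_objective <= 1.0:
        raise ValueError("weight_objective must be in [0, 1]")
    subjective = sum(getattr(s, name) for name in SUBJECTIVE) / (len(SUBJECTIVE) * subjective_max)
    objective = sum(getattr(s, name) for name in OBJECTIVE) / len(OBJECTIVE)
    return (1.0 - weight_objective) * subjective + weight_objective * objective
```

Keeping the objective group separate makes it easy to check whether a persuasive but factually wrong argument is penalized as intended.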

2019

Lattice-Based Transformer Encoder for Neural Machine Translation
Fengshun Xiao | Jiangtong Li | Hai Zhao | Rui Wang | Kehai Chen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural machine translation (NMT) relies on deterministic sequences as source representations. However, both word-level and subword-level segmentation admit multiple ways to split a source sequence, depending on the word segmenter or the subword vocabulary size. We hypothesize that this diversity of segmentations may affect NMT performance. To integrate different segmentations into the state-of-the-art NMT model, the Transformer, we propose lattice-based encoders that learn effective word or subword representations automatically during training. We propose two methods: 1) lattice positional encoding and 2) lattice-aware self-attention. The two methods can be used together and are complementary, further improving translation performance. Experimental results show that lattice-based encoders outperform the conventional Transformer encoder for both word-level and subword-level representations.
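To make the lattice idea concrete, the sketch below merges several segmentations of one sentence into a set of character-span tokens and gives each token a standard Transformer sinusoidal position encoding. The position rule (a token sits at the character offset where it starts) is an assumption chosen for illustration; the paper's exact lattice positional encoding and its lattice-aware self-attention are not reproduced here.

```python
import math
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start char offset, end char offset, token text)


def build_lattice(sentence: str, segmentations: List[List[str]]) -> List[Span]:
    """Merge several segmentations of the same sentence into one sorted set of lattice tokens."""
    spans = set()
    for seg in segmentations:
        if "".join(seg) != sentence:
            raise ValueError("each segmentation must cover the sentence exactly")
        offset = 0
        for tok in seg:
            spans.add((offset, offset + len(tok), tok))
            offset += len(tok)
    return sorted(spans)


def sinusoidal(pos: int, d_model: int) -> List[float]:
    """Standard Transformer sinusoidal encoding for a single position."""
    enc = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def lattice_positional_encoding(lattice: List[Span], d_model: int) -> List[List[float]]:
    # Assumption: a token is positioned by its start offset, so alternative
    # tokens that begin at the same character share one position encoding.
    return [sinusoidal(start, d_model) for start, _end, _tok in lattice]
```

For example, merging `["ab", "cd"]` and `["a", "bcd"]` over `"abcd"` yields four lattice tokens, and `"a"` and `"ab"` receive identical encodings because both start at offset 0.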

2018

SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings
Zhuosheng Zhang | Jiangtong Li | Hai Zhao | Bingjie Tang
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the hypernym discovery system we submitted to SemEval-2018 Task 9, which aims to discover the best (set of) candidate hypernyms for input concepts or entities, given the search space of a predefined vocabulary. We introduce a neural network architecture for this task and empirically study various neural network models for building latent-space representations of words and phrases. The evaluated models include a convolutional neural network, a long short-term memory network, a gated recurrent unit, and a recurrent convolutional neural network. We also explore different embedding methods, including word embeddings and sense embeddings, to improve performance.
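Hypernym discovery is framed here as ranking every term of a fixed vocabulary for a given query. In the sketch below, cosine similarity over toy vectors stands in for the trained neural scorers the paper compares; the vectors, the assumption that the query vector has already been projected into a hypernym space, and all names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_hypernyms(query_vec: Vector, vocab: Dict[str, Vector], k: int = 3,
                   exclude: Iterable[str] = ()) -> List[str]:
    """Return the k vocabulary terms most similar to query_vec (ties broken alphabetically)."""
    excluded = set(exclude)
    scored = [(cosine(query_vec, vec), term) for term, vec in vocab.items() if term not in excluded]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _score, term in scored[:k]]


# Toy vocabulary embeddings (hypothetical values for illustration).
VOCAB: Dict[str, Vector] = {
    "animal": [1.0, 0.0, 0.0],
    "mammal": [0.9, 0.1, 0.0],
    "vehicle": [0.0, 1.0, 0.0],
    "fruit": [0.0, 0.0, 1.0],
}
```

With a query vector such as `[0.8, 0.2, 0.0]` (imagined as a projected embedding for "dog"), the two top-ranked candidates are "mammal" and "animal".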

Lingke: a Fine-grained Multi-turn Chatbot for Customer Service
Pengfei Zhu | Zhuosheng Zhang | Jiangtong Li | Yafang Huang | Hai Zhao
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

Traditional chatbots usually need large amounts of human dialogue data, especially when trained with supervised machine learning methods. Although they handle single-turn question answering easily, their performance on multi-turn conversations is usually unsatisfactory. In this paper, we present Lingke, an information-retrieval-augmented chatbot that answers questions based on a given product introduction document and handles multi-turn conversations. We introduce a fine-grained processing pipeline that distills responses from unstructured documents, together with attentive sequential context-response matching for multi-turn conversations.
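The retrieval idea (answering from a product document while taking earlier turns into account) can be illustrated with a deliberately simple scorer. This sketch swaps in weighted word overlap in place of Lingke's attentive context-response matching; the stopword list, the 2:1 weighting of the current question over history, and all names are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "at", "how", "is", "it", "of", "the", "what", "with"}


def tokenize(text: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_sentences(document: str) -> List[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def best_answer_sentence(question: str, document: str, history: Iterable[str] = ()) -> str:
    """Pick the document sentence that best overlaps the question and earlier turns."""
    q_terms = tokenize(question)
    h_terms = set().union(*(tokenize(turn) for turn in history))

    def score(sentence: str) -> int:
        terms = tokenize(sentence)
        # The current question counts twice as much as history (hypothetical weighting).
        return 2 * len(q_terms & terms) + len(h_terms & terms)

    return max(split_sentences(document), key=score)
```

Passing earlier turns as `history` lets an elliptical follow-up such as "And the motor?" still be resolved against the document.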

Modeling Multi-turn Conversation with Deep Utterance Aggregation
Zhuosheng Zhang | Jiangtong Li | Pengfei Zhu | Hai Zhao | Gongshen Liu
Proceedings of the 27th International Conference on Computational Linguistics

Multi-turn conversation understanding is a major challenge in building intelligent dialogue systems. This work focuses on retrieval-based response matching for multi-turn conversation, where prior work simply concatenates the conversation utterances and ignores the interactions among previous utterances when modeling context. In this paper, we aggregate previous utterances into a context with the proposed deep utterance aggregation model, forming a fine-grained context representation. Specifically, a self-matching attention mechanism first routes the vital information in each utterance. The model then matches the response against each refined utterance, and the final matching score is obtained through attentive turn aggregation. Experimental results show that our model outperforms state-of-the-art methods on three multi-turn conversation benchmarks, including a newly introduced e-commerce dialogue corpus.
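The final aggregation step can be sketched as per-turn matching scores combined by attention weights over turns. This is a simplified, hypothetical instance: utterance and response vectors are assumed to be already refined (for example, by the self-matching attention), matching is a plain dot product, and the attention logits come from a stand-in query vector `attn_query` rather than the model's learned parameters.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_turns(utterances: List[Vector], response: Vector, attn_query: Vector) -> float:
    """Score a candidate response against a multi-turn context.

    1. Match the response with each refined utterance (one score per turn).
    2. Weight the turns with a softmax over attn_query . utterance.
    3. Return the attention-weighted sum of the per-turn matching scores.
    """
    turn_scores = [dot(u, response) for u in utterances]
    weights = softmax([dot(attn_query, u) for u in utterances])
    return sum(w * s for w, s in zip(weights, turn_scores))
```

Ranking candidate responses then amounts to calling `aggregate_turns` once per candidate and sorting by the resulting scores.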
