Jiajun Liu

Other people with similar names: Jiajun Liu

Unverified author pages with similar names: Jiajun Liu

2026

Mixture of Experts (MoE) dynamically routes inputs to specialized expert networks, enabling large language models to scale capacity with low inference overhead. To further improve MoE’s parameter efficiency in resource-constrained scenarios, LoRA–MoE integrates LoRA for lightweight adaptation while preserving MoE’s specialization. Despite these benefits, the effectiveness of LoRA–MoE still hinges on balanced expert utilization, where certain experts dominate activations while most remain underutilized. Existing balancing strategies focus on constraining the final distribution of expert usage, but overlook the routing decisions made at each layer. As a result, imbalances gradually accumulate across the routing hierarchy. To address this challenge, we propose LayerMoE, a novel three-stage framework that leverages process-level rewards to guide balanced expert routing. Specifically, to overcome the limitation of focusing only on final losses and ignoring intermediate routing, we introduce Monte Carlo Tree Search (MCTS)-based sampling that decomposes outcome-level supervision into layer-wise reward signals, guiding expert choices throughout the routing process. For efficiency, we organize Transformer layers into groups, which constrain the search space of MCTS and keep exploration overhead tractable while retaining the hierarchical structure. Extensive experiments on representative datasets (e.g., ARC, RACE, OBQA) show that applying LayerMoE consistently improves the performance of state-of-the-art LoRA-MoE baselines, yielding an average accuracy gain of 1.39%. Notably, the maximum improvement reaches 2.50%.

pdf bib abs

Chain-of-thought (CoT) reasoning has emerged as a crucial paradigm for enhancing large language model (LLM) performance on multi-step reasoning tasks.However, the internal mechanisms by which LLMs invoke knowledge and propagate information across different steps of the CoT are poorly understood.To fill this gap, we propose a multi-stage probing framework that enforces structured reasoning with three explicit stages: keyword extraction, theorem generation, and computation execution.The framework integrates attention knockout to trace cross-layer information flow and theorem probing to examine how specific contents are encoded within representations.To enable controlled and stage-aligned analysis, we construct a structured CoT dataset that covers the mathematics and physics domains. Experiments on four instruction-tuned LLMs reveal distinct stage-specific patterns.First, keyword information is progressively aggregated into the final token in later layers.Second, theorem semantics are encoded in the mid-to-late layers and undergo two stages of propagation.Finally, parameter substitution is achieved through joint extraction by the final token and other tokens.The first parameter predominantly relies on the final token, whereas later parameters increasingly depend on information extracted by other tokens.Overall, our findings shed light on the neural implementation of CoT reasoning and provide actionable insights for developing more interpretable and reasoning-capable LLMs.We further evaluate a free-form prompting setting without labeled fields and observe consistent qualitative trends.

pdf bib abs

Relation extraction (RE) identifies semantic relations between entities in text, with existing methods falling into two main paradigms: discriminative and generative. Discriminative models encode sentences and entities into relation representations and classify the most likely relation, whereas generative models directly produce relation labels through sequence generation. Although the latter have benefited from recent advances in large language models (LLMs), their performance remains limited by bottlenecks. In this work, we present the systematic investigation of how discriminative models can support generative RE. We propose the Discriminative-to-Generative (D2G) framework, which first leverages discriminative models to produce a top-k set of candidate relations, and then integrates this knowledge into generative models via in-context or prompt learning. Extensive experiments on five widely used RE benchmarks demonstrate that D2G consistently achieves state-of-the-art performance, with notable gains on long-tailed relation classes.

pdf bib abs

Capability Decomposition for Unified Information Extraction via Hierarchical Mixture-of-Experts
Jing Zhou | Peng Wang | Wenjun Ke | Jiajun Liu | Yao He
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Unified Information Extraction (UIE) aims to handle heterogeneous IE tasks within a single framework, but existing methods often suffer from inconsistent schema representation, implicitly intermediate reasoning and full-parameter adaptation, which limit generalization, interpretability and parameter efficiency. To address these issues, we propose UC-UIE (Universal Capabilities-based Unified Information Extractor), a unified framework based on Large Language Model (LLM), which introduces a unified frame-and-slots schema for IE tasks and explicitly decomposes IE reasoning into three universal capabilities: judging, locating, and associating. Furthermore, UC-UIE adopts a Low-Rank Adaptation (LoRA) based hierarchical Mixture-of-Experts (MoE) adapter to fine-tune LLMs for IE tasks, which explicitly models these three capabilities in a task-driven way while ensuring parameter efficiency. With only 1.24% trainable parameters, UC-UIE outperforms full-parameter tuning methods, showing excellent parameter efficiency. Zero-shot evaluation reveals its strong generalization ability to unseen domains and schemas, benefiting from unified schema representation and explicit capability decomposition. Further experiments validate that the hierarchical MoE adapter learns capability specialization and composition, which enhances both UIE performance and interpretability.

2025

pdf bib abs

Topic modeling aims to discover the distribution of topics within a corpus. The advanced comprehension and generative capabilities of large language models (LLMs) have introduced new avenues for topic modeling, particularly by prompting LLMs to generate topics and refine them by merging similar ones. However, this approach necessitates that LLMs generate topics with consistent granularity, thus relying on the exceptional instruction-following capabilities of closed-source LLMs (such as GPT-4) or requiring additional training. Moreover, merging based only on topic words and neglecting the fine-grained semantics within documents might fail to fully uncover the underlying topic structure. In this work, we propose a semi-supervised topic modeling method, LiSA, that combines LLMs with clustering to improve topic generation and distribution. Specifically, we begin with prompting LLMs to generate a candidate topic word for each document, thereby constructing a topic-level semantic space. To further utilize the mutual complementarity between them, we first cluster documents and candidate topic words, and then establish a mapping from document to topic in the LLM-guided assignment stage. Subsequently, we introduce a collaborative enhancement strategy to align the two semantic spaces and establish a better topic distribution. Experimental results demonstrate that LiSA outperforms state-of-the-art methods that utilize GPT-4 on topic alignment, and exhibits competitive performance compared to Neural Topic Models on topic quality. The codes are available at https://github.com/ljh986/LiSA.

pdf bib abs

Recent advancements in large language models (LLMs) have demonstrated their impressive generative capabilities, primarily due to their extensive parameterization, which enables them to encode vast knowledge. However, effectively integrating new knowledge into LLMs remains a major challenge. Current research typically first constructs novel knowledge datasets and then injects this knowledge into LLMs through various techniques. However, existing methods for constructing new datasets either rely on timestamps, which lack rigor, or use simple templates for synthesis, which are simplistic and do not accurately reflect the real world. To address this issue, we propose a novel knowledge dataset construction approach that simulates biological evolution using knowledge graphs to generate synthetic entities with diverse attributes, resulting in a dataset, NovelHuman. Systematic analysis on NovelHuman reveals that the intra-sentence position of knowledge significantly affects the acquisition of knowledge. Therefore, we introduce an intra-sentence permutation to enhance knowledge acquisition. Furthermore, given that potential conflicts exist between autoregressive (AR) training objectives and permutation-based learning, we propose PermAR, a permutation-based language modeling framework for AR models. PermAR seamlessly integrates with mainstream AR architectures, endowing them with bidirectional knowledge acquisition capabilities. Extensive experiments demonstrate the superiority of PermAR, outperforming knowledge augmentation methods by 3.3%-38%.

pdf bib abs

Commonsense, humans’ implicit understanding of everyday situations, is crucial for large language models (LLMs). Existing commonsense evaluations for LLMs primarily focus on downstream knowledge tasks, failing to probe whether LLMs truly understand and utilize knowledge or merely memorize it. They also rely heavily on human annotation and lack automated large-scale data generation. To address this, we propose to automatically construct a large benchmark named CoCo (Consistency of Commonsense) comprising 39K samples derived from commonsense knowledge graphs (CSKGs), paired with symbolic questions and ground-truth answers, which systematically assesses LLMs’ knowledge memorization, comprehension, and application and examines the consistency between these tasks. To enhance our evaluation, we also propose novel metrics and prompting strategies. Experimental results on multiple LLMs reveal that CoCo presents significant challenges, and our detailed analysis provides deeper insights into the strengths and limitations of LLMs’ commonsense abilities.

2024

pdf bib abs

Named entity recognition (NER) is a pivotal task reliant on textual data, often impeding the disambiguation of entities due to the absence of context. To tackle this challenge, conventional methods often incorporate images crawled from the internet as auxiliary information. However, the images often lack sufficient entities or would introduce noise. Even with high-quality images, it is still challenging to efficiently use images as auxiliaries (i.e., fine-grained alignment with texts). We introduce a novel method named InstructNER to address these issues. Leveraging the rich real-world knowledge and image synthesis capabilities of a large pre-trained stable diffusion (SD) model, InstructNER transforms the text-only NER into a multimodal NER (MNER) task. A selection process automatically identifies the best synthetic image by comparing fine-grained similarities with internet-crawled images through a visual bag-of-words strategy. Note, during the image synthesis, a cross-attention matrix between synthetic images and raw text emerges, which inspires a soft attention guidance alignment (AGA) mechanism. AGA optimizes the MNER task and concurrently facilitates instructive alignment in MNER. Empirical experiments on prominent MNER datasets show that our method surpasses all text-only baselines, improving F1-score by 1.4% to 2.3%. Remarkably, even when compared to fully multimodal baselines, our approach maintains competitive. Furthermore, we open-source a comprehensive synthetic image dataset and the code to supplement existing raw dataset. The code and datasets are available in https://github.com/Heyest/InstructNER.