Xuanwen Ding

2026

AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
Xuanwen Ding | Chengjun Pan | Zejun Li | Jiwen Zhang | Siyuan Wang | Zhongyu Wei
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Evaluating multimodal large language models (MLLMs) is becoming increasingly expensive as benchmarks grow in scale and cross-modality complexity. Inspired by structuralism in cognitive psychology, we tackle this difficulty with an adaptive evaluation framework for efficient benchmarking, namely **AutoJudger**. Instead of passively scoring on a fixed test set, AutoJudger treats evaluation as an interview-like process: it keeps a hypothesized ability structure of the evaluated model and actively selects informative questions to refine the boundaries of those abilities. Specifically, AutoJudger has three core components: **ability decomposition** to organize evaluation along meaningful capability dimensions, **ability estimation** to maintain an up-to-date quantitative profile of the model's competence, and **adaptive question selection** to choose the most informative questions. To operationalize this paradigm, we introduce **A²-Judger**, a novel MLLM-based **A**gentic instantiation of **A**uto**Judger** equipped with semantic-aware retrieval and dynamic memory. Experiments on four representative multimodal benchmarks show that A²-Judger significantly improves sample efficiency while maintaining reliable evaluation results.

2024

Aspect-based sentiment analysis (ABSA) is an important subtask of sentiment analysis, which aims to extract aspects and predict their sentiments. Most existing studies focus on improving target-domain performance by fine-tuning domain-specific models (trained on source domains) on the target-domain dataset. Few works propose continual learning tasks for ABSA, which aim to learn the target domain's ability while maintaining the abilities acquired on previous domains. In this paper, we propose a Large Language Model-based Continual Learning (LLM-CL) model for ABSA. First, we design a domain knowledge decoupling module to learn a domain-invariant adapter and separate domain-variant adapters dependently with an orthogonal constraint. Then, we introduce a domain knowledge warmup strategy to align the representation between domain-invariant and domain-variant knowledge. In the test phase, we index the corresponding domain-variant knowledge via domain positioning, so that each sample's domain ID is not required. Extensive experiments over 19 datasets indicate that our LLM-CL model obtains new state-of-the-art performance.

Venues

ACL: 1
Findings: 1
