Zhengda Zhou
2025

Enhancing Extractive Question Answering in Multiparty Dialogues with Logical Inference Memory Network
Shu Zhou | Rui Zhao | Zhengda Zhou | Haohan Yi | Xuhui Zheng | Hao Wang
Proceedings of the 31st International Conference on Computational Linguistics

Multiparty dialogue question answering (QA) in machine reading comprehension (MRC) is a challenging task due to its complex information flow and the logical inference required for QA. Existing models typically handle such QA tasks by decoupling dialogue information at the speaker and utterance levels. However, few consider the logical inference relations in multiparty dialogue QA, leading to suboptimal performance. To address this issue, this paper proposes a memory network with logical inference (LIMN) for extractive QA in multiparty dialogues. LIMN introduces an inference module that is pretrained with plain QA articles as external knowledge and generates logical inference-aware representations for multiparty dialogues from a latent space. To further model the complex interactions among logical dialogue contexts, questions, and key-utterance information, a key-utterance-based interaction method is proposed. Moreover, a multitask learning strategy is adopted for robust MRC. Extensive experiments were conducted on the Molweni and FriendsQA benchmarks, which comprise 25k and 10k questions, respectively. Comparative results show that LIMN achieves state-of-the-art performance on both benchmarks, demonstrating the benefit of logical QA inference in multiparty dialogue QA tasks.
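As a rough illustration of the kind of interaction the abstract describes, here is a minimal sketch of a key-utterance-based interaction layer in PyTorch. The module name, the cross-attention formulation, and all dimensions are assumptions made for this sketch, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class KeyUtteranceInteraction(nn.Module):
    """Hypothetical sketch: fuse question, dialogue-context, and
    key-utterance representations via cross-attention. LIMN's actual
    interaction method may differ."""

    def __init__(self, hidden: int = 768, heads: int = 8):
        super().__init__()
        # Question tokens attend to logical inference-aware dialogue states.
        self.ctx_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # The result then attends to the key-utterance states.
        self.key_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, question, context, key_utts):
        # question: (B, Lq, H) token states of the question
        # context:  (B, Lc, H) logical inference-aware dialogue states
        # key_utts: (B, Lk, H) states of the predicted key utterances
        q_ctx, _ = self.ctx_attn(question, context, context)
        q_key, _ = self.key_attn(q_ctx, key_utts, key_utts)
        # Concatenate both views and project back to the hidden size;
        # the fused states would feed an extractive span-prediction head.
        return self.fuse(torch.cat([q_ctx, q_key], dim=-1))
```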

WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code
Zhiyu Lin | Zhengda Zhou | Zhiyuan Zhao | Tianrui Wan | Yilun Ma | Junyu Gao | Xuelong Li
Findings of the Association for Computational Linguistics: ACL 2025

With the rapid advancement of Generative AI technology, Multimodal Large Language Models (MLLMs) have the potential to act as AI software engineers capable of executing complex web application development. Because such development requires a confluence of multidimensional sub-capabilities across its various phases, constructing a multi-view evaluation framework is crucial for accurately guiding improvements in development efficiency. However, existing benchmarks usually fail to assess these sub-capabilities and focus solely on webpage generation outcomes. In this work, we draw inspiration from the principles of software engineering and propose WebUIBench, a benchmark systematically designed to evaluate MLLMs in four key areas: WebUI Perception, HTML Programming, WebUI-HTML Understanding, and WebUI-to-Code. WebUIBench comprises 21K high-quality question-answer pairs derived from over 0.7K real-world websites. An extensive evaluation of 29 mainstream MLLMs uncovers the skill characteristics and various weaknesses that models exhibit during the development process.
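To make the evaluation setup concrete, below is a minimal sketch of scoring a model on benchmark-style question-answer pairs grouped by sub-capability. The JSONL field names (`area`, `screenshot`, `question`, `answer`) and the exact-match metric are assumptions for illustration, not WebUIBench's released format or scoring protocol.

```python
import json
from collections import defaultdict

def evaluate(pred_fn, qa_path: str) -> dict:
    """Score a model per sub-capability with exact match.

    pred_fn: callable (screenshot_path, question) -> answer string,
             wrapping whatever MLLM is under test.
    qa_path: JSONL file of QA pairs; the fields below are hypothetical.
    """
    correct, total = defaultdict(int), defaultdict(int)
    with open(qa_path) as f:
        for line in f:
            ex = json.loads(line)
            area = ex["area"]  # e.g. "WebUI Perception", one of four key areas
            pred = pred_fn(ex["screenshot"], ex["question"])
            correct[area] += int(pred.strip() == ex["answer"].strip())
            total[area] += 1
    # Per-area accuracy exposes which sub-capabilities are weakest.
    return {a: correct[a] / total[a] for a in total}
```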