Shijie Guo
2026
Simple-VGC: Enhancing Visual Grounding in Multimodal Reasoning via Adaptive Tool Composition
Ye Wang | Qianglong Chen | Siyuan Wang | Zejun Li | Shijie Guo | Zhirui Zhang | Zhongyu Wei
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal Large Language Models (MLLMs) have achieved strong performance on vision-language tasks, yet often fail to preserve and effectively leverage visual evidence throughout generation. We identify three fundamental types of visual grounding failures: Long-Context Grounding Error, where visual information gradually decays over long sequences; Fine-Grained Grounding Error, where low-resolution or degraded inputs hinder the recovery of detailed visual information; and Regional Grounding Error, where spatially diffuse attention weakens region-level vision-language alignment. To address these issues, we propose a tool-augmented reasoning framework with three targeted compensation strategies: reuse, which re-injects the original image to mitigate visual forgetting; focus_area, which constrains attention to task-relevant regions; and zoom_in, which enhances visual resolution for fine-grained perception. We further construct the TWI-Tools-146K dataset and develop Simple-VGC, a tool-augmented MLLM that interleaves visual and textual tokens. Extensive experiments show that each tool yields targeted improvements for its corresponding grounding error, while their combination produces synergistic gains in visual reasoning. Beyond performance, our analysis provides mechanistic insights into how tool-based interventions improve visual grounding, pointing toward more reliable multimodal reasoning.
2025
Enhancing Nursing and Elderly Care with Large Language Models: An AI-Driven Framework
Qiao Sun | Jiexin Xie | Nanyang Ye | Qinying Gu | Shijie Guo
Proceedings of the 31st International Conference on Computational Linguistics
This paper explores the application of large language models (LLMs) in nursing and elderly care, focusing on AI-driven patient monitoring and interaction. We introduce a novel Chinese nursing dataset and implement incremental pre-training (IPT) and supervised fine-tuning (SFT) techniques to enhance LLM performance in specialized tasks. Using LangChain, we develop an interactable nursing assistant capable of real-time care and personalized interventions. Experimental results demonstrate significant improvements, paving the way for AI-driven solutions to meet the growing demands of healthcare in aging populations.