Jiahui Jin
2025
Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning
Xinyu Zhang | Aibo Song | Jingyi Qiu | Jiahui Jin | Tianbo Zhang | Xiaolin Fang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Relation Extraction (RE) is a key task in table understanding, aiming to extract semantic relations between columns. However, for complex tables with hierarchical headers, high-quality textual formats (e.g., Markdown) are hard to obtain in practical scenarios such as webpage screenshots and scanned documents, whereas table images are more accessible and intuitive. Moreover, existing works overlook the need to mine relations among multiple columns, rather than just the semantic relation between two specific columns, in real-world practice. In this work, we explore utilizing Multimodal Large Language Models (MLLMs) to address RE in tables with complex structures. We extend the concept of RE to include calculational relations, enabling multi-task learning of both semantic and calculational RE for mutual reinforcement. Specifically, we reconstruct table images into a graph structure based on neighboring nodes to extract graph-level visual features. This feature enhancement alleviates MLLMs’ insensitivity to positional information within table images. We then propose a Chain-of-Thought distillation framework with a self-correction mechanism to enhance MLLMs’ reasoning capabilities without increasing parameter scale. Our method significantly outperforms most baselines across a wide range of datasets. Additionally, we release a benchmark dataset for calculational RE in complex tables.
GER-LLM: Efficient and Effective Geospatial Entity Resolution with Large Language Model
Haojia Zhu | Zhicheng Li | Jiahui Jin
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Geospatial Entity Resolution (GER) plays a central role in integrating spatial data from diverse sources. However, existing methods are limited by their reliance on large amounts of training data and their inability to incorporate commonsense knowledge. While recent advances in Large Language Models (LLMs) offer strong semantic reasoning and zero-shot capabilities, directly applying them to GER remains inadequate due to their limited spatial understanding and high inference cost. In this work, we present GER-LLM, a framework that integrates LLMs into the GER pipeline. To address the challenge of spatial understanding, we design a spatially informed blocking strategy based on adaptive quadtree partitioning and Area of Interest (AOI) detection, preserving both spatial proximity and functional relationships. To mitigate inference overhead, we introduce a group prompting mechanism with graph-based conflict resolution, enabling joint evaluation of diverse candidate pairs and enforcing global consistency across alignment decisions. Extensive experiments on real-world datasets demonstrate the effectiveness of our approach, yielding significant improvements over state-of-the-art methods.
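The blocking idea the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: it shows only the generic adaptive-quadtree step, where a cell splits once it holds more than a capacity threshold of entities, and only entities sharing a leaf cell become candidate pairs. All names (`quadtree_blocks`, `candidate_pairs`, the `capacity` parameter) and the toy data are illustrative assumptions; AOI detection and group prompting are not modeled here.

```python
# Illustrative sketch (assumed, not the paper's code): adaptive quadtree
# partitioning for spatial blocking in entity resolution.
from itertools import combinations

def quadtree_blocks(points, bounds, capacity=4):
    """Recursively split `bounds` (xmin, ymin, xmax, ymax) until each
    leaf cell holds at most `capacity` points; return leaf point groups."""
    if len(points) <= capacity:
        return [points] if points else []
    xmin, ymin, xmax, ymax = bounds
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quads = {
        (xmin, ymin, xmid, ymid): [],  # lower-left
        (xmid, ymin, xmax, ymid): [],  # lower-right
        (xmin, ymid, xmid, ymax): [],  # upper-left
        (xmid, ymid, xmax, ymax): [],  # upper-right
    }
    for p in points:
        x, y = p[1], p[2]
        key = (xmin if x < xmid else xmid, ymin if y < ymid else ymid,
               xmid if x < xmid else xmax, ymid if y < ymid else ymax)
        quads[key].append(p)
    leaves = []
    for sub_bounds, sub_points in quads.items():
        leaves.extend(quadtree_blocks(sub_points, sub_bounds, capacity))
    return leaves

def candidate_pairs(points, bounds, capacity=4):
    """Only entities sharing a leaf cell are compared, shrinking the
    quadratic all-pairs space before any expensive matching (or LLM) call."""
    pairs = set()
    for leaf in quadtree_blocks(points, bounds, capacity):
        for a, b in combinations(sorted(p[0] for p in leaf), 2):
            pairs.add((a, b))
    return pairs

# Toy example: geospatial entities as (id, x, y) in two clusters.
entities = [("a", 0.1, 0.1), ("b", 0.12, 0.11), ("c", 0.9, 0.9),
            ("d", 0.88, 0.92), ("e", 0.11, 0.09)]
pairs = candidate_pairs(entities, (0.0, 0.0, 1.0, 1.0), capacity=3)
# Pairs within each cluster survive; cross-cluster pairs like ("a", "c")
# are pruned because those entities never share a leaf cell.
```

The fixed-capacity split keeps dense regions finely partitioned while sparse regions stay coarse, which is what makes the partitioning "adaptive" rather than a uniform grid.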