DiscoverGPT: Multi-task Fine-tuning Large Language Model for Related Table Discovery
Xuming Hu, Xiao Qin, Chuan Lei, Asterios Katsifodimos, Zhengyuan Shen, Balasubramaniam Srinivasan, Huzefa Rangwala
Abstract
Natural language understanding over tabular data plays a significant role in data discovery tasks such as joinable and unionable table search. State-of-the-art approaches adopt large language models (LLMs) pre-trained over massive text corpora to learn and evaluate table semantic relatedness. Existing methods typically follow a pretrain-and-finetune paradigm, fine-tuning an LLM on tabular data with table relatedness labels. To enhance the model's understanding of tabular data, recent studies add auxiliary tasks such as entity resolution and column type classification to the fine-tuning phase. Although these supervisions yield performance gains, there is a lack of study on how they complement or even conflict with one another, leading to subpar performance on the final data discovery tasks. In this paper, we propose a simple yet effective multi-task fine-tuning framework named DiscoverGPT that holistically discovers and leverages the intricate relationships among the supervisions to optimize performance on the data discovery task. Moreover, DiscoverGPT is plug-and-play, allowing a broad range of open-domain auxiliary tasks to be incorporated by utilizing the generative power of LLMs. We demonstrate the usability and effectiveness of DiscoverGPT through baseline comparisons and ablation studies. DiscoverGPT outperforms the best-performing baseline by up to 7% in F1 score.
- Anthology ID:
- 2025.findings-naacl.21
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2025
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Luis Chiruzzo, Alan Ritter, Lu Wang
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 358–373
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.21/
- Cite (ACL):
- Xuming Hu, Xiao Qin, Chuan Lei, Asterios Katsifodimos, Zhengyuan Shen, Balasubramaniam Srinivasan, and Huzefa Rangwala. 2025. DiscoverGPT: Multi-task Fine-tuning Large Language Model for Related Table Discovery. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 358–373, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- DiscoverGPT: Multi-task Fine-tuning Large Language Model for Related Table Discovery (Hu et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.21.pdf