Qing Shi


2025

GraDaSE: Graph-Based Dataset Search with Examples
Jing He | Mingyang Lv | Qing Shi | Gong Cheng
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Dataset search is a specialized information retrieval task. In the emerging scenario of Dataset Search with Examples (DSE), the user submits a query together with a few target datasets, known to be relevant, as examples. The retrieved datasets are expected to be relevant to the query and also similar to the target datasets. Unlike existing text-based retrievers, we propose GraDaSE, a graph-based approach. Besides the textual metadata of the datasets, we identify their provenance-based and topic-based relationships to construct a graph, and we jointly encode their structural and textual information to rank candidate datasets. GraDaSE outperforms a variety of strong baselines on two test collections, including DataFinder-E, which we construct.