CITE: Benchmarking Heterogeneous Text-Attributed Graph Models

Chenghao Zhang; Qingqing Long; Ludi Wang; Wenjuan Cui; Jianjun Yu; Yi Du

Abstract

Recent advances in large language models (LLMs) and text-aware graph learning have increased interest in reasoning over text-attributed graphs (TAGs). In many real-world settings such graphs are inherently heterogeneous, yet most existing benchmarks remain largely homogeneous in structure. The resulting lack of large-scale benchmarks for heterogeneous text-attributed graphs has hindered systematic evaluation and fair comparison of existing methods. In this work, we introduce CITE (**C**atalytic **I**nformation **T**extual **E**ntities Graph), the first and largest heterogeneous text-attributed citation graph benchmark for catalytic materials. CITE contains over 438K nodes and 1.2M edges spanning four node types and four relation types, with rich node-level textual information. We establish standardized evaluation protocols for node classification and link prediction, and conduct ablation studies to assess the impact of graph heterogeneity and textual attributes. Using CITE, we benchmark four classes of learning paradigms: homogeneous graph models, heterogeneous graph models, LLM-centric models, and hybrid LLM–graph models. By providing a large-scale heterogeneous text-attributed benchmark together with standardized evaluation protocols and comprehensive baselines, CITE enables systematic assessment across diverse modeling paradigms and offers new insights into text-aware and LLM-enhanced graph learning. The dataset, codebase, and evaluation suite are publicly available.

Anthology ID: 2026.acl-long.1449
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 31426–31448
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1449/
Cite (ACL): Chenghao Zhang, Qingqing Long, Ludi Wang, Wenjuan Cui, Jianjun Yu, and Yi Du. 2026. CITE: Benchmarking Heterogeneous Text-Attributed Graph Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 31426–31448, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): CITE: Benchmarking Heterogeneous Text-Attributed Graph Models (Zhang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1449.pdf
Checklist: 2026.acl-long.1449.checklist.pdf
