How to Talk to Language Models: Serialization Strategies for Structured Entity Matching

Haoteng Yin, Jinha Kim, Prashant Mathur, Krishanu Sarker, Vidit Bansal


Abstract
Entity matching (EM), which identifies whether two data records refer to the same real-world entity, is crucial for knowledge base construction and enhancing data-driven AI systems. Recent advances in language models (LMs) have shown great potential in resolving entities with rich textual attributes. However, their performance heavily depends on how structured entities are “talked” through serialized text. The impact of this serialization process remains underexplored, particularly for entities with complex relations in knowledge graphs (KGs). In this work, we systematically study entity serialization by benchmarking the effect of common schemes with LMs of different sizes on diverse tabular matching datasets. We apply our findings to propose a novel serialization scheme for KG entities based on random walks and utilize LLMs to encode sampled semantic walks for matching. Using this lightweight approach with open-source LLMs, we achieve leading performance on EM in canonical and highly heterogeneous KGs, demonstrating significant throughput increases and superior robustness compared to GPT-4-based methods. Our study on serialization provides valuable insights for the deployment of LMs in real-world EM tasks.
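To make the notion of serialization concrete, the following is a minimal Python sketch of the two ideas the abstract mentions: turning a tabular record into text, and turning a KG entity into text by sampling random walks. It is not the authors' implementation. The scheme names (`values_only`, `attr_value`), the function names, the graph representation, and the toy data are all assumptions made for illustration; the abstract does not specify the paper's actual schemes or sampling procedure.

```python
import random

def serialize_record(record, scheme="attr_value"):
    """Serialize a flat record (dict) into text.

    The scheme names are hypothetical labels for this sketch only.
    """
    if scheme == "values_only":
        # Drop attribute names and keep only the values.
        return " ".join(str(v) for v in record.values())
    if scheme == "attr_value":
        # Keep each attribute name next to its value.
        return ", ".join(f"{k}: {v}" for k, v in record.items())
    raise ValueError(f"unknown scheme: {scheme}")

def sample_walk(graph, start, length, rng):
    """Sample one random walk of at most `length` hops from `start`.

    `graph` maps a node to a list of (relation, neighbor) pairs
    (an assumed representation, not taken from the paper).
    """
    walk, node = [start], start
    for _ in range(length):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: stop the walk early
        relation, node = rng.choice(edges)
        walk.extend([relation, node])
    return walk

def serialize_entity(graph, entity, num_walks=3, length=2, seed=0):
    """Serialize a KG entity as a list of text strings, one per sampled walk."""
    rng = random.Random(seed)
    walks = [sample_walk(graph, entity, length, rng) for _ in range(num_walks)]
    return [" ".join(walk) for walk in walks]

# Toy data, invented for illustration.
record = {"title": "iPhone 13", "brand": "Apple"}
graph = {
    "Albuquerque": [("located_in", "New Mexico")],
    "New Mexico": [("part_of", "United States")],
}

record_text = serialize_record(record)
walk_texts = serialize_entity(graph, "Albuquerque", num_walks=1, length=2)
print(record_text)
print(walk_texts[0])
```

In a matching pipeline along these lines, the resulting strings for two candidate entities would be passed to an LM (for example, encoded and compared), which is the step where the choice of serialization would affect the outcome.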
Anthology ID:
2025.findings-naacl.437
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7836–7850
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.437/
Cite (ACL):
Haoteng Yin, Jinha Kim, Prashant Mathur, Krishanu Sarker, and Vidit Bansal. 2025. How to Talk to Language Models: Serialization Strategies for Structured Entity Matching. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 7836–7850, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
How to Talk to Language Models: Serialization Strategies for Structured Entity Matching (Yin et al., Findings 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.437.pdf