How to Talk to Language Models: Serialization Strategies for Structured Entity Matching
Haoteng Yin, Jinha Kim, Prashant Mathur, Krishanu Sarker, Vidit Bansal
Abstract
Entity matching (EM), which identifies whether two data records refer to the same real-world entity, is crucial for knowledge base construction and enhancing data-driven AI systems. Recent advances in language models (LMs) have shown great potential in resolving entities with rich textual attributes. However, their performance heavily depends on how structured entities are “talked” through serialized text. The impact of this serialization process remains underexplored, particularly for entities with complex relations in knowledge graphs (KGs). In this work, we systematically study entity serialization by benchmarking the effect of common schemes with LMs of different sizes on diverse tabular matching datasets. We apply our findings to propose a novel serialization scheme for KG entities based on random walks and utilize LLMs to encode sampled semantic walks for matching. Using this lightweight approach with open-source LLMs, we achieve a leading performance on EM in canonical and highly heterogeneous KGs, demonstrating significant throughput increases and superior robustness compared to GPT-4-based methods. Our study on serialization provides valuable insights for the deployment of LMs in real-world EM tasks.
- Anthology ID:
- 2025.findings-naacl.437
- Volume:
- Findings of the Association for Computational Linguistics: NAACL 2025
- Month:
- April
- Year:
- 2025
- Address:
- Albuquerque, New Mexico
- Editors:
- Luis Chiruzzo, Alan Ritter, Lu Wang
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7836–7850
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.437/
- Cite (ACL):
- Haoteng Yin, Jinha Kim, Prashant Mathur, Krishanu Sarker, and Vidit Bansal. 2025. How to Talk to Language Models: Serialization Strategies for Structured Entity Matching. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 7836–7850, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal):
- How to Talk to Language Models: Serialization Strategies for Structured Entity Matching (Yin et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.437.pdf
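
The abstract refers to two kinds of serialization: turning a tabular record into text, and turning a KG entity into text through sampled random walks. The sketch below illustrates both ideas in the simplest form we could think of. It is not the scheme proposed in the paper: the function names `serialize_record` and `serialize_walk`, the "name: value" layout, the adjacency-list graph format, and the sample data are all assumptions made for illustration.

```python
import random


def serialize_record(record, with_names=True):
    """Flatten a tabular record (a dict) into a single text string.

    Hypothetical layout: "name: value" pairs joined by commas, or bare
    values when with_names is False. Missing values are skipped.
    """
    parts = []
    for name, value in record.items():
        if value is None or value == "":
            continue  # skip missing attributes
        parts.append(f"{name}: {value}" if with_names else str(value))
    return ", ".join(parts)


def serialize_walk(graph, start, length, rng):
    """Sample one random walk from `start` and render it as text.

    `graph` maps a node to a list of (relation, neighbor) pairs; this
    adjacency format is an assumption. The walk stops early when it
    reaches a node with no outgoing edges.
    """
    tokens = [start]
    node = start
    for _ in range(length):
        edges = graph.get(node, [])
        if not edges:
            break
        relation, node = rng.choice(edges)
        tokens.extend([relation, node])
    return " ".join(tokens)


# Made-up example data, not taken from the paper's datasets.
record = {"title": "iPhone 13", "brand": "Apple", "price": None}
print(serialize_record(record))                    # title: iPhone 13, brand: Apple
print(serialize_record(record, with_names=False))  # iPhone 13, Apple

graph = {
    "Inception": [("directed_by", "Christopher Nolan"), ("released_in", "2010")],
    "Christopher Nolan": [("born_in", "London")],
}
# A seeded generator keeps the sampled walk reproducible across runs.
print(serialize_walk(graph, "Inception", 2, random.Random(0)))
```

In a matching pipeline, strings like these would be passed to a language model, either as a pair to classify or as inputs to encode and compare. Which layout works best is the question the paper benchmarks, so the choices above should be read as placeholders.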