Surawat Pralomram
2026
Thesis Proposal: On the Granularity-Robustness Trade-off in Text-Derived Knowledge Graphs
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Retrieval-augmented generation (RAG) based on dense embeddings has become a dominant paradigm for text retrieval. However, many real-world applications require attribute-specific querying, where explicit values or properties must be extracted from text (e.g., symptoms in clinical notes or dosage values in medical reports). Dense retrieval handles paraphrastic variation well but often entangles multiple attributes within a single embedding, making value extraction difficult. Knowledge graphs (KGs), in contrast, support explicit attribute access but are brittle under linguistic and structural variation, leading to low recall. This thesis proposal investigates the representational trade-off underlying these approaches. We study knowledge graph representations from an information-theoretic and optimal coding perspective, focusing on the tension between fine-grained factorization and compact canonicalization of concepts. Building on this perspective, we propose a query-driven framework for constructing and retrieving knowledge graphs from text, aiming to combine the robustness of dense retrieval with the explicit queryability of symbolic representations.