DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics

YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, Yanfeng Wang


Abstract
Structured data offers an efficient means of organizing information. Existing methods that process structured data with large language models (LLMs) by serializing it to text are not designed to explicitly capture the heterogeneity of structured data. Such methods are suboptimal for LLMs and may lead to large input token counts and poor robustness to input perturbation. In this paper, we propose DictLLM, an efficient and effective framework for modeling medical lab reports for the report-assisted diagnosis generation task. DictLLM introduces 1) group positional encoding to maintain permutation invariance, 2) hierarchical attention bias to capture the inductive bias of structured data, and 3) an optimal transport alignment layer to align the embeddings generated by the dict encoder with the LLM, producing a list of fixed-length virtual tokens. We conduct experiments with multiple LLMs on a large-scale real-world medical lab report dataset for automatic diagnosis generation. The results show that our proposed framework outperforms the baseline methods and few-shot GPT-4 in terms of both Rouge-L and Knowledge F1 score. We also conduct multiple experiments to analyze the scalability and robustness of our proposed framework, demonstrating the superiority of our method in modeling the heterogeneous structure of medical dictionary data.
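The permutation-invariance idea behind group positional encoding can be illustrated with a small sketch. This is not the authors' implementation: it assumes whitespace tokenization and assumes that position ids simply restart inside each key-value pair, and the function names and the sample report are hypothetical. Under those assumptions, reordering the entries of a lab report leaves the set of (token, position) pairs unchanged, whereas plain sequential positions over the serialized text do not.

```python
from typing import Dict, List, Tuple


def group_position_ids(report: Dict[str, str]) -> List[Tuple[str, int]]:
    """Tag each token with a position id that restarts at 0 in every key-value pair.

    Assumption: whitespace tokenization stands in for a real tokenizer.
    """
    tagged: List[Tuple[str, int]] = []
    for key, value in report.items():
        tokens = key.split() + value.split()
        for pos, tok in enumerate(tokens):
            tagged.append((tok, pos))
    return tagged


def sequential_position_ids(report: Dict[str, str]) -> List[Tuple[str, int]]:
    """Tag each token with its position in the fully serialized report."""
    tokens: List[str] = []
    for key, value in report.items():
        tokens.extend(key.split() + value.split())
    return list(enumerate_tokens(tokens))


def enumerate_tokens(tokens: List[str]) -> List[Tuple[str, int]]:
    """Pair every token with its running index."""
    return [(tok, pos) for pos, tok in enumerate(tokens)]


# Hypothetical lab report and the same report with its entries reordered.
report = {"white blood cell": "11.2 high", "hemoglobin": "13.5 normal"}
shuffled = dict(reversed(list(report.items())))

# Group-wise positions ignore entry order; sequential positions do not.
group_invariant = sorted(group_position_ids(report)) == sorted(group_position_ids(shuffled))
sequential_invariant = sorted(sequential_position_ids(report)) == sorted(sequential_position_ids(shuffled))
print(group_invariant, sequential_invariant)
```

The comparison sorts the tagged tokens because a permutation-invariant encoder only needs the collection of (token, position) pairs to match, not their order in the input.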
Anthology ID:
2024.findings-acl.609
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10231–10241
URL:
https://aclanthology.org/2024.findings-acl.609
DOI:
10.18653/v1/2024.findings-acl.609
Cite (ACL):
YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, and Yanfeng Wang. 2024. DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics. In Findings of the Association for Computational Linguistics: ACL 2024, pages 10231–10241, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics (Guo et al., Findings 2024)
PDF:
https://preview.aclanthology.org/autopr/2024.findings-acl.609.pdf