Classification of hierarchical text using geometric deep learning: the case of clinical trials corpus

Sohrab Ferdowsi, Nikolay Borissov, Julien Knafou, Poorya Amini, Douglas Teodoro


Abstract
We consider the hierarchical representation of documents as graphs and use geometric deep learning to classify them into different categories. While graph neural networks can efficiently handle the variable structure of hierarchical documents using permutation-invariant message-passing operations, we show that we can gain extra performance improvements using our proposed selective graph pooling operation, which arises from the fact that some parts of the hierarchy are invariable across different documents. We apply our model to classify clinical trial (CT) protocols into completed and terminated categories. We use bag-of-words-based as well as pre-trained transformer-based embeddings to featurize the graph nodes, achieving F1-scores around 0.85 on a publicly available large-scale CT registry of around 360K protocols. We further demonstrate how the selective pooling can add insights into the CT termination status prediction. We make the source code and dataset splits accessible.
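The two ideas in the abstract, order-independent message passing and pooling over the parts of the hierarchy that every document shares, can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not spell out the pooling operation, so the sketch assumes that selective pooling reads out only the nodes present in every document and concatenates them in a fixed order. The node names (`eligibility`, `design`, and the leaf nodes), the 2-dimensional features, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty sequence of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def message_pass(features: Dict[str, Vector],
                 children: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One round of mean aggregation: each node averages itself with its children."""
    updated = {}
    for node, vec in features.items():
        neighbourhood = [vec] + [features[c] for c in children.get(node, [])]
        updated[node] = mean(neighbourhood)
    return updated


def selective_pool(features: Dict[str, Vector],
                   fixed_nodes: Sequence[str]) -> Vector:
    """ASSUMED readout: concatenate the vectors of nodes shared by all documents."""
    pooled: Vector = []
    for name in fixed_nodes:
        pooled.extend(features[name])
    return pooled


# Hypothetical document: two fixed sections with a variable number of leaves.
features = {
    "root": [0.0, 0.0],
    "eligibility": [0.0, 0.0],
    "design": [0.0, 0.0],
    "elig_1": [1.0, 0.0],
    "elig_2": [3.0, 0.0],
    "design_1": [0.0, 2.0],
}
children = {
    "root": ["eligibility", "design"],
    "eligibility": ["elig_1", "elig_2"],
    "design": ["design_1"],
}
fixed_nodes = ["eligibility", "design"]

pooled = selective_pool(message_pass(features, children), fixed_nodes)

# Same document with the leaves listed in a different order.
children_shuffled = dict(children, eligibility=["elig_2", "elig_1"])
pooled_shuffled = selective_pool(message_pass(features, children_shuffled), fixed_nodes)
```

In this sketch the pooled vector always has one slot per fixed section, however many leaves each section contains, so each position in the readout can be traced back to a named part of the hierarchy. A global mean over all nodes would lose that correspondence.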
Anthology ID:
2021.emnlp-main.48
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
608–618
URL:
https://aclanthology.org/2021.emnlp-main.48
DOI:
10.18653/v1/2021.emnlp-main.48
Cite (ACL):
Sohrab Ferdowsi, Nikolay Borissov, Julien Knafou, Poorya Amini, and Douglas Teodoro. 2021. Classification of hierarchical text using geometric deep learning: the case of clinical trials corpus. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 608–618, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Classification of hierarchical text using geometric deep learning: the case of clinical trials corpus (Ferdowsi et al., EMNLP 2021)
PDF:
https://preview.aclanthology.org/naacl24-info/2021.emnlp-main.48.pdf
Video:
https://preview.aclanthology.org/naacl24-info/2021.emnlp-main.48.mp4
Code:
sssohrab/ct-classification-graphs