Amalgamation of protein sequence, structure and textual information for improving protein-protein interaction identification

Pratik Dutta, Sriparna Saha


Abstract
An in-depth exploration of protein-protein interactions (PPI) is essential for understanding the metabolism and regulation of biological entities such as proteins and carbohydrates. Most recent PPI tasks in the BioNLP domain have been carried out using textual data alone. In this paper, we argue that incorporating multimodal cues can improve the automatic identification of PPI. As a first step towards enabling the development of multimodal approaches for PPI identification, we have developed two multi-modal datasets that extend two popular benchmark PPI corpora (BioInfer and HPRD50). Besides the existing textual modality, two new modalities, 3D protein structure and underlying genomic sequence, are added to each instance. Further, a novel deep multi-modal architecture is implemented to efficiently predict protein interactions from the developed datasets. A detailed experimental analysis reveals the superiority of the multi-modal approach over strong baselines, including unimodal approaches and state-of-the-art methods, on both generated multi-modal datasets. The developed multi-modal datasets are available for use at https://github.com/sduttap16/MM_PPI_NLP.
Anthology ID:
2020.acl-main.570
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6396–6407
URL:
https://aclanthology.org/2020.acl-main.570
DOI:
10.18653/v1/2020.acl-main.570
Cite (ACL):
Pratik Dutta and Sriparna Saha. 2020. Amalgamation of protein sequence, structure and textual information for improving protein-protein interaction identification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6396–6407, Online. Association for Computational Linguistics.
Cite (Informal):
Amalgamation of protein sequence, structure and textual information for improving protein-protein interaction identification (Dutta & Saha, ACL 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2020.acl-main.570.pdf
Video:
http://slideslive.com/38929411