TechTexC: Classification of Technical Texts using Convolution and Bidirectional Long Short Term Memory Network

Omar Sharif, Eftekhar Hossain, Mohammed Moshiul Hoque



Abstract
This paper describes a technical text classification system, and its results, developed for participation in the TechDOfication 2020 shared task. The shared task consists of two tasks: (i) the first identifies the coarse-grained technical domain of a given text in a specified language, and (ii) the second classifies a computer science text into fine-grained sub-domains. The system, called ‘TechTexC’, performs the classification using three techniques: a convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network, and a combined CNN with BiLSTM. Results show that the combined CNN with BiLSTM model outperforms the other two techniques on sub-tasks a, b, c and g of task 1 and on task 2a. On the development dataset, the combined model obtained F1 scores of 82.63 (sub-task 1a), 81.95 (sub-task 1b), 82.39 (sub-task 1c), 84.37 (sub-task 1g), and 67.44 (task 2a). On the test set, the combined CNN with BiLSTM approach also achieved the higher accuracy for sub-tasks 1a (70.76%), 1b (79.97%), 1c (65.45%), 1g (49.23%) and 2a (70.14%).
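The abstract reports F1 scores on the development set but accuracy on the test set, so the two sets of numbers are not directly comparable. The sketch below shows how the two metrics can diverge on the same predictions. It is an illustration under assumptions, not the authors' evaluation code: the abstract does not say which F1 averaging was used, so macro averaging is assumed here, and the function names `accuracy` and `macro_f1` and the toy domain labels are hypothetical.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed
    scores = []
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN); guard against an empty denominator.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not drawn from the shared-task data.
gold = ["cse", "cse", "cse", "phy", "phy", "math"]
pred = ["cse", "cse", "phy", "phy", "phy", "cse"]

print(f"accuracy = {accuracy(gold, pred):.4f}")
print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

In this toy case four of six predictions are correct, yet macro F1 is noticeably lower than accuracy because the class that is never predicted contributes an F1 of zero to the average.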
Anthology ID:
2020.icon-techdofication.8
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task
Month:
December
Year:
2020
Address:
Patna, India
Editors:
Dipti Misra Sharma, Asif Ekbal, Karunesh Arora, Sudip Kumar Naskar, Dipankar Ganguly, Sobha L, Radhika Mamidi, Sunita Arora, Pruthwik Mishra, Vandan Mujadia
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
35–39
URL:
https://aclanthology.org/2020.icon-techdofication.8
Cite (ACL):
Omar Sharif, Eftekhar Hossain, and Mohammed Moshiul Hoque. 2020. TechTexC: Classification of Technical Texts using Convolution and Bidirectional Long Short Term Memory Network. In Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task, pages 35–39, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
TechTexC: Classification of Technical Texts using Convolution and Bidirectional Long Short Term Memory Network (Sharif et al., ICON 2020)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/2020.icon-techdofication.8.pdf