Abstract
In this paper we present two machine learning algorithms, Stochastic Gradient Descent and Multi-Layer Perceptron, to identify the technical domain of a given text, since such text carries information about its specific domain. We performed our experiments on coarse-grained technical domains such as Computer Science, Physics, and Law for English, Bengali, Gujarati, Hindi, Malayalam, Marathi, Tamil, and Telugu, and on fine-grained Computer Science subdomains such as Operating Systems, Computer Networks, and Databases for English only. Using TF-IDF as the feature extraction method, we show how both machine learning models perform on these languages.
- Anthology ID:
- 2020.icon-techdofication.6
- Volume:
- Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task
- Month:
- December
- Year:
- 2020
- Address:
- Patna, India
- Editors:
- Dipti Misra Sharma, Asif Ekbal, Karunesh Arora, Sudip Kumar Naskar, Dipankar Ganguly, Sobha L, Radhika Mamidi, Sunita Arora, Pruthwik Mishra, Vandan Mujadia
- Venue:
- ICON
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 27–30
- URL:
- https://aclanthology.org/2020.icon-techdofication.6
- Cite (ACL):
- Hema Ala and Dipti Sharma. 2020. Automatic Technical Domain Identification. In Proceedings of the 17th International Conference on Natural Language Processing (ICON): TechDOfication 2020 Shared Task, pages 27–30, Patna, India. NLP Association of India (NLPAI).
- Cite (Informal):
- Automatic Technical Domain Identification (Ala & Sharma, ICON 2020)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2020.icon-techdofication.6.pdf
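
The pipeline named in the abstract (TF-IDF features fed to a classifier trained with stochastic gradient descent) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which toolkit, loss, or hyperparameters were used, so the sketch assumes a linear softmax classifier, a smoothed IDF formula, and whitespace tokenization, and it omits the Multi-Layer Perceptron entirely. The toy corpus, the labels `cse`, `law`, and `phy`, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus; the shared-task data is not reproduced here.
TRAIN = [
    ("the kernel schedules processes and manages memory", "cse"),
    ("a database stores tables and answers queries", "cse"),
    ("the court ruled on the contract dispute", "law"),
    ("the judge heard the appeal under the statute", "law"),
    ("the particle has momentum and energy", "phy"),
    ("quantum field energy of the electron", "phy"),
]
LABELS = sorted({label for _, label in TRAIN})


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for the sketch.
    return text.lower().split()


def fit_tfidf(docs):
    """Learn a vocabulary index and smoothed IDF weights from tokenized docs."""
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    n = len(docs)
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1.0 for term in vocab}
    return vocab, idf


def to_tfidf(doc, vocab, idf):
    """Map one tokenized doc to an L2-normalized sparse TF-IDF vector."""
    counts = Counter(t for t in doc if t in vocab)
    vec = {vocab[t]: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {i: v / norm for i, v in vec.items()}


def scores_for(x, W, b):
    """Linear class scores for a sparse vector x."""
    return [b[k] + sum(W[k][i] * v for i, v in x.items()) for k in range(len(W))]


def train_sgd(X, y, n_classes, n_features, epochs=50, lr=0.5, seed=0):
    """Softmax regression fitted with one-example-at-a-time SGD updates."""
    rng = random.Random(seed)
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, target = X[idx], y[idx]
            s = scores_for(x, W, b)
            top = max(s)
            exps = [math.exp(v - top) for v in s]  # numerically stable softmax
            total = sum(exps)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. the score of class k.
                grad = exps[k] / total - (1.0 if k == target else 0.0)
                b[k] -= lr * grad
                for i, v in x.items():
                    W[k][i] -= lr * grad * v
    return W, b


docs = [tokenize(text) for text, _ in TRAIN]
vocab, idf = fit_tfidf(docs)
X = [to_tfidf(d, vocab, idf) for d in docs]
y = [LABELS.index(label) for _, label in TRAIN]
W, b = train_sgd(X, y, len(LABELS), len(vocab))


def classify(text):
    """Return the predicted domain label for a raw text string."""
    x = to_tfidf(tokenize(text), vocab, idf)
    s = scores_for(x, W, b)
    return LABELS[max(range(len(s)), key=s.__getitem__)]
```

Calling `classify` on a new sentence returns one of the labels in `LABELS`. Words absent from the training vocabulary are ignored, so a text made up only of unseen words is decided by the bias terms alone.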