Neural-based Tamil Grammar Error Detection
Dineskumar Murugesapillai, Anankan Ravinthirarasa, Gihan Dias, Kengatharaiyer Sarveswaran
Abstract
This paper describes the ongoing development of a grammar error checker for the Tamil language using a state-of-the-art deep neural approach. The proposed checker captures a vital type of grammar error: subject-predicate agreement errors. Specifically, we target agreement errors that occur between nominal subjects and verbal predicates. We also created the first-ever grammar-error-annotated corpus for Tamil. In addition, we experimented with different multilingual pre-trained language models to capture syntactic information and found that IndicBERT gives better performance for our tasks. We implemented the grammar checker as a multi-class classifier on top of the IndicBERT pre-trained model, which we fine-tuned using our annotated data. This baseline model gives an F1 score of 73.4. We are now in the process of improving the proposed system with the use of a dependency parser.
- Anthology ID:
- 2021.pail-1.4
- Volume:
- Proceedings of the First Workshop on Parsing and its Applications for Indian Languages
- Month:
- December
- Year:
- 2021
- Address:
- NIT Silchar, India
- Venue:
- PAIL
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 27–32
- URL:
- https://aclanthology.org/2021.pail-1.4
- Cite (ACL):
- Dineskumar Murugesapillai, Anankan Ravinthirarasa, Gihan Dias, and Kengatharaiyer Sarveswaran. 2021. Neural-based Tamil Grammar Error Detection. In Proceedings of the First Workshop on Parsing and its Applications for Indian Languages, pages 27–32, NIT Silchar, India. NLP Association of India (NLPAI).
- Cite (Informal):
- Neural-based Tamil Grammar Error Detection (Murugesapillai et al., PAIL 2021)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2021.pail-1.4.pdf