Abstract
Aspect extraction is not a well-explored topic in Hindi, with only one corpus having been developed for the task. In this paper, we discuss the merits of the existing corpus in terms of quality, size, sparsity, and performance in aspect extraction tasks using established models. To provide a better baseline corpus for aspect extraction, we translate the SemEval 2014 aspect-based sentiment analysis dataset and annotate the aspects in that data. We provide rigorous guidelines and a replicable methodology for this task. We quantitatively evaluate the translations and annotations using inter-annotator agreement scores. We also evaluate our dataset using state-of-the-art neural aspect extraction models in both monolingual and multilingual settings and show that the models perform far better on our corpus than on the existing Hindi dataset. With this, we establish our corpus as the gold-standard aspect extraction dataset in Hindi.
- Anthology ID:
- 2021.ecnlp-1.17
- Volume:
- Proceedings of the 4th Workshop on e-Commerce and NLP
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Venue:
- ECNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 140–149
- URL:
- https://aclanthology.org/2021.ecnlp-1.17
- DOI:
- 10.18653/v1/2021.ecnlp-1.17
- Cite (ACL):
- Arghya Bhattacharya, Alok Debnath, and Manish Shrivastava. 2021. Enhancing Aspect Extraction for Hindi. In Proceedings of the 4th Workshop on e-Commerce and NLP, pages 140–149, Online. Association for Computational Linguistics.
- Cite (Informal):
- Enhancing Aspect Extraction for Hindi (Bhattacharya et al., ECNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2021.ecnlp-1.17.pdf