Abstract
The BLP-2023 Task 1 aims to develop a Natural Language Inference system tailored for detecting and analyzing threats from Bangla YouTube comments. Bangla language models like BanglaBERT have demonstrated remarkable performance in various Bangla natural language processing tasks across different domains. We utilized BanglaBERT for the violence detection task, employing three different classification heads. As BanglaBERT’s vocabulary lacks certain crucial words, our model incorporates some of them as new special tokens, based on their frequency in the dataset, and their embeddings are learned during training. The model achieved the 2nd position on the leaderboard, boasting an impressive macro-F1 Score of 76.04% on the official test set. With the addition of new tokens, we achieved a 76.90% macro-F1 score, surpassing the top score (76.044%) on the test set.- Anthology ID:
- 2023.banglalp-1.24
- Volume:
- Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Farig Sadeque, Ruhul Amin
- Venue:
- BanglaLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 201–207
- Language:
- URL:
- https://aclanthology.org/2023.banglalp-1.24
- DOI:
- 10.18653/v1/2023.banglalp-1.24
- Cite (ACL):
- Md Fahim. 2023. Aambela at BLP-2023 Task 1: Focus on UNK tokens: Analyzing Violence Inciting Bangla Text with Adding Dataset Specific New Word Tokens. In Proceedings of the First Workshop on Bangla Language Processing (BLP-2023), pages 201–207, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Aambela at BLP-2023 Task 1: Focus on UNK tokens: Analyzing Violence Inciting Bangla Text with Adding Dataset Specific New Word Tokens (Fahim, BanglaLP 2023)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2023.banglalp-1.24.pdf