UnityAI Guard: Pioneering Toxicity Detection Across Low-Resource Indian Languages
Himanshu Beniwal, Reddybathuni Venkat, Rohit Kumar, Birudugadda Srivibhav, Daksh Jain, Pavan Deekshith Doddi, Eshwar Dhande, Adithya Ananth, Kuldeep, Mayank Singh
Abstract
This work introduces UnityAI-Guard, a framework for binary toxicity classification targeting low-resource Indian languages. While existing systems predominantly cater to high-resource languages, UnityAI-Guard addresses this critical gap by developing state-of-the-art models for identifying toxic content across diverse Brahmic/Indic scripts. Our approach achieves an average F1-score of 84.23% across seven languages, leveraging a dataset of 567k training instances and 30k manually verified test instances. UnityAI-Guard advances multilingual content moderation for linguistically diverse regions and provides public API access to foster broader adoption and application.
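The abstract notes that UnityAI-Guard exposes a public API for its binary toxicity classifier. The snippet below is a minimal sketch of how such an endpoint might be called from Python; the URL, request fields (`text`, `language`), and response schema (`label`, `score`) are illustrative assumptions, not the documented UnityAI-Guard interface.

```python
import requests

# Hypothetical endpoint: the real UnityAI-Guard API URL and schema are
# not given on this page, so everything below is an illustrative guess.
API_URL = "https://example.org/unityai-guard/v1/classify"  # placeholder


def classify_toxicity(text: str, language: str) -> dict:
    """Send one text to a (hypothetical) binary toxicity endpoint.

    Assumed request fields: `text` plus an ISO 639-1 code for one of
    the seven supported Indian languages (e.g., "hi" for Hindi).
    """
    response = requests.post(
        API_URL,
        json={"text": text, "language": language},
        timeout=10,
    )
    response.raise_for_status()
    # Assumed response shape: {"label": "toxic" | "non-toxic", "score": float}
    return response.json()


if __name__ == "__main__":
    result = classify_toxicity("उदाहरण वाक्य", language="hi")
    print(result["label"], result["score"])
```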
- Anthology ID: 2025.emnlp-demos.33
- Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Ivan Habernal, Peter Schulam, Jörg Tiedemann
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 471–479
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-demos.33/
- Cite (ACL): Himanshu Beniwal, Reddybathuni Venkat, Rohit Kumar, Birudugadda Srivibhav, Daksh Jain, Pavan Deekshith Doddi, Eshwar Dhande, Adithya Ananth, Kuldeep, and Mayank Singh. 2025. UnityAI Guard: Pioneering Toxicity Detection Across Low-Resource Indian Languages. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 471–479, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): UnityAI Guard: Pioneering Toxicity Detection Across Low-Resource Indian Languages (Beniwal et al., EMNLP 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-demos.33.pdf