Abstract
African American Vernacular English (AAVE) is a widely spoken dialect of English, yet it is under-represented in major speech corpora. As a result, speakers of this dialect are often misunderstood by NLP applications. This study explores how using AAVE data in training affects the transcription accuracy of an automatic speech recognition system. Models trained on AAVE data and on Standard American English data were compared to a baseline model trained on a combination of the two dialects. The accuracy of both dialect-specific models was significantly higher than that of the baseline model, with the AAVE model showing over 18% improvement. By isolating the effect of having AAVE speakers in the training data, this study highlights the importance of increasing diversity in the field of natural language processing.
- Anthology ID:
- R19-2003
- Volume:
- Proceedings of the Student Research Workshop Associated with RANLP 2019
- Month:
- September
- Year:
- 2019
- Address:
- Varna, Bulgaria
- Editors:
- Venelin Kovatchev, Irina Temnikova, Branislava Šandrih, Ivelina Nikolova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 16–20
- URL:
- https://aclanthology.org/R19-2003
- DOI:
- 10.26615/issn.2603-2821.2019_003
- Cite (ACL):
- Rachel Dorn. 2019. Dialect-Specific Models for Automatic Speech Recognition of African American Vernacular English. In Proceedings of the Student Research Workshop Associated with RANLP 2019, pages 16–20, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal):
- Dialect-Specific Models for Automatic Speech Recognition of African American Vernacular English (Dorn, RANLP 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/R19-2003.pdf