Abstract
Code-switching refers to the alternation of two or more languages in a conversation or utterance and is common in multilingual communities across the world. Building code-switched speech and natural language processing systems is challenging due to the lack of annotated speech and text data. We present a speech annotation interface, CoSSAT, which helps annotators transcribe code-switched speech faster, more easily and more accurately than a traditional interface by displaying candidate words from monolingual speech recognizers. We conduct a user study on the transcription of Hindi-English code-switched speech with 10 annotators and describe quantitative and qualitative results.
- Anthology ID:
- D19-5907
- Volume:
- Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP
- Month:
- November
- Year:
- 2019
- Address:
- Hong Kong
- Editors:
- Silviu Paun, Dirk Hovy
- Venue:
- WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 48–52
- URL:
- https://aclanthology.org/D19-5907
- DOI:
- 10.18653/v1/D19-5907
- Cite (ACL):
- Sanket Shah, Pratik Joshi, Sebastin Santy, and Sunayana Sitaram. 2019. CoSSAT: Code-Switched Speech Annotation Tool. In Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP, pages 48–52, Hong Kong. Association for Computational Linguistics.
- Cite (Informal):
- CoSSAT: Code-Switched Speech Annotation Tool (Shah et al., 2019)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/D19-5907.pdf