Spoken Term Discovery for Language Documentation using Translations

Antonios Anastasopoulos; Sameer Bansal; David Chiang; Sharon Goldwater; Adam Lopez

doi:10.18653/v1/W17-4607

Abstract

Vast amounts of speech data collected for language documentation and research remain untranscribed and unsearchable, but often a small amount of speech may have text translations available. We present a method for partially labeling additional speech with translations in this scenario. We modify an unsupervised speech-to-translation alignment model and obtain prototype speech segments that match the translation words, which are in turn used to discover terms in the unlabelled data. We evaluate our method on a Spanish-English speech translation corpus and on two corpora of endangered languages, Arapaho and Ainu, demonstrating its appropriateness and applicability in an actual very-low-resource scenario.

Anthology ID: W17-4607
Volume: Proceedings of the Workshop on Speech-Centric Natural Language Processing
Month: September
Year: 2017
Address: Copenhagen, Denmark
Editors: Nicholas Ruiz, Srinivas Bangalore
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 53–58
URL: https://aclanthology.org/W17-4607
DOI: 10.18653/v1/W17-4607
Cite (ACL): Antonios Anastasopoulos, Sameer Bansal, David Chiang, Sharon Goldwater, and Adam Lopez. 2017. Spoken Term Discovery for Language Documentation using Translations. In Proceedings of the Workshop on Speech-Centric Natural Language Processing, pages 53–58, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal): Spoken Term Discovery for Language Documentation using Translations (Anastasopoulos et al., 2017)
PDF: https://preview.aclanthology.org/teach-a-man-to-fish/W17-4607.pdf
