Nathaniel Romney Robinson
2024
PWESuite: Phonetic Word Embeddings and Tasks They Facilitate
Vilém Zouhar | Kalvin Chang | Chenxuan Cui | Nate B. Carlson | Nathaniel Romney Robinson | Mrinmaya Sachan | David R. Mortensen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Mapping words into a fixed-dimensional vector space is the backbone of modern NLP. While most word embedding methods successfully encode semantic information, they overlook phonetic information that is crucial for many tasks. We develop three methods that use articulatory features to build phonetically informed word embeddings. To address the inconsistent evaluation of existing phonetic word embedding methods, we also contribute a task suite to fairly evaluate past, current, and future methods. We evaluate both (1) intrinsic aspects of phonetic word embeddings, such as word retrieval and correlation with sound similarity, and (2) extrinsic performance on tasks such as rhyme and cognate detection and sound analogies. We hope our task suite will promote reproducibility and inspire future phonetic embedding research.
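A minimal, self-contained sketch of the core idea behind articulatory-feature-based phonetic word embeddings: look up per-segment articulatory features and pool them into a word vector, then compare words by cosine similarity. The tiny feature table, segment inventory, and mean-pooling choice below are illustrative assumptions, not the feature set or models used in PWESuite.

```python
# Illustrative sketch: build a phonetic word embedding by looking up
# articulatory features for each IPA segment and mean-pooling them.
# The feature table below is a tiny hand-made assumption, not the
# feature set used in PWESuite.
import numpy as np

# Toy articulatory features: (voiced, nasal, plosive, fricative, high, back)
FEATURES = {
    "p": (0, 0, 1, 0, 0, 0),
    "b": (1, 0, 1, 0, 0, 0),
    "m": (1, 1, 0, 0, 0, 0),
    "s": (0, 0, 0, 1, 0, 0),
    "z": (1, 0, 0, 1, 0, 0),
    "i": (1, 0, 0, 0, 1, 0),
    "u": (1, 0, 0, 0, 1, 1),
    "a": (1, 0, 0, 0, 0, 1),
}

def phonetic_embedding(ipa_word: str) -> np.ndarray:
    """Mean-pool per-segment articulatory feature vectors into one vector."""
    segments = [FEATURES[ch] for ch in ipa_word if ch in FEATURES]
    if not segments:
        return np.zeros(len(next(iter(FEATURES.values()))))
    return np.mean(np.array(segments, dtype=float), axis=0)

def sound_similarity(word_a: str, word_b: str) -> float:
    """Cosine similarity between two phonetic embeddings."""
    a, b = phonetic_embedding(word_a), phonetic_embedding(word_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Phonetically close pairs should score higher than distant ones,
# which is the property the intrinsic sound-similarity tasks probe.
print(sound_similarity("pas", "bas"))   # similar-sounding pair
print(sound_similarity("pas", "mimi"))  # less similar pair
```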
JHU IWSLT 2024 Dialectal and Low-resource System Description
Nathaniel Romney Robinson | Kaiser Sun | Cihan Xiao | Niyati Bafna | Weiting Tan | Haoran Xu | Henry Li Xinyuan | Ankur Kejriwal | Sanjeev Khudanpur | Kenton Murray | Paul McNamee
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
Johns Hopkins University (JHU) submitted systems for all eight language pairs in the 2024 Low-Resource Language Track. The main effort of this work revolves around fine-tuning large, publicly available models in three proposed systems: i) end-to-end speech translation (ST) fine-tuning of SeamlessM4T v2; ii) ST fine-tuning of Whisper; iii) a cascaded system combining automatic speech recognition with fine-tuned Whisper and machine translation with NLLB. On top of these systems, we conduct a comparative analysis of different training paradigms, such as intra-distillation for NLLB as well as joint training and curriculum learning for SeamlessM4T v2. Our results show that the best-performing approach differs by language pair, but that i) fine-tuned SeamlessM4T v2 tends to perform best for source languages on which it was pre-trained, ii) multi-task training helps Whisper fine-tuning, iii) cascaded systems with Whisper and NLLB tend to outperform Whisper alone, and iv) intra-distillation helps NLLB fine-tuning.
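A minimal sketch of the cascaded setup described in system iii), using off-the-shelf Hugging Face pipelines (Whisper for ASR, then NLLB for MT). The checkpoints, language codes, and audio path are illustrative assumptions and do not reflect the fine-tuned systems actually submitted.

```python
# Illustrative cascaded speech translation: Whisper ASR -> NLLB MT.
# Model checkpoints, language codes, and the audio path are assumptions
# for demonstration; the JHU submissions used fine-tuned models.
from transformers import pipeline

# Stage 1: automatic speech recognition with an off-the-shelf Whisper model.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Stage 2: machine translation with an off-the-shelf NLLB model.
mt = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",   # hypothetical source language code
    tgt_lang="fra_Latn",   # hypothetical target language code
)

def cascade_translate(audio_path: str) -> str:
    """Transcribe an audio file, then translate the transcript."""
    transcript = asr(audio_path)["text"]
    return mt(transcript)[0]["translation_text"]

# Example usage with a hypothetical audio file:
# print(cascade_translate("example.wav"))
```

An end-to-end alternative replaces the two stages with a single speech-to-text-translation model such as SeamlessM4T v2, which is what systems i) and ii) fine-tune directly.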