Corey Miller


Corpus Creation and Evaluation for Speech-to-Text and Speech Translation
Corey Miller | Evelyne Tzoukermann | Jennifer Doyon | Elizabeth Mallard
Proceedings of Machine Translation Summit XVIII: Users and Providers Track

The National Virtual Translation Center (NVTC) seeks to acquire human language technology (HLT) tools that will facilitate its mission to provide verbatim English translations of foreign language audio and video files. In the text domain, NVTC has been using translation memory (TM) for some time and has reported on the incorporation of machine translation (MT) into that workflow (Miller et al., 2020). While we have explored the use of speech-to-text (STT) and speech translation (ST) in the past (Tzoukermann and Miller, 2018), we have now invested in the creation of a substantial human-made corpus to thoroughly evaluate alternatives. Results from our analysis of this corpus and the performance of HLT tools point the way to the most promising ones to deploy in our workflow.


Plugging into Trados: Augmenting Translation in the Enclave
Corey Miller | Chiara Higgins | Paige Havens | Steven Van Guilder | Rodney Morris | Danielle Silverman
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)


Embedding Register-Aware MT into the CAT Workflow
Corey Miller | Danielle Silverman | Vanesa Jurica | Elizabeth Richerson | Rodney Morris | Elisabeth Mallard
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

Evaluating Automatic Speech Recognition in Translation
Evelyne Tzoukermann | Corey Miller
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)


Employing Phonetic Speech Recognition for Language and Dialect Specific Search
Corey Miller | Rachel Strong | Evan Jones | Mark Vinson
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects


Error Correction for Arabic Dictionary Lookup
C. Anton Rytting | Paul Rodrigues | Tim Buckwalter | David Zajic | Bridget Hirsch | Jeff Carnes | Nathanael Lynn | Sarah Wayland | Chris Taylor | Jason White | Charles Blake III | Evelyn Browne | Corey Miller | Tristan Purvis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We describe a new Arabic spelling correction system which is intended for use with electronic dictionary search by learners of Arabic. Unlike other spelling correction systems, this system does not depend on a corpus of attested student errors but on student- and teacher-generated ratings of confusable pairs of phonemes or letters. Separate error modules for keyboard mistypings, phonetic confusions, and dialectal confusions are combined to create a weighted finite-state transducer that calculates the likelihood that an input string could correspond to each citation form in a dictionary of Iraqi Arabic. Results are ranked by the estimated likelihood that a citation form could be misheard, mistyped, or mistranscribed for the input given by the user. To evaluate the system, we developed a noisy-channel model trained on students' speech errors and use it to perturb citation forms from a dictionary. We compare our system to a baseline based on Levenshtein distance and find that, when evaluated on single-error queries, our system performs 28% better than the baseline (overall MRR) and is twice as good at returning the correct dictionary form as the top-ranked result. We believe this to be the first spelling correction system designed for a spoken, colloquial dialect of Arabic.
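
The evaluation described in this abstract compares rankings by mean reciprocal rank (MRR) against a Levenshtein-distance baseline. The following is a minimal sketch of how such a baseline and metric could be computed, not the authors' implementation: the function names, the alphabetical tie-breaking rule, and the romanized toy lexicon are all assumptions made for illustration, and the weighted finite-state transducer itself is not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Unweighted edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_candidates(query: str, lexicon: list[str]) -> list[str]:
    """Rank citation forms by edit distance to the query.

    Ties are broken alphabetically (an assumption; the abstract does not
    say how the baseline resolves ties).
    """
    return sorted(lexicon, key=lambda form: (levenshtein(query, form), form))


def mean_reciprocal_rank(queries: list[tuple[str, str]],
                         lexicon: list[str]) -> float:
    """Average of 1/rank of the intended form over (query, target) pairs.

    A target missing from the lexicon contributes 0.
    """
    total = 0.0
    for query, target in queries:
        ranked = rank_candidates(query, lexicon)
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(queries)


# Hypothetical romanized toy lexicon and perturbed queries, for illustration only.
toy_lexicon = ["kitab", "katib", "maktab"]
toy_queries = [("kitab", "kitab"), ("maktib", "katib")]
baseline_mrr = mean_reciprocal_rank(toy_queries, toy_lexicon)
```

A system like the one described would replace `levenshtein` with a score derived from weighted confusion costs (keyboard, phonetic, dialectal), while the MRR computation would stay the same, which is what makes the two rankings directly comparable.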