Benjamin Strauss


2017

Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords
Michael Bloodgood | Benjamin Strauss
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

With the advent of informal electronic communications such as social media, colloquial languages that were historically unwritten are being written for the first time in heavily code-switched environments. We present a method for inducing portions of translation lexicons through the use of expert knowledge in these settings where there are approximately zero resources available other than a language informant, potentially not even large amounts of monolingual data. We investigate inducing a Moroccan Darija-English translation lexicon via French loanwords bridging into English and find that a useful lexicon is induced for human-assisted translation and statistical machine translation.

Using Global Constraints and Reranking to Improve Cognates Detection
Michael Bloodgood | Benjamin Strauss
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Global constraints and reranking have not been used in cognates detection research to date. We propose methods for using global constraints by rescoring the score matrices produced by state-of-the-art cognates detection systems. Rescoring with global constraints is complementary to state-of-the-art cognates detection methods. It yields significant performance improvements beyond the current state of the art on publicly available datasets. These gains hold across different language pairs and various conditions, including different levels of baseline performance and different data sizes, among them larger, more realistic data sizes than have been evaluated in the past.

2016

Results of the WNUT16 Named Entity Recognition Shared Task
Benjamin Strauss | Bethany Toma | Alan Ritter | Marie-Catherine de Marneffe | Wei Xu
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

This paper presents the results of the Twitter Named Entity Recognition shared task associated with W-NUT 2016, a named entity tagging task with 10 participating teams. We outline the shared task, the annotation process, and dataset statistics, and we provide a high-level overview of the participating systems.

2014

Translation memory retrieval methods
Michael Bloodgood | Benjamin Strauss
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics