Robert Munro


2020

We report that state-of-the-art parsers consistently failed to identify “hers” and “theirs” as pronouns, but did identify the masculine equivalent “his”. We find that the same biases exist in recent language models like BERT. While some of the bias comes from known sources, like training data with gender imbalances, we find that the bias is _amplified_ in the language models, and that linguistic differences between English pronouns, which are not inherently biased, can become biases in some machine learning models. We introduce a new technique for measuring bias in models, using Bayesian approximations to generate partially-synthetic data from the model itself.

2012

2011

2010

In the wake of the January 12 earthquake in Haiti, it quickly became clear that the existing emergency response services had failed, but text messages were still getting through. A number of people came together to establish a text-message-based emergency reporting system. There was one hurdle: the majority of the messages were in Haitian Kreyol, which for the most part was not understood by the primary emergency responders, the US military. We therefore crowdsourced the translation of messages, allowing volunteers from within the Haitian Kreyol and French-speaking communities to translate, categorize and geolocate the messages in real time. Collaborating online, they employed their local knowledge of locations, regional slang, abbreviations and spelling variants to process more than 40,000 messages in the first six weeks alone. According to the responders, this saved hundreds of lives and helped direct the first food and aid to tens of thousands. The average turnaround from a message arriving in Kreyol to it being translated, categorized, geolocated and streamed back to the responders was 10 minutes. Collaboration among translators was crucial for data quality, motivation and community contacts, allowing the volunteers to add richer value to the translations than any one person could have.

2003

2002