Shaomei Wu



2025

Lost in Translation: Benchmarking Commercial Machine Translation Models for Dyslexic-Style Text
Gregory Price | Shaomei Wu
Findings of the Association for Computational Linguistics: ACL 2025

Dyslexia can affect writing, leading to unique patterns such as letter and homophone swapping. As a result, text produced by people with dyslexia often differs from the text typically used to train natural language processing (NLP) models, raising concerns about their effectiveness for dyslexic users. This paper examines the fairness of four commercial machine translation (MT) systems towards dyslexic text through a systematic audit using both synthetically generated dyslexic text and real writing from individuals with dyslexia. By programmatically introducing various dyslexic-style errors into the WMT dataset, we present insights on how dyslexic biases manifest in MT systems as the text becomes more dyslexic, especially with real-word errors. Our results shed light on the NLP biases affecting people with dyslexia – a population that often relies on NLP tools as assistive technologies, highlighting the need for more diverse data and user representation in the development of foundational NLP models.
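The abstract mentions programmatically introducing dyslexic-style errors (letter swaps and homophone substitutions) into the WMT data. The sketch below is purely illustrative of what such an injection step could look like; the homophone table, error rate, and function names are assumptions and do not reflect the paper's actual error models or implementation.

```python
import random

# Illustrative sketch only: inject dyslexic-style noise into a sentence.
# The homophone table and perturbation rate below are assumptions, not
# the error models used in the paper.
HOMOPHONES = {
    "their": "there", "there": "their",
    "to": "too", "too": "to",
    "no": "know", "know": "no",
}

def swap_adjacent_letters(word: str, rng: random.Random) -> str:
    """Transpose two adjacent interior letters (e.g. 'form' -> 'from')."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def add_dyslexic_style_noise(sentence: str, error_rate: float = 0.2,
                             seed: int = 0) -> str:
    """Perturb roughly `error_rate` of the words with either a homophone
    substitution (a real-word error) or an adjacent-letter swap
    (typically a non-word error)."""
    rng = random.Random(seed)
    noised = []
    for word in sentence.split():
        if rng.random() < error_rate:
            if word.lower() in HOMOPHONES:
                word = HOMOPHONES[word.lower()]
            else:
                word = swap_adjacent_letters(word, rng)
        noised.append(word)
    return " ".join(noised)

if __name__ == "__main__":
    print(add_dyslexic_style_noise(
        "I know they went there to see their friends."))
```

A study along these lines would run both the original and the noised sentences through each MT system and compare translation quality as the proportion of errors, especially real-word errors, increases.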