Hana El-Shazli


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
Towards Language-Agnostic STIPA: Universal Phonetic Transcription to Support Language Documentation at Scale
Jacob Lee Suchardt | Hana El-Shazli | Pierluigi Cassotti
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

This paper explores the use of existing state-of-the-art speech recognition models (ASR) for the task of generating narrow phonetic transcriptions using the International Phonetic Alphabet (STIPA). Unlike conventional ASR systems focused on orthographic output for high-resource languages, STIPA can be used as a language-agnostic interface valuable for documenting under-resourced and unwritten languages. We introduce a new dataset for South Levantine Arabic and present the first large-scale evaluation of STIPA models across 51 language families. Additionally, we provide a use case on Sanna, a severely endangered language. Our findings show that fine-tuned ASR models can produce accurate IPA transcriptions with limited supervision, significantly reducing phonetic error rates even in extremely low-resource settings. The results highlight the potential of STIPA for scalable language documentation.