Learning Stress in Arabic Low-Resource Settings

Abed Qaddoumi, Jordan Kodner, Owen Rambow, Salam Khalifa, Jeffrey Heinz


Abstract
We predict lexical stress in Arabic varieties using syllable structure (a sequence of CVs, with C for consonants and V for vowels). Our task is generation: given an unstressed input, the system outputs a stress-marked word. We compare four approaches: a grammar induction algorithm (BUFIA), a transformer-based neural network (NN), a rule-based method, and a frequency baseline. The models are evaluated across several low-resource settings by varying the training data size by words, structural type, and syllable count. BUFIA outperforms the neural network, especially when data are scarce. This supports grammar induction as an interpretable and sample-efficient alternative for learning stress.
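To make the task format concrete, here is a minimal sketch of what a frequency baseline over CV syllable patterns could look like. It is not the paper's implementation: the function names, the fallback strategy for unseen patterns, the apostrophe used as a stress mark, and the toy training pairs are all assumptions made for illustration, and the toy data are not real Arabic data.

```python
from collections import Counter, defaultdict


def train_frequency_baseline(examples):
    """Count stress positions per CV pattern.

    `examples` is an iterable of (syllables, stress_index) pairs, where
    `syllables` is a tuple of CV strings such as ("CV", "CVC") and
    `stress_index` is the 0-based index of the stressed syllable.
    (Hypothetical data layout, assumed for this sketch.)
    """
    by_pattern = defaultdict(Counter)   # exact pattern -> stress index counts
    offset_from_end = Counter()         # distance of stress from last syllable
    for syllables, stress_index in examples:
        syllables = tuple(syllables)
        by_pattern[syllables][stress_index] += 1
        offset_from_end[len(syllables) - 1 - stress_index] += 1
    return by_pattern, offset_from_end


def predict_stress(model, syllables):
    """Return the predicted 0-based index of the stressed syllable."""
    by_pattern, offset_from_end = model
    syllables = tuple(syllables)
    if syllables in by_pattern:
        # Seen pattern: use its most frequent stress position.
        return by_pattern[syllables].most_common(1)[0][0]
    # Unseen pattern (assumed fallback): use the most frequent distance
    # from the end of the word, clipped so the index stays valid.
    offset = offset_from_end.most_common(1)[0][0] if offset_from_end else 0
    return max(0, len(syllables) - 1 - offset)


def mark_stress(syllables, stress_index):
    """Generate the stress-marked output: syllables joined by '.',
    with an apostrophe before the stressed syllable."""
    return ".".join(
        ("'" + s) if i == stress_index else s
        for i, s in enumerate(syllables)
    )


if __name__ == "__main__":
    # Invented toy training pairs, for illustration only.
    toy_train = [
        (("CV", "CVC"), 0),
        (("CV", "CVC"), 0),
        (("CV", "CVC"), 1),
        (("CVV", "CV"), 0),
    ]
    model = train_frequency_baseline(toy_train)
    word = ("CV", "CV", "CVC")  # pattern not seen in training
    print(mark_stress(word, predict_stress(model, word)))
```

A baseline of this kind memorizes seen patterns and has only a crude fallback for unseen ones, which is one reason to compare it against approaches that generalize from limited data.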
Anthology ID: 2026.scil-main.24
Volume: Proceedings of the Society for Computation in Linguistics 2026
Month: July
Year: 2026
Address: San Diego, CA
Editors: Rob Voigt, Alex Warstadt, Naomi Feldman, Tal Linzen
Venues: SCiL | WS
Publisher: Association for Computational Linguistics
Pages: 262–279
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.24/
Cite (ACL): Abed Qaddoumi, Jordan Kodner, Owen Rambow, Salam Khalifa, and Jeffrey Heinz. 2026. Learning Stress in Arabic Low-Resource Settings. In Proceedings of the Society for Computation in Linguistics 2026, pages 262–279, San Diego, CA. Association for Computational Linguistics.
Cite (Informal): Learning Stress in Arabic Low-Resource Settings (Qaddoumi et al., SCiL 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.scil-main.24.pdf