Ethel Ong


2026

Phonics stories are essential for early literacy. They require controlled repetition of grapheme-phoneme (GP) patterns while maintaining simplicity, suitability, and quality. Generating such texts is challenging for large language models (LLMs), which must balance multiple phonological and pedagogical constraints. We evaluate six LLMs in a zero-shot setting across 16 prompt configurations, producing 8,688 outputs and 39,096 stories. Outputs are assessed using a multi-dimensional framework covering phonological alignment, developmental lexical appropriateness, readability, and narrative quality. Results show that while LLMs generate highly readable and age-appropriate text, their phoneme control and narrative coherence vary. Prompt design significantly affects performance and reveals trade-offs when multiple phonological, linguistic, and pedagogical constraints are targeted at once; model choice also leads to significant differences. These findings highlight the challenges of controllable educational text generation and the importance of prompt design in balancing instructional objectives. We release our prompts, generated stories, and evaluation framework to support future work on phonics-based story generation for early readers.
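The abstract does not state how phonological alignment is computed. As a minimal sketch only, assuming alignment is approximated by how often a target grapheme appears among a story's words, a hypothetical `gp_pattern_ratio` helper (not the authors' metric) could look like this. It checks spelling only, so it cannot tell whether a matching word actually carries the target phoneme.

```python
import re


def gp_pattern_ratio(story: str, grapheme: str) -> float:
    """Hypothetical proxy for GP-pattern density: the fraction of word tokens
    whose spelling contains the target grapheme (case-insensitive).

    Spelling-only check: 'bread' matches 'ea' even though it does not use the
    long-e phoneme that 'seat' does, so this over-counts irregular words.
    """
    # Tokenize into lowercase alphabetic words; punctuation is dropped.
    words = re.findall(r"[a-z]+", story.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if grapheme.lower() in word)
    return hits / len(words)


if __name__ == "__main__":
    story = "Pete sees a green tree. The bee is near."
    # 4 of the 9 words (sees, green, tree, bee) contain "ee".
    print(gp_pattern_ratio(story, "ee"))
```

A real evaluation would likely map words to phonemes (for example with a pronunciation dictionary) rather than rely on substring matching; the substring version is kept here only because it is self-contained.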
