Soroosh Akef


2026

Grammatical error correction approaches rarely characterize the pedagogical value of the errors they correct. We propose a framework that combines LLM-based correction of second-language writing with a rule-based characterization module to identify pedagogically relevant, fine-grained grammatical properties in learner texts. The characterization module targets 252 grammatical properties of European Portuguese, each categorized by the CEFR level at which it is taught according to an authoritative curriculum. Property accuracy is inferred from contrasts between the learner text and its corrected version. We evaluate the framework extrinsically by training interpretable automatic proficiency assessment models on accuracy features extracted from characterized errors in a Portuguese learner corpus. Across different prompting strategies, models trained on features derived from LLM-corrected texts perform similarly to those trained on features derived from annotator-corrected texts, and comparably to models trained on linguistic complexity features. Feature importance overlap is likewise high, and the annotator-based and LLM-based models show similar predictive patterns, further supporting the validity of the proposed framework.
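As a rough illustration of inferring property accuracy from the contrast between a learner text and its correction, the sketch below compares the property instances detected in each version. It is not the paper's implementation: the function name `property_accuracy`, the property labels, and the scoring rule (an instance in the corrected text counts as correct if the learner text also contains it) are all assumptions made for this example.

```python
from collections import Counter


def property_accuracy(learner_props, corrected_props):
    """Estimate per-property accuracy from two lists of property labels.

    Assumption (not from the paper): each label detected in the corrected
    text is a target use, and it counts as correctly produced if the
    learner text contains a matching instance of the same label.
    """
    learner = Counter(learner_props)
    corrected = Counter(corrected_props)
    accuracy = {}
    for prop, target_count in corrected.items():
        # Multiset overlap: a learner instance can match at most one target.
        correct_count = min(learner[prop], target_count)
        accuracy[prop] = correct_count / target_count
    return accuracy


# Hypothetical labels, as a rule-based characterizer might emit them.
learner = ["present_indicative", "definite_article"]
corrected = [
    "present_indicative",
    "definite_article",
    "definite_article",
    "preterite",
]
scores = property_accuracy(learner, corrected)
print(scores)
```

Per-property scores of this kind could then serve as the accuracy features on which an interpretable proficiency model is trained, with each feature tied to the CEFR level of its property.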

2025

2024