Mariam Barakat


2026

We present a modular pipeline for educational analogy generation, grounded in Structure Mapping Theory and decomposed into four stages: source finding, sub-concept generation, explanation generation, and evaluation. Evaluating 12 LLMs across six model families on SCAR and ParallelPARC, we find that sub-concept grounding substantially improves retrieval precision and explanation quality but offers limited benefit in open-ended generation. We further validate an LLM-as-a-judge methodology against human annotations, finding that Claude Sonnet 4.6 agrees with human annotators more reliably on relative rankings than on absolute scores. Our results highlight cross-stage interactions that studies of individual stages in isolation cannot capture, and position sub-concept grounding as a key driver of analogy quality.