A Bigger Catch: Fine-Grained Curriculum Standards Alignment on the MathFish Benchmark

Xinman Liu, Mayank Sharma, Xinyu Shi


Abstract
Most existing math benchmarks for LLMs focus on evaluating whether models produce correct solutions. In educational settings, however, it is equally important to understand whether LLMs grasp the pedagogical intent behind math problems, beyond simply arriving at the right answer. Tagging curriculum standards is challenging for the same reason: distinguishing fine-grained standards requires understanding subtle pedagogical distinctions. In this paper, we use the MathFish benchmark, which frames curriculum alignment as a multi-label prediction task over 385 Common Core State Standards, to evaluate a three-stage pipeline inspired by observed failure modes in retrieval and structural reasoning: curriculum-informed hard negatives (M1), a cross-encoder reranker (M2), and a ReAct agent paired with an LLM-as-a-judge critic (M3). We additionally evaluate a training-free alternative (A1) that combines hybrid sparse-dense retrieval with curriculum-graph reranking. M3 achieves 31.3% exact-match accuracy, approximately 6.5× higher than the three-shot GPT-4-Turbo baseline. Error analysis shows that, despite these improvements, the pipeline still struggles with missing predictions, grade-level misalignment, and sibling-standard confusion, reinforcing that precise curriculum alignment remains a fundamentally difficult problem in educational NLP.
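To make the headline metric concrete, here is a minimal sketch of exact-match accuracy for multi-label standards tagging. It assumes a prediction counts as correct only when the predicted set of standards equals the gold set exactly; the paper's precise scoring may differ. The function name and the toy labels are illustrative and are not taken from MathFish.

```python
from typing import Sequence, Set


def exact_match_accuracy(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Fraction of problems whose predicted label set equals the gold label set.

    Assumption: "exact match" means full set equality, so a missing label,
    an extra label, or a sibling-standard swap all count as errors.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of problems")
    if not gold:
        return 0.0
    hits = sum(1 for g, p in zip(gold, pred) if set(g) == set(p))
    return hits / len(gold)


# Toy data (hypothetical): four problems tagged with Common Core style codes.
gold = [
    {"4.NF.A.1"},
    {"4.NF.A.1", "4.NF.A.2"},
    {"3.OA.A.1"},
    {"5.NBT.A.1"},
]
pred = [
    {"4.NF.A.1"},    # exact match
    {"4.NF.A.1"},    # missing prediction: one gold label not returned
    {"3.OA.A.2"},    # sibling-standard confusion
    {"4.NBT.A.1"},   # grade-level misalignment
]

accuracy = exact_match_accuracy(gold, pred)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Under this definition, each of the three error types named in the abstract turns an otherwise close prediction into a miss, which is one reason exact-match scores on fine-grained label sets tend to be low.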
Anthology ID:
2026.bea-1.15
Volume:
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
208–220
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.15/
Cite (ACL):
Xinman Liu, Mayank Sharma, and Xinyu Shi. 2026. A Bigger Catch: Fine-Grained Curriculum Standards Alignment on the MathFish Benchmark. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 208–220, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
A Bigger Catch: Fine-Grained Curriculum Standards Alignment on the MathFish Benchmark (Liu et al., BEA 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.15.pdf