Gopal Gupta


2025

Large Language Models (LLMs) have advanced Natural Language Processing (NLP) tasks but remain limited in mathematical reasoning. To address this, few-shot examples are commonly included in prompts for in-context learning. However, existing methods require annotated datasets, which raises computational costs and can yield lower-quality examples. To mitigate these limitations, we propose AutoMathIC, a framework that automatically generates high-quality in-context examples to enhance LLMs’ mathematical reasoning. AutoMathIC generates candidate mutations and selects those that improve the consistency of LLM responses across modalities (e.g., Chain-of-Thought (CoT), code snippets, and equations). Evaluated on four math problem datasets, AutoMathIC outperforms six baselines, with LLM accuracy ranging from 87.0% to 99.3% for GPT-3.5 and from 93.1% to 98.7% for GPT-4o-mini. It surpasses the state-of-the-art in-context example retrieval method on three of the four datasets by 0.3% to 11.8%, without relying on an annotated dataset.
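As a rough illustration of the consistency-based selection idea summarized above (not the paper's actual algorithm), the sketch below scores each candidate mutation by how well the answers obtained from different prompting modalities agree, and keeps the most consistent candidates. The `query_llm` callable, the modality names, and the majority-agreement scoring rule are assumptions introduced only for this example.

```python
# Minimal sketch of consistency-based mutation selection.
# NOTE: `query_llm`, the modality list, and the scoring rule are
# hypothetical placeholders, not the AutoMathIC implementation.
from collections import Counter
from typing import Callable, Dict, List

MODALITIES = ["cot", "code", "equation"]  # assumed prompt styles


def consistency_score(answers: List[str]) -> float:
    """Fraction of modality answers that agree with the majority answer."""
    if not answers:
        return 0.0
    majority_count = Counter(answers).most_common(1)[0][1]
    return majority_count / len(answers)


def select_mutations(
    problem: str,
    mutations: List[str],
    query_llm: Callable[[str, str], str],  # (prompt, modality) -> final answer
    top_k: int = 4,
) -> List[str]:
    """Keep the mutations whose multi-modality answers agree the most."""
    scored: Dict[str, float] = {}
    for mutation in mutations:
        prompt = f"{mutation}\n\n{problem}"
        answers = [query_llm(prompt, modality) for modality in MODALITIES]
        scored[mutation] = consistency_score(answers)
    return sorted(scored, key=scored.get, reverse=True)[:top_k]
```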