Jingru Fan

2026

The scalability of high-quality online education is hindered by the high costs and slow cycles of manual content creation.Despite advancements in video generation, current approaches often fail to ensure pedagogical structure and precise control due to their pixel-level, black-box nature.In this paper, we propose Generative Teaching, a novel paradigm shifting educators from manual creators to high-level directors who focus on pedagogical intents while agents handle the execution. To realize this vision, we introduce TeachMaster, a multi-agent framework that leverages code as an intermediate semantic medium. Unlike traditional video generation methods, TeachMaster orchestrates a collaborative team of agents, spanning planning, design, and rendering, to automate the production of interpretable, editable, and curriculum-ready educational videos. Experiments validate that TeachMaster significantly boosts production efficiency without compromising structural coherence or visual fidelity, slashing production costs to only 0.3% of traditional online course videos and providing a robust solution for scalable education.

2025

pdf bib abs

Existing LLM-based agents have achieved strong performance on held-in tasks, but their generalizability to unseen tasks remains poor. Hence, some recent work focus on fine-tuning the policy model with more diverse tasks to improve the generalizability. In this work, we find that finetuning a reward model to guide the policy model is more robust than directly finetuning the policy model.Based on this finding, we propose AgentRM, a 8B generalizable reward model, to guide the policy model for effective test-time search.We comprehensively investigate three approaches to construct the reward model, including explicit reward modeling, implicit reward modeling and LLM-as-a-judge.We then use AgentRM to guide the answer generation with Best-of-N sampling and beam search.We show that AgentRM is robust to paraphrasings of task instructions and can generalize to unseen tasks that require novel optimal behavior.Through extensive evaluation across nine tasks spanning four categories, AgentRM enhances the non-finetuned 8B policy model by 8.8 points on average, surpassing the top general agent by 4.0.Moreover, it demonstrates weak-to-strong generalization, yielding greater improvement on more powerful policy models.As for the specializability, AgentRM can also boost a finetuned policy model and outperform the top specialized agent by 11.4 on three held-in tasks.Further analysis verifies its effectiveness in test-time scaling.We release the code and data at https://github.com/thunlp/AgentRM.