Ser-Nam Lim
2026
BOOKAGENT: Orchestrating Safety-Aware Visual Narratives via Multi-Agent Cognitive Calibration
Bo Gao | Chang Liu | Yuyang Miao | SiYuan Ma | Ser-Nam Lim
Findings of the Association for Computational Linguistics: ACL 2026
Bo Gao | Chang Liu | Yuyang Miao | SiYuan Ma | Ser-Nam Lim
Findings of the Association for Computational Linguistics: ACL 2026
Recent advancements in Large Generative Models (LGMs) have revolutionized multi-modal generation. However, generating illustrated storybooks remains an open challenge, where prior works mainly decompose this task into separate stages, and thus, holistic multi-modal grounding remains limited. Besides, while safety alignment is studied for text- or image-only generation, existing works rarely integrate child-specific safety constraints into narrative planning and sequence-level multi-modal verification. To address these limitations, we propose BookAgent, a safety-aware multi-agent collaboration framework designed for high-quality, safety-aware visual narratives. Different from prior story visualization models that assume a fixed storyline sequence, BookAgent targets end-to-end storybook synthesis from a user draft by jointly planning, scripting, illustrating, and globally repairing inconsistencies. To ensure precise multi-modal grounding, BookAgent dynamically calibrates page-level alignment between textual scripts and visual layouts. Furthermore, BookAgent calibrates holistic consistency from the temporal dimension, by verifying-then-rectifying global inconsistencies in character identity and storytelling logic. Extensive experiments demonstrate that BookAgent significantly outperforms current methods in narrative coherence, visual consistency, and safety compliance, offering a robust paradigm for reliable agents in complex multi-modal creation. The implementation will be publicly released at https://github.com/bogao-code/BookAgent/tree/main.
2025
Scaling Up Temporal Domain Generalization via Temporal Experts Averaging
Aoming Liu | Kevin Miller | Venkatesh Saligrama | Kate Saenko | Boqing Gong | Ser-Nam Lim | Bryan A. Plummer
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Aoming Liu | Kevin Miller | Venkatesh Saligrama | Kate Saenko | Boqing Gong | Ser-Nam Lim | Bryan A. Plummer
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Temporal Domain Generalization (TDG) aims to generalize across temporal distribution shifts, e.g., lexical change over time. Prior work often addresses this by predicting future model weights. However, full model prediction is prohibitively expensive for even reasonably sized models. Thus, recent methods only predict the classifier layer, limiting generalization by failing to adjust other model components. To address this, we propose Temporal Expert Averaging (TEA), a novel and scalable TDG framework that updates the entire model using weight averaging to maximize generalization potential while minimizing computational costs. Our theoretical analysis guides us to two steps that enhance generalization to future domains. First, we create expert models with functional diversity yet parameter similarity by fine-tuning a domain-agnostic base model on individual temporal domains while constraining weight changes. Second, we optimize the bias-variance tradeoff through adaptive averaging coefficients derived from modeling temporal weight trajectories in a principal component subspace. Expert’s contributions are based on their projected proximity to future domains. Extensive experiments across 7 TDG benchmarks, 5 models, and 2 TDG settings shows TEA outperforms prior TDG methods by up to 69% while being up to 60x more efficient.
2021
When in Doubt: Improving Classification Performance with Alternating Normalization
Menglin Jia | Austin Reiter | Ser-Nam Lim | Yoav Artzi | Claire Cardie
Findings of the Association for Computational Linguistics: EMNLP 2021
Menglin Jia | Austin Reiter | Ser-Nam Lim | Yoav Artzi | Claire Cardie
Findings of the Association for Computational Linguistics: EMNLP 2021
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification. CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution using the predicted class distributions of high-confidence validation examples. CAN is easily applicable to any probabilistic classifier, with minimal computation overhead. We analyze the properties of CAN using simulated experiments, and empirically demonstrate its effectiveness across a diverse set of classification tasks.
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Shir Gur | Natalia Neverova | Chris Stauffer | Ser-Nam Lim | Douwe Kiela | Austin Reiter
Findings of the Association for Computational Linguistics: EMNLP 2021
Shir Gur | Natalia Neverova | Chris Stauffer | Ser-Nam Lim | Douwe Kiela | Austin Reiter
Findings of the Association for Computational Linguistics: EMNLP 2021
Recent advances in using retrieval components over external knowledge sources have shown impressive results for a variety of downstream tasks in natural language processing. Here, we explore the use of unstructured external knowledge sources of images and their corresponding captions for improving visual question answering (VQA). First, we train a novel alignment model for embedding images and captions in the same space, which achieves substantial improvement in performance on image-caption retrieval w.r.t. similar methods. Second, we show that retrieval-augmented multi-modal transformers using the trained alignment model improve results on VQA over strong baselines. We further conduct extensive experiments to establish the promise of this approach, and examine novel applications for inference time such as hot-swapping indices.