Nearchos Potamitis
2026
Current Advances in LLM Reasoning
Akhil Arora | Vishrav Chaudhary | Julia Kreutzer | Nearchos Potamitis | Nouha Dziri | Niket Tandon
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 5: Tutorial Abstracts)
As large language models (LLMs) increasingly tackle reasoning-heavy tasks, from mathematics to commonsense to multilingual understanding, researchers face three pressing questions: How well do models reason? How can we make them reason better? What are the next frontiers in LLM reasoning? This tutorial answers these questions through a unified view of LLM reasoning. It explores comprehensive evaluation strategies to assess the reasoning abilities of models and discusses two types of methods to improve models’ reasoning: (i) advanced inference-time methods, such as structured and self-improvement inference methods, and (ii) post-training methods, such as RLHF, DPO, and GRPO, that aim to make LLMs think more like humans. The tutorial pairs these technical discussions with a practical outlook through illustrative demos and short guided hands-on exercises. It is designed for both researchers and practitioners seeking practical insights into LLM reasoning.
2025
Cache Saver: A Modular Framework for Efficient, Affordable, and Reproducible LLM Inference
Nearchos Potamitis | Lars Henning Klein | Bardia Mohammadi | Chongyang Xu | Attreyee Mukherjee | Niket Tandon | Laurent Bindschaedler | Akhil Arora
Findings of the Association for Computational Linguistics: EMNLP 2025
Inference constitutes the majority of costs throughout the lifecycle of a large language model (LLM). While numerous LLM inference engines focusing primarily on low-level optimizations have been developed, there is a scarcity of non-intrusive client-side frameworks that perform high-level optimizations. In this paper, we introduce Cache Saver, a modular, plug-and-play, and asynchronous framework that facilitates high-level inference optimizations, thereby integrating cleanly into existing systems without requiring changes to the end-user application logic or the underlying LLM. The key novelty is a *namespace-aware list-valued cache* that ensures *statistical integrity* of LLM responses by generating *i.i.d.* responses within a namespace as well as ensuring *reproducibility*. Moreover, as a direct consequence of operating at a high level, Cache Saver supports both local and online models. We conduct extensive experiments with five representative state-of-the-art reasoning strategies, five diverse benchmark tasks, and three different LLMs. On average across all methods, tasks, and LLMs, Cache Saver reduces cost by ≃ 25% and CO2 by ≃ 35%. Notably, Cache Saver excels in practical machine learning scenarios such as benchmarking across multiple methods or conducting ablation analysis of a specific method, obtaining substantial cost and carbon footprint reduction of ≃ 60%. Cache Saver is publicly available at [https://github.com/au-clan/cachesaver](https://github.com/au-clan/cachesaver).
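To make the idea of a namespace-aware list-valued cache more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Cache Saver's actual API: the class name `ListValuedCache`, the `get` method, and the `fake_model` stand-in are all hypothetical. The sketch assumes that each prompt maps to a list of stored samples, that a namespace never receives the same stored sample twice for a given prompt, and that different namespaces may reuse each other's samples.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class ListValuedCache:
    """Toy namespace-aware, list-valued cache (hypothetical, for illustration only)."""

    def __init__(self, sample_fn: Callable[[str], str]) -> None:
        self._sample_fn = sample_fn  # stand-in for a call to a real LLM
        # One list of stored samples per prompt, shared by all namespaces.
        self._responses: Dict[str, List[str]] = defaultdict(list)
        # How many samples each (namespace, prompt) pair has already consumed.
        self._cursor: Dict[Tuple[str, str], int] = defaultdict(int)
        self.model_calls = 0

    def get(self, namespace: str, prompt: str) -> str:
        idx = self._cursor[(namespace, prompt)]
        stored = self._responses[prompt]
        if idx == len(stored):
            # This namespace has used every stored sample: draw a fresh one.
            stored.append(self._sample_fn(prompt))
            self.model_calls += 1
        self._cursor[(namespace, prompt)] = idx + 1
        return stored[idx]


def make_fake_model() -> Callable[[str], str]:
    """Hypothetical model stub that returns a new string on every call."""
    counter = {"n": 0}

    def fake_model(prompt: str) -> str:
        counter["n"] += 1
        return f"{prompt} -> sample {counter['n']}"

    return fake_model


cache = ListValuedCache(make_fake_model())
a1 = cache.get("method_a", "2+2?")  # fresh sample from the model
a2 = cache.get("method_a", "2+2?")  # same namespace: a second, distinct sample
b1 = cache.get("method_b", "2+2?")  # other namespace: reuses the first sample
b2 = cache.get("method_b", "2+2?")  # other namespace: reuses the second sample
print(cache.model_calls)
```

In this toy setup, two methods that each request two samples for the same prompt trigger only two model calls in total, while neither method sees a repeated sample within its own namespace. Because a namespace always reads the stored list in the same order, replaying it returns the same sequence, which is one plausible reading of the reproducibility property described above.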