Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Valentina Pyatkin, Andreas Vlachos (Editors)


Anthology ID:
2025.emnlp-tutorials
Month:
November
Year:
2025
Address:
Suzhou, China
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-tutorials/
DOI:
ISBN:
979-8-89176-336-4
Bib Export formats:
BibTeX
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-tutorials.pdf

pdf bib
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Valentina Pyatkin | Andreas Vlachos

pdf bib
Efficient Inference for Large Language Models – Algorithm, Model, and System
Xuefei Ning | Guohao Dai | Haoli Bai | Lu Hou | Yu Wang

The inference of LLMs incurs high computational cost, memory-access overhead, and memory usage, leading to inefficiencies in latency, throughput, power consumption, and storage. To this end, this tutorial focuses on the increasingly important topic of efficient inference for LLMs and aims to provide a systematic understanding of the key facts and methodologies from a designer’s perspective. We start by introducing the basic concepts of modern LLMs, software, and hardware. Following this, we define the efficiency optimization problem. To equip the audience with a designer’s mindset, we briefly explain how to diagnose efficiency bottlenecks for a given workload on specific hardware. After covering these basics, we present our full-stack taxonomy of efficient inference methods for LLMs. We walk through each category of methodology, using one to three representative methods as examples for each leaf subcategory, elaborating on the design logic behind each method and which inefficiency factors it primarily addresses. Finally, we wrap up with a takeaway summary and future research directions.
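Diagnosing whether a workload is compute-bound or memory-bound, as described above, is often done with a roofline-style comparison of a kernel's arithmetic intensity against the hardware's ridge point. The following is a minimal sketch of that check; the peak-FLOP and bandwidth figures are illustrative placeholders, not measurements of any specific accelerator.

```python
# Roofline-style bottleneck check for a single GEMM (A: m x k, B: k x n).
# PEAK_FLOPS and PEAK_BW below are hypothetical hardware numbers.

def arithmetic_intensity_gemm(m, n, k, bytes_per_elem=2):
    flops = 2 * m * n * k                                    # multiply-accumulates
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A, B; write C
    return flops / bytes_moved

def bound(intensity, peak_flops, peak_bw):
    ridge = peak_flops / peak_bw        # FLOPs/byte where the roofline bends
    return "compute-bound" if intensity >= ridge else "memory-bound"

PEAK_FLOPS = 312e12   # hypothetical peak throughput (FLOP/s)
PEAK_BW = 2.0e12      # hypothetical memory bandwidth (B/s)

# Single-token decode: batch 1 against a 4096x4096 weight matrix.
decode_ai = arithmetic_intensity_gemm(1, 4096, 4096)
# Prefill: 2048 prompt tokens through the same layer.
prefill_ai = arithmetic_intensity_gemm(2048, 4096, 4096)

print(decode_ai, bound(decode_ai, PEAK_FLOPS, PEAK_BW))
print(prefill_ai, bound(prefill_ai, PEAK_FLOPS, PEAK_BW))
```

The sketch illustrates why LLM decoding is typically memory-bound (intensity near 1 FLOP/byte) while prefill is compute-bound, which is the kind of diagnosis the tutorial's designer-mindset segment covers.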

pdf bib
Advancing Language Models through Instruction Tuning: Recent Progress and Challenges
Zhihan Zhang | Renze Lou | Fangkai Jiao | Wenpeng Yin | Meng Jiang

The capability of following instructions is a key dimension for AI systems. Therefore, in NLP, instruction tuning – the process of training language models to follow natural language instructions – has become a fundamental component of the model development pipeline. This tutorial addresses three critical questions within the field: (1) What are the current focal points in instruction tuning research? (2) What are the best practices in training an instruction-following model? (3) What new challenges have emerged? To answer these questions, the tutorial presents a systematic overview of recent advances in instruction tuning. It covers different stages in model training: supervised fine-tuning, preference optimization, and reinforcement learning. It introduces scalable strategies for building high-quality instruction data, explores approaches for training autonomous AI agents that handle complex real-world tasks, and discusses common criteria for evaluating instruction-following models. The audience will gain a comprehensive understanding of cutting-edge trends in instruction tuning and insights into promising directions for future research.
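The supervised fine-tuning stage mentioned above typically trains only on response tokens, masking the prompt out of the loss. A minimal sketch of that data preparation follows; the whitespace tokenizer and prompt template are toy stand-ins, and -100 is the ignore index used by common cross-entropy implementations.

```python
# Sketch of SFT example construction with prompt masking.
# toy_tokenize is a stand-in for a real subword tokenizer.

IGNORE_INDEX = -100  # label value skipped by typical cross-entropy losses

def toy_tokenize(text, vocab):
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]

def build_sft_example(instruction, response, vocab):
    prompt_ids = toy_tokenize(f"### Instruction: {instruction} ### Response:", vocab)
    response_ids = toy_tokenize(response, vocab)
    input_ids = prompt_ids + response_ids
    # Mask the prompt so the gradient flows only through response tokens.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels

vocab = {}
ids, labels = build_sft_example("Add 2 and 3.", "The answer is 5.", vocab)
```

In practice the same idea is applied per-turn for multi-turn dialogues, masking every token the model did not generate.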

pdf bib
Spoken Conversational Agents with Large Language Models
Huck Yang | Andreas Stolcke | Larry P. Heck

Spoken conversational agents are converging toward voice-native LLMs. This tutorial distills the path from cascaded ASR/NLU to end-to-end, retrieval- and vision-grounded systems. We frame adaptation of text LLMs to audio, cross-modal alignment, and joint speech–text training; review datasets, metrics, and robustness across accents; and compare design choices (cascaded vs. E2E, post-ASR correction, streaming). We link industrial assistants to current open-domain and task-oriented agents, highlight reproducible baselines, and outline open problems in privacy, safety, and evaluation. Attendees leave with practical recipes and a clear systems-level roadmap.

pdf bib
NLP+Code: Code Intelligence in Language Models
Terry Yue Zhuo | Qian Liu | Zijian Wang | Wasi U. Ahmad | Binyuan Hui | Loubna Ben Allal

Language models (LMs) such as GPT and Claude have shown impressive abilities across a range of natural language processing (NLP) tasks. Among these tasks, code understanding and generation have quickly become one of the most popular applications of LMs, given the executable nature of code. However, a practical understanding of how programming knowledge can be combined with natural language to automate software development is still lacking. Moreover, recent studies empirically demonstrate that code can be a better medium for complex reasoning and agentic task automation, but they do not fully explain the source of these gains. In this tutorial, we refer to the superior capabilities brought by code modeling as Code Intelligence, and aim to provide a coherent overview of recent advances on this topic. We will start by providing preliminaries on training foundation models on code and the common practices involved. We will then focus on downstream tasks in the code domain and their evaluation. Finally, we will cover how code can contribute to advancements in general tasks, and outline opportunities for future research on Code Intelligence.

pdf bib
Data and Model Centric Approaches for Expansion of Large Language Models to New Languages
Anoop Kunchukuttan | Raj Dabre | Rudra Murthy | Mohammed Safi Ur Rahman Khan | Thanmay Jayakumar

Despite the increasing pace of Large Language Model (LLM) research, the vast majority of existing LLMs mainly support English alongside a handful of high-resource languages, leaving a major gap for most low-resource languages. In this tutorial, we focus on approaches to expanding the language coverage of LLMs, which offer an efficient and viable path for bringing LLM technology to low-resource languages without training from scratch. We examine approaches at various stages of the LLM training pipeline, such as tokenizer training, pre-training, instruction tuning, alignment, and evaluation, where adaptations are made to support new languages, covering both data-oriented and model-oriented approaches. We hope our tutorial enables researchers and practitioners to incorporate additional languages and tasks into existing LLMs, enhancing inclusivity and coverage.
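One model-oriented adaptation touched on above is expanding the tokenizer vocabulary with target-language tokens and initializing their embedding rows. The sketch below uses the mean of the pretrained embeddings as the initializer, which is one common heuristic; real pipelines may instead average subword constituents or train the new rows from scratch, and the tiny vocabulary here is purely illustrative.

```python
# Sketch of vocabulary expansion for a new language.
# New embedding rows start at the mean of existing rows (one heuristic).

def expand_vocab(vocab, embeddings, new_tokens):
    dim = len(embeddings[0])
    n = len(embeddings)
    mean_row = [sum(row[d] for row in embeddings) / n for d in range(dim)]
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)          # assign the next free id
            embeddings.append(list(mean_row))  # mean-initialized embedding
    return vocab, embeddings

vocab = {"hello": 0, "world": 1}
emb = [[1.0, 3.0], [3.0, 1.0]]
vocab, emb = expand_vocab(vocab, emb, ["नमस्ते", "دنيا"])
```

After such an expansion, continued pre-training on target-language text is typically needed before the new embeddings are useful.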

pdf bib
Neuro-Symbolic Natural Language Processing
André Freitas | Marco Valentino | Danilo Silva de Carvalho

Despite the performance leaps delivered by Large Language Models (LLMs), NLP systems based only on deep learning architectures still fall short in delivering safe and controlled reasoning, interpretability, and adaptability within complex and specialised domains, restricting their use in areas where reliability and trustworthiness are crucial. Neuro-symbolic NLP methods seek to overcome these limitations by integrating the flexibility of contemporary language models with the control and interpretability of symbolic methods. This hybrid approach promises both to enhance inference capabilities and to deepen the theoretical understanding of LLMs. This tutorial aims to bridge the gap between the practical performance of LLMs and the principled modelling of language and inference offered by formal methods. We provide an overview of formal foundations in linguistics and reasoning, followed by contemporary architectural mechanisms to interpret, control, and extend NLP models. Balancing theoretical and practical activities, the tutorial is suitable for PhD students, experienced researchers, and industry practitioners.
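One simple pattern for the neural-symbolic integration described above is propose-and-verify: a neural model samples candidate answers and a symbolic component keeps only those that check out. The sketch below stubs the neural proposer with a hardcoded list, and the verifier is a toy arithmetic re-derivation; both are hypothetical stand-ins, not any specific system from the tutorial.

```python
# Toy propose-and-verify loop: neural proposer + symbolic checker.

def neural_propose(question):
    # Stand-in for sampling several candidate answers from an LLM.
    return ["22", "27", "twenty-seven"]

def symbolic_check(question, answer):
    # Symbolic verifier: re-derive the arithmetic and compare exactly.
    if question == "What is 13 + 14?":
        return answer.isdigit() and int(answer) == 27
    return False

question = "What is 13 + 14?"
verified = [a for a in neural_propose(question) if symbolic_check(question, a)]
```

Real systems replace the checker with theorem provers, constraint solvers, or executable programs, but the control-flow idea is the same.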

pdf bib
Continual Learning of Large Language Models
Tongtong Wu | Trang Vu | Linhao Luo | Gholamreza Haffari

As large language models (LLMs) continue to expand in size and utility, keeping them current with evolving knowledge and shifting user preferences becomes an increasingly urgent yet challenging task. This tutorial offers a comprehensive exploration of continual learning (CL) in the context of LLMs, presenting a structured framework that spans continual pre-training, instruction tuning, and alignment. Grounded in recent survey work and empirical studies, we discuss emerging trends, key methods, and practical insights from both academic research and industry deployments. In addition, we highlight the new frontier of lifelong LLM agents, i.e., systems capable of autonomous, self-reflective, and tool-augmented adaptation. Participants will gain a deep understanding of the computational, algorithmic, and ethical challenges inherent to CL in LLMs, and learn about strategies to mitigate forgetting, manage data and evaluation pipelines, and design systems that can adapt responsibly and reliably over time. This tutorial will benefit researchers and practitioners interested in advancing the long-term effectiveness, adaptability, and safety of foundation models.
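Among the forgetting-mitigation strategies mentioned above, rehearsal is one of the simplest: keep a small reservoir of earlier-task examples and mix them into each new-task batch. The sketch below is one such strategy among many (regularization methods such as EWC, or parameter-isolation via adapters, are alternatives the tutorial also covers); the capacity and mixing ratio are illustrative.

```python
import random

# Sketch of rehearsal-based continual learning with reservoir sampling.

class ReplayBuffer:
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Reservoir sampling keeps each seen example with prob capacity/seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def mixed_batch(self, new_batch, replay_ratio=0.5):
        # Append replayed old-task examples to the new-task batch.
        k = min(len(self.data), int(len(new_batch) * replay_ratio))
        return new_batch + self.rng.sample(self.data, k)

buf = ReplayBuffer(capacity=4)
for i in range(100):                       # stream of "task A" examples
    buf.add(f"A{i}")
batch = buf.mixed_batch([f"B{i}" for i in range(4)])  # a "task B" batch
```

Training on such mixed batches keeps gradients from old tasks in the update signal, which is the basic mechanism by which rehearsal mitigates forgetting.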