Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 5: Tutorial Abstracts)

Yuki Arase, David Jurgens, Fei Xia (Editors)

Anthology ID:: 2025.acl-tutorials
Month:: July
Year:: 2025
Address:: Vienna, Austria
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
URL:: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-tutorials/
DOI:
ISBN:: 979-8-89176-255-8
Bib Export formats:: BibTeX
PDF:: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-tutorials.pdf

pdf bib
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 5: Tutorial Abstracts)
Yuki Arase | David Jurgens | Fei Xia

pdf bib abs
Inverse Reinforcement Learning Meets Large Language Model Alignment
Mihaela van der Schaar | Hao Sun

In the era of Large Language Models (LLMs), alignment has emerged as a fundamental yet challenging problem in the pursuit of more reliable, controllable, and capable machine intelligence. The recent success of reasoning models and conversational AI systems has underscored the critical role of reinforcement learning (RL) in enhancing these systems, driving increased research interest at the intersection of RL and LLM alignment.This tutorial will provide a comprehensive review of recent advances in LLM alignment through the lens of inverse reinforcement learning (IRL), emphasizing the distinctions between RL techniques employed in LLM alignment and those in conventional RL tasks. In particular, we highlight the necessity of constructing neural reward models from human data and discuss the formal and practical implications of this paradigm shift. The tutorial will begin with fundamental concepts in RL to provide a foundation for the audience unfamiliar with the field. We then examine recent advances in this research agenda, discussing key challenges and opportunities in conducting IRL for LLM alignment. Beyond methodological considerations, we explore practical aspects, including datasets, benchmarks, evaluation metrics, infrastructure, and computationally efficient training and inference techniques.Finally, we draw insights from the literature on sparse-reward RL to identify open questions and potential research directions. By synthesizing findings from diverse studies, we aim to provide a structured and critical overview of the field, highlight unresolved challenges, and outline promising future directions for improving LLM alignment through RL and IRL techniques.

pdf bib abs
Eye Tracking and NLP
David Reich | Omer Shubi | Lena Jäger | Yevgeni Berzak

Our tutorial introduces a growing research area that combines eye tracking during reading with NLP. The tutorial outlines how eye movements in reading can be leveraged for NLP, and, vice versa, how NLP methods can advance psycholinguistic modeling of eye movements in reading. We cover four main themes: (i) fundamentals of eye movements in reading, (ii) experimental methodologies and available data, (iii) integrating eye movement data in NLP models, and (iv) using LLMs for modeling eye movements in reading. The tutorial is tailored to NLP researchers and practitioners, and provides the essential background for conducting research on joint modeling of eye movements and text.

Large language models (LLMs) are widely used in NLP applications, but their tendency to produce hallucinations poses significant challenges to the reliability and safety, ultimately undermining user trust. This tutorial offers the first systematic introduction to uncertainty quantification (UQ) for LLMs in text generation tasks – a conceptual and methodological framework that provides tools for communicating the reliability of a model answer. This additional output could be leveraged for a range of downstream tasks, including hallucination detection and selective generation. We begin with the theoretical foundations of uncertainty, highlighting why techniques developed for classification might fall short in text generation. Building on this grounding, we survey state-of-the-art white-box and black-box UQ methods, from simple entropy-based scores to supervised probes over hidden states and attention weights, and show how they enable selective generation and hallucination detection. Additionally, we discuss the calibration of uncertainty scores for better interpretability. A key feature of the tutorial is practical examples using LM-Polygraph, an open-source framework that unifies more than a dozen recent UQ and calibration algorithms and provides a large-scale benchmark, allowing participants to implement UQ in their applications, as well as reproduce and extend experimental results with only a few lines of code. By the end of the session, researchers and practitioners will be equipped to (i) evaluate and compare existing UQ techniques, (ii) develop new methods, and (iii) implement UQ in their code for deploying safer, more trustworthy LLM-based systems.

pdf bib abs
Human-AI Collaboration: How AIs Augment Human Teammates
Sherry Wu | Diyi Yang | Joseph Chang | Marti A. Hearst | Kyle Lo

The continuous, rapid development of general-purpose models like LLMs suggests the theoretical possibility of AI performing any human task. Yet, despite the potential and promise, these models are far from perfect, excelling at certain tasks while struggling with others. The tension between what is possible and a model’s limitations raises the general research question that has attracted attention from various disciplines: What is the best way to use AI to maximize its benefits? In this tutorial, we will review recent developments related to human-AI teaming and collaboration. To the best of our knowledge, our tutorial will be the first to provide a more integrated view from NLP, HCI, Computational Social Science, and Learning Science, etc., and highlight how different communities have identified the goals and societal impacts of such collaborations, both positive and negative. We will further discuss how to operationalize these Human-AI collaboration goals, and reflect on how state-of-the-art AI models should be evaluated and scaffolded to make them most useful in collaborative contexts.

With NLP research being rapidly productionized into real-world applications, it is important to be aware of and think through the consequences of our work. Such ethical considerations are important in both authoring and reviewing (e.g. privacy, consent, fairness, among others). This tutorial will equip participants with basic guidelines for thinking deeply about ethical issues and review common considerations that recur in NLP research. The methodology is interactive and participatory, including discussion of case studies and group work. Participants will gain practical experience on when to flag a paper for ethics review and how to write an ethical consideration section to be shared with the broader community. Most importantly, the participants will be co-creating the tutorial outcomes and extending tutorial materials to share as public outcomes.

pdf bib abs
NLP for Counterspeech against Hate and Misinformation (CSHAM)
Daniel Russo | Helena Bonaldi | Yi-Ling Chung | Gavin Abercrombie | Marco Guerini

This tutorial aims to bring together research from different fields such as computer science and the social sciences and policy to show how counterspeech is currently used to tackle abuse and misinformation by individuals, activists and organisations, how Natural Language Processing (NLP) and Generation (NLG) can be applied to automate its production, and the implications of using large language models for this task. It will also address, but not be limited to, the questions of how to evaluate and measure the impacts of counterspeech, the importance of expert knowledge from civil society in the development of counterspeech datasets and taxonomies, and how to ensure fairness and mitigate the biases present in language models when generating counterspeech. The tutorial will bring diverse multidisciplinary perspectives to safety research by including case studies from industry and public policy to share insights on the impact of counterspeech and social correction and the implications of applying NLP to critical real-world problems. It will also go deeper into the challenging task of tackling hate and misinformation together, which represents an open research question yet to be addressed in NLP but gaining attention as a stand alone topic.

pdf bib abs
Synthetic Data in the Era of Large Language Models
Vijay Viswanathan | Xiang Yue | Alisa Liu | Yizhong Wang | Graham Neubig

Progress in natural language processing has historically been driven by better data, and researchers today are increasingly using ‘synthetic data’ - data generated with the assistance of large language models - to make dataset construction faster and cheaper. However, most synthetic data generation approaches are executed in an ad hoc manner and ‘reinvent the wheel’ rather than build on prior foundations. This tutorial seeks to build a shared understanding of recent progress in synthetic data generation from NLP and related fields by grouping and describing major methods, applications, and open problems. Our tutorial will be divided into four main sections. First, we will describe algorithms for producing high-quality synthetic data. Second, we will describe how synthetic data can be used to advance the general-purpose development and study of language models. Third, we will demonstrate how to customize synthetic data generation to support scenario-specific applications. Finally, we will discuss open questions about the production and use of synthetic data that must be answered to overcome some of their current limitations. Our goal is that by unifying recent advances in this emerging research direction, we can build foundations upon which the community can improve the rigor, understanding, and effectiveness of synthetic data moving forward.

Pretrained generative models, especially large language models, provide novel ways for users to interact with computers. While generative NLP research and applications had previously aimed at very domain-specific or task-specific solutions, current LLMs and applications (e.g. dialogue systems, agents) are versatile across many tasks and domains. Despite being trained to be helpful and aligned with human preferences (e.g., harmlessness), enforcing robust guardrails on LLMs remains a challenge. And, even when protected against rudimentary attacks, just like other complex software, LLMs can be vulnerable to attacks using sophisticated adversarial inputs. This tutorial provides a comprehensive overview of key guardrail mechanisms developed for LLMs, along with evaluation methodologies and a detailed security assessment protocol - including auto red-teaming of LLM-powered applications. Our aim is to move beyond the discussion of single prompt attacks and evaluation frameworks towards addressing how guardrailing can be done in complex dialogue systems that employ LLMs.