Matthew Riemer

2025

Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce **EpMAN** – a method for processing long contexts in an episodic memory module while holistically attending to semantically relevant context chunks. The output of episodic attention is then used to reweight the decoder’s self-attention to the stored KV cache of the context during training and generation. When an LLM decoder is trained using **EpMAN**, its performance on multiple challenging single-hop long-context recall and question-answering benchmarks is stronger and more robust across context lengths from 16k to 256k tokens than that of baseline decoders trained with self-attention and of popular retrieval-augmented generation frameworks.

Combining Domain and Alignment Vectors Provides Better Knowledge-Safety Trade-offs in LLMs
Megh Thakkar | Quentin Fournier | Matthew Riemer | Pin-Yu Chen | Amal Zouaq | Payel Das | Sarath Chandar
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruction-tuned counterparts. However, these expert models are either not explicitly trained to be safe or lose some of their safety abilities in the process, making them capable of generating harmful content. We observe that simple interpolation between the domain and alignment delta parameters leads to safer domain-specific models that preserve their utility. Building on this, we introduce MergeAlign, a simple, efficient, and effective model merging-based alignment method. We apply MergeAlign to Llama3 models that are experts in medicine and finance, obtaining substantial safety alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and the contributions of the individual models being merged, as well as the applicability of MergeAlign to more general code and math expert models using the Qwen-2.5 series of models. We hope our findings open new research avenues towards efficient development and deployment of safe expert LLMs.

2024

A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
Megh Thakkar | Quentin Fournier | Matthew Riemer | Pin-Yu Chen | Amal Zouaq | Payel Das | Sarath Chandar
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-training remains out of reach for most researchers due to the compute required, fine-tuning has become affordable thanks to parameter-efficient methods such as LoRA and QLoRA. Alignment is known to be sensitive to the many factors involved, including the quantity and quality of data, the alignment method, and the adapter rank. However, there has not yet been an extensive study of their effect on downstream performance. To address this gap, we conduct an in-depth investigation of the impact of popular choices for three crucial axes: (i) the alignment dataset (HH-RLHF and BeaverTails), (ii) the alignment technique (SFT and DPO), and (iii) the model (LLaMA-1, Vicuna-v1.3, Mistral-7b, and Mistral-7b-Instruct). Our extensive setup spanning over 300 experiments reveals consistent trends and unexpected findings. We observe how more informative data helps with preference alignment, cases where supervised fine-tuning outperforms preference optimization, and how aligning to a distinct preference boosts performance on downstream tasks. Through our in-depth analyses, we put forward key guidelines to help researchers perform more effective parameter-efficient LLM alignment.

2019

We introduce Recursive Routing Networks (RRNs), which are modular, adaptable models that learn effectively in diverse environments. RRNs consist of a set of functions, typically organized into a grid, and a meta-learner decision-making component called the router. The model jointly optimizes the parameters of the functions and the meta-learner’s policy for routing inputs through those functions. RRNs can be incorporated into existing architectures in a number of ways; we explore adding them to word representation layers, recurrent network hidden layers, and classifier layers. Our evaluation task is natural language inference (NLI). Using the MultiNLI corpus, we show that an RRN’s routing decisions reflect the high-level genre structure of that corpus. To show that RRNs can learn to specialize to more fine-grained semantic distinctions, we introduce a new corpus of NLI examples involving implicative predicates, and show that the model components become fine-tuned to the inferential signatures that are characteristic of these predicates.

2015

A Deep Learning and Knowledge Transfer Based Architecture for Social Media User Characteristic Determination
Matthew Riemer | Sophia Krasikov | Harini Srinivasan
Proceedings of the Third International Workshop on Natural Language Processing for Social Media
