2025
Domain Adaptation of Foundation LLMs for e-Commerce
Christian Herold | Michael Kozielski | Tala Bazazo | Pavel Petrushkov | Yannick Versley | Seyyed Hadi Hashemi | Patrycja Cieplicka | Dominika Basaj | Shahram Khadivi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
We present the e-Llama models: 8 billion and 70 billion parameter large language models that are adapted towards the e-commerce domain. These models are meant as foundation models with deep knowledge about e-commerce that form a base for instruction tuning and fine-tuning. The e-Llama models are obtained by continuously pretraining the Llama 3.1 base models on 1 trillion tokens of domain-specific data. We discuss our approach and motivate our choice of hyperparameters with a series of ablation studies. To quantify how well the models have been adapted to the e-commerce domain, we define and implement a set of multilingual, e-commerce-specific evaluation tasks. We show that, when the training setup is chosen carefully, the Llama 3.1 models can be adapted to the new domain without significant loss of performance on general-domain tasks. We also explore the possibility of merging the adapted model and the base model for better control of the performance trade-off between domains.
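As a rough illustration of the merging idea mentioned at the end of the abstract: a common baseline for combining a domain-adapted model with its base is linear interpolation in weight space. The abstract does not specify the merging scheme, so the sketch below (including the interpolation weight alpha and the function name) is an assumption, not the paper's method.

```python
import torch

def merge_state_dicts(base_sd: dict, adapted_sd: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate parameters: alpha=0 keeps the base model,
    alpha=1 keeps the domain-adapted model. Illustrative only; the paper
    may use a different merging scheme."""
    return {name: (1.0 - alpha) * p + alpha * adapted_sd[name]
            for name, p in base_sd.items()}

# Usage (paths illustrative): load both checkpoints, merge, and reload.
# base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
# adapted = AutoModelForCausalLM.from_pretrained("path/to/e-llama-8b")
# base.load_state_dict(merge_state_dicts(base.state_dict(),
#                                         adapted.state_dict(), alpha=0.5))
```

Sweeping alpha then gives a one-dimensional dial between general-domain and e-commerce performance.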
Evaluation of Attribution Bias in Generator-Aware Retrieval-Augmented Large Language Models
Amin Abolghasemi | Leif Azzopardi | Seyyed Hadi Hashemi | Maarten de Rijke | Suzan Verberne
Findings of the Association for Computational Linguistics: ACL 2025
Attributing answers to source documents is an approach used to enhance the verifiability of a model’s output in retrieval-augmented generation (RAG). Prior work has mainly focused on improving and evaluating the attribution quality of large language models (LLMs) in RAG, but this may come at the expense of inducing biases in the attribution of answers. We define and examine two aspects of the evaluation of LLMs in RAG pipelines, namely attribution sensitivity and bias with respect to authorship information. We explicitly inform an LLM about the authors of source documents, instruct it to attribute its answers, and analyze (i) how sensitive the LLM’s output is to the author of source documents, and (ii) whether the LLM exhibits a bias towards human-written or AI-generated source documents. We design an experimental setup in which we use counterfactual evaluation to study three LLMs in terms of their attribution sensitivity and bias in RAG pipelines. Our results show that adding authorship information to source documents can significantly change the attribution quality of LLMs, by 3 to 18%. We show that LLMs can have an attribution bias towards explicit human authorship, which can serve as a competing hypothesis for findings of prior work showing that LLM-generated content may be preferred over human-written content. Our findings indicate that the metadata of source documents can influence LLMs’ trust and how they attribute their answers. Furthermore, our research highlights attribution bias and sensitivity as a novel aspect of the vulnerability of LLMs.
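A minimal sketch of the counterfactual setup the abstract describes: the same retrieved documents are presented twice, once labeled as human-written and once as AI-generated, and the attributions in the two answers are compared. The prompt wording and the query_llm helper below are hypothetical, not the paper's code.

```python
# Hypothetical sketch of counterfactual attribution evaluation: only the
# author labels change between the two conditions; everything else is fixed.

def build_prompt(question, docs_with_authors):
    lines = [f"[{i + 1}] (author: {author}) {text}"
             for i, (text, author) in enumerate(docs_with_authors)]
    context = "\n".join(lines)
    return (f"Answer the question using the documents below and cite the "
            f"supporting document numbers in brackets.\n\n{context}\n\n"
            f"Question: {question}\nAnswer:")

def attribution_sensitivity(question, docs, query_llm):
    """Compare attributions under human vs. AI author labels (query_llm is
    any callable that sends a prompt to the model and returns its answer)."""
    human_labeled = query_llm(
        build_prompt(question, [(d, "a human expert") for d in docs]))
    ai_labeled = query_llm(
        build_prompt(question, [(d, "an AI assistant") for d in docs]))
    return human_labeled, ai_labeled  # downstream: diff the cited document ids
```

Aggregating the differences in cited documents across many queries yields sensitivity; systematic shifts toward one label indicate bias.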
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Stefan Vasilev | Christian Herold | Baohao Liao | Seyyed Hadi Hashemi | Shahram Khadivi | Christof Monz
Findings of the Association for Computational Linguistics: ACL 2025
This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task in compliance with data privacy regulations like GDPR. Unlike prior methods that rely on static hyperparameters or the starting model’s outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model’s outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also enhances the model’s ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit’s superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit’s robustness across various scenarios, highlighting its practical applicability and effectiveness for machine unlearning.
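As a reading of the uniform-target idea in the abstract: starting from the current model's logits, the logit of the token to be forgotten is shifted so that its softmax probability equals 1/|V|, and the adjusted distribution serves as the self-distillation target. The sketch below follows that reading under stated assumptions; the paper's exact formulation may differ.

```python
import math
import torch
import torch.nn.functional as F

def unilogit_target(logits: torch.Tensor, target_id: int) -> torch.Tensor:
    """Build a target distribution in which target_id has probability 1/V,
    keeping the current model's other logits fixed (assumed formulation)."""
    V = logits.size(-1)
    keep = torch.ones(V, dtype=torch.bool, device=logits.device)
    keep[target_id] = False
    # Solve softmax(z)_t = 1/V with the other logits fixed:
    # exp(z_t) / (exp(z_t) + S) = 1/V  =>  z_t = log(S) - log(V - 1),
    # where S = sum_{j != t} exp(z_j).
    log_S = torch.logsumexp(logits[..., keep], dim=-1)
    adjusted = logits.clone()
    adjusted[..., target_id] = log_S - math.log(V - 1)
    return F.softmax(adjusted, dim=-1)

def forget_loss(logits: torch.Tensor, target_id: int) -> torch.Tensor:
    """KL divergence between the model's distribution and the
    uniform-probability target (one possible distillation objective)."""
    with torch.no_grad():
        target = unilogit_target(logits, target_id)
    return F.kl_div(F.log_softmax(logits, dim=-1), target,
                    reduction="batchmean")
```

Because the target is recomputed from the current model at every step, no static temperature or offset hyperparameter is needed, which matches the abstract's claim.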
ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning
Baohao Liao | Christian Herold | Seyyed Hadi Hashemi | Stefan Vasilev | Shahram Khadivi | Christof Monz
Findings of the Association for Computational Linguistics: ACL 2025
As large language models (LLMs) scale, model compression is crucial for edge deployment and accessibility. Weight-only quantization reduces model size but suffers from performance degradation at lower bit widths. Moreover, standard finetuning is incompatible with quantized models, and alternative methods often fall short of full finetuning. In this paper, we propose ClusComp, a simple yet effective compression paradigm that clusters weight matrices into codebooks and finetunes them block-by-block. ClusComp (1) achieves superior performance in 2-4 bit quantization, (2) pushes compression to 1-bit while outperforming ultra-low-bit methods with minimal finetuning, and (3) enables efficient finetuning, even surpassing existing quantization-based approaches and rivaling full FP16 finetuning. Notably, ClusComp supports compression and finetuning of 70B LLMs on a single A6000-48GB GPU.
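To make the codebook idea concrete: one simple instance of clustering-based weight compression splits each weight matrix into short sub-vectors, clusters them with k-means, and stores only the codebook plus per-sub-vector indices. The group size, codebook size, and k-means details below are illustrative assumptions; ClusComp's actual procedure, and its block-by-block finetuning of the codebooks, is more involved.

```python
import torch

def compress_matrix(W: torch.Tensor, group: int = 4, k: int = 256,
                    iters: int = 20):
    """Cluster sub-vectors of W into k centroids with a plain k-means.
    Assumes W.numel() is divisible by group. Illustrative sketch only."""
    vecs = W.reshape(-1, group)                             # (N, group)
    codebook = vecs[torch.randperm(len(vecs))[:k]].clone()  # init from data
    for _ in range(iters):
        ids = torch.cdist(vecs, codebook).argmin(dim=1)     # assign step
        for c in range(k):                                  # update step
            members = vecs[ids == c]
            if len(members) > 0:
                codebook[c] = members.mean(dim=0)
    # Storage: k * group floats for the codebook + N small integer indices,
    # instead of N * group full-precision weights.
    return codebook, ids

def decompress_matrix(codebook: torch.Tensor, ids: torch.Tensor, shape):
    """Reconstruct the (lossy) weight matrix from codebook and indices."""
    return codebook[ids].reshape(shape)
```

Finetuning then only needs to update the small codebooks rather than the full matrices, which is one way such a scheme can fit a 70B model onto a single 48GB GPU.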