Emanuele Rodolà

Also published as: Emanuele Rodola


2025

Mergenetic: a Simple Evolutionary Model Merging Library
Adrian Robert Minut | Tommaso Mencattini | Andrea Santilli | Donato Crisostomi | Emanuele Rodolà
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Model merging allows combining the capabilities of existing models into a new one—post hoc, without additional training. This has made it increasingly popular thanks to its low cost and the availability of libraries that support merging on consumer GPUs. Recent work shows that pairing merging with evolutionary algorithms can boost performance, but no framework currently supports flexible experimentation with such strategies in language models. We introduce Mergenetic, an open-source library for evolutionary model merging. Mergenetic enables easy composition of merging methods and evolutionary algorithms, while incorporating lightweight fitness estimators to reduce evaluation costs. We describe its design and demonstrate that Mergenetic produces competitive results across tasks and languages using modest hardware. A video demo showcasing its main features is also provided.
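
As a rough illustration of the idea in the abstract above (pairing a merging method with an evolutionary search), the following minimal sketch merges two toy weight dictionaries by linear interpolation and tunes the mixing coefficient with a simple (1+1) evolution strategy. It does not use or reproduce Mergenetic's API: the names `linear_merge`, `evolve_alpha`, and `toy_fitness`, the choice of linear interpolation, and the toy fitness function are all assumptions made for this example.

```python
import random
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, List[float]]  # toy stand-in for a model's parameters


def linear_merge(a: Weights, b: Weights, alpha: float) -> Weights:
    """Interpolate two weight sets: alpha * a + (1 - alpha) * b."""
    return {
        name: [alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])]
        for name in a
    }


def evolve_alpha(
    a: Weights,
    b: Weights,
    fitness: Callable[[Weights], float],
    generations: int = 50,
    sigma: float = 0.2,
    seed: int = 0,
) -> Tuple[float, float]:
    """(1+1) evolution strategy over the mixing coefficient (illustrative only)."""
    rng = random.Random(seed)
    best_alpha = 0.5
    best_fit = fitness(linear_merge(a, b, best_alpha))
    for _ in range(generations):
        # Mutate the current best coefficient and clamp it to [0, 1].
        candidate = min(1.0, max(0.0, best_alpha + rng.gauss(0.0, sigma)))
        candidate_fit = fitness(linear_merge(a, b, candidate))
        # Keep the candidate only if it is at least as fit as the incumbent.
        if candidate_fit >= best_fit:
            best_alpha, best_fit = candidate, candidate_fit
    return best_alpha, best_fit


# Hypothetical toy setup: fitness is the negative squared distance to a target.
model_a: Weights = {"w": [1.0, 0.0]}
model_b: Weights = {"w": [0.0, 1.0]}
target = [0.8, 0.2]


def toy_fitness(weights: Weights) -> float:
    return -sum((w - t) ** 2 for w, t in zip(weights["w"], target))


alpha, fit = evolve_alpha(model_a, model_b, toy_fitness)
```

In a realistic setting the fitness call would be an evaluation of the merged model on a task, which is the expensive step that the abstract says lightweight fitness estimators are meant to reduce.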

2023

Accelerating Transformer Inference for Translation via Parallel Decoding
Andrea Santilli | Silvio Severino | Emilian Postolache | Valentino Maiorca | Michele Mancusi | Riccardo Marin | Emanuele Rodola
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community has proposed specific network architectures and learning-based methods to solve this issue, but these are expensive and require changes to the MT model, trading translation quality for inference speed. In this paper, we propose to address the problem from the point of view of decoding algorithms, a less explored but rather compelling direction. We reframe the standard greedy autoregressive decoding of MT as a parallel formulation that leverages Jacobi and Gauss-Seidel fixed-point iteration methods for fast inference. This formulation makes it possible to speed up existing models without training or modifications while retaining translation quality. We present three parallel decoding algorithms and test them on different languages and models, showing that the parallelization yields a speedup of up to 38% with respect to standard autoregressive decoding and of nearly 2x when scaling the method on parallel resources. Finally, we introduce a decoding dependency graph visualizer (DDGviz) that lets us see how the model has learned the conditional dependence between tokens and inspect the decoding procedure.
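
The Jacobi formulation mentioned in the abstract above can be sketched on a toy problem. Greedy decoding defines each token as a function of its prefix, so the whole output is the fixed point of a system of equations; Jacobi iteration starts from an arbitrary guess for all positions and updates every position from the previous iterate until nothing changes. The sketch below is not the paper's implementation: `toy_next_token` is a hypothetical deterministic stand-in for the argmax of a translation model, and the per-position updates are written as a plain loop where a real model would compute them in one batched forward pass.

```python
from typing import Callable, List, Sequence, Tuple

NextToken = Callable[[Sequence[int]], int]


def toy_next_token(prefix: Sequence[int]) -> int:
    """Hypothetical stand-in for the greedy (argmax) choice of a model."""
    return (sum(prefix) * 31 + len(prefix) * 7 + 3) % 50


def greedy_decode(next_token: NextToken, length: int) -> List[int]:
    """Standard sequential decoding: one token per step."""
    output: List[int] = []
    for _ in range(length):
        output.append(next_token(output))
    return output


def jacobi_decode(
    next_token: NextToken, length: int, pad: int = 0
) -> Tuple[List[int], int]:
    """Jacobi fixed-point iteration over all positions at once.

    Returns the decoded tokens and the number of iterations used.
    """
    guess = [pad] * length
    for iteration in range(1, length + 1):
        # Every position is recomputed from the previous iterate only, so
        # these updates are independent of each other and could run in parallel.
        updated = [next_token(guess[:i]) for i in range(length)]
        if updated == guess:  # fixed point reached
            return updated, iteration
        guess = updated
    # After `length` iterations every position is guaranteed to be correct.
    return guess, length


tokens, iterations = jacobi_decode(toy_next_token, 8)
reference = greedy_decode(toy_next_token, 8)
```

After iteration k the first k tokens already match the greedy output, so the procedure needs at most as many iterations as there are tokens and returns exactly the greedy result. Any speedup comes from iterations that happen to fix several positions at once.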

Camoscio: An Italian Instruction-tuned LLaMA
Andrea Santilli | Emanuele Rodolà
Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023)