Alberto Lumbreras
2024
LOCOST: State-Space Models for Long Document Abstractive Summarization
Florian Le Bronnec | Song Duong | Mathieu Ravaut | Alexandre Allauzen | Nancy Chen | Vincent Guigue | Alberto Lumbreras | Laure Soulier | Patrick Gallinari
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of 𝒪(L log L), this architecture can handle significantly longer sequences than state-of-the-art models based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches 93-96% of the performance of the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
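The 𝒪(L log L) complexity cited in the abstract comes from the fact that a state-space layer can be applied as a long convolution computed with the FFT. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; in a real SSM the kernel is derived from the learned state-space parameters, and none of the names here come from the LOCOST codebase.

```python
import torch

def ssm_conv_fft(u: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """Apply a precomputed SSM convolution kernel to a length-L input in
    O(L log L) time using FFT-based linear convolution (zero-padded to 2L)."""
    L = u.shape[-1]
    n = 2 * L                                    # pad to avoid circular wrap-around
    U = torch.fft.rfft(u, n=n)                   # FFT of the input sequence
    K = torch.fft.rfft(kernel, n=n)              # FFT of the convolution kernel
    return torch.fft.irfft(U * K, n=n)[..., :L]  # pointwise product, inverse FFT, crop

# Toy usage: a batch of 2 sequences of length 8 and a random kernel.
u = torch.randn(2, 8)
k = torch.randn(8)
print(ssm_conv_fft(u, k).shape)  # torch.Size([2, 8])
```

By contrast, dense self-attention over the same input costs 𝒪(L²) in time and memory, which is what limits standard transformers on very long documents.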
2023
Evaluating the Generalization Property of Prefix-based Methods for Data-to-text Generation
Clarine Vongpaseut | Alberto Lumbreras | Mike Gartrell | Patrick Gallinari
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 2 : travaux de recherche originaux -- articles courts
Fine-tuning is the prevalent paradigm for adapting pre-trained language models to downstream tasks. Lightweight fine-tuning methods, such as prefix-tuning, tune only a small set of parameters, which reduces cost. Such methods were shown to achieve results similar to full fine-tuning; however, performance can decrease when the inputs move farther from the training domain. Moreover, recent work has questioned the efficiency of lightweight fine-tuning techniques depending on the task and the size of the model. In this paper, we evaluate the generalization of prefix-based methods as a function of the size of the pre-trained language model, in a multi-domain data-to-text generation setting. We find that their performance depends heavily on the size of the model.
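For context, prefix-tuning keeps the pre-trained model frozen and learns only a small set of prefix vectors that are prepended to the attention keys and values. The toy layer below is a hypothetical sketch of that mechanism; the class name and dimensions are illustrative assumptions, not the setup evaluated in the paper, which works with full pre-trained language models (prefix-tuning is also available in libraries such as Hugging Face peft).

```python
import torch
import torch.nn as nn

class PrefixAttention(nn.Module):
    """Toy self-attention layer where only the learned prefix is trainable."""

    def __init__(self, d_model: int = 64, prefix_len: int = 10):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        for p in self.qkv.parameters():        # freeze the "pre-trained" weights
            p.requires_grad = False
        # Trainable prefix key/value vectors -- the only parameters updated.
        self.prefix_k = nn.Parameter(torch.randn(prefix_len, d_model))
        self.prefix_v = nn.Parameter(torch.randn(prefix_len, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        b = x.shape[0]
        k = torch.cat([self.prefix_k.expand(b, -1, -1), k], dim=1)
        v = torch.cat([self.prefix_v.expand(b, -1, -1), v], dim=1)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v

# Toy usage: only prefix_k and prefix_v receive gradients.
layer = PrefixAttention()
out = layer(torch.randn(2, 16, 64))
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(out.shape, trainable)  # torch.Size([2, 16, 64]) ['prefix_k', 'prefix_v']
```

Because the backbone stays frozen, the optimizer only ever sees the small prefix parameter set, which is what makes the method lightweight compared with full fine-tuning.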