Martin Jaggi
2026
Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
Alejandro Hernández-Cano | Alexander Hägele | Allen Hao Huang | Angelika Romanou | Antoni-Joan Solergibert | Barna Pásztor | Bettina Messmer | Dhia Garbaya | Eduard Frank Ďurech | Ido Hakimi | Juan Garcia Giraldo | Mete Ismayilzada | Negar Foroutan | Skander Moalla | Tiancheng Chen | Vinko Sabolčec | Yixuan Xu | Michael Aerni | Badr AlKhamissi | Inés Altemir Marinas | Mohammad Hossein Amani | Matin Ansaripour | Ilia Badanin | Harold Benoit | Emanuela Boros | Nicholas John Browning | Fabian Bösch | Maximilian Böther | Niklas Canova | Camille Challier | Clément Charmillot | Jonathan Coles | Jan Milan Deriu | Arnout Devos | Lukas Drescher | Daniil Dzenhaliou | Maud Ehrmann | Dongyang Fan | Simin Fan | Silin Gao | Miguel Gila | María Grandury | Diba Hashemi | Alexander Miserlis Hoyle | Jiaming Jiang | Mark Klein | Andrei Kucharavy | Anastasiia Kucherenko | Frederike Lübeck | Roman Machacek | Theofilos Ioannis Manitaras | Andreas Marfurt | Kyle Matoba | Simon Matrenok | Henrique Mendonça | Fawzi Roberto Mohamed | Syrielle Montariol | Luca Mouchel | Sven Najem-Meyer | Jingwei Ni | Gennaro Oliva | Matteo Pagliardini | Elia Palme | Andrei Panferov | Léo Paoletti | Marco Passerini | Ivan Pavlov | Auguste Poiroux | Kaustubh Ponkshe | Nathan Ranchin | Javier Rando | Mathieu Sauser | Jakhongir Saydaliev | Mukhammadali Sayfiddinov | Marian Schneider | Stefano Schuppli | Marco Scialanga | Andrei Semenov | Kumar Shridhar | Raghav Singhal | Anna Sotnikova | Alexander Sternfeld | Ayush Kumar Tarun | Paul Teiletche | Jannis Vamvas | Xiaozhe Yao | Hao Zhao | Alexander Ilic | Ana Klimovic | Andreas Krause | Caglar Gulcehre | David Rosenthal | Elliott Ash | Florian Tramèr | Joost VandeVondele | Livio Veraldi | Martin Rajman | Thomas C. Schulthess | Torsten Hoefler | Antoine Bosselut | Martin Jaggi | Imanol Schlag
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Open LLMs enable AI practitioners to control development costs by building on an existing foundation for downstream applications. While offering substantial promise, current models often fail to meet the needs of users who require open solutions aligned with responsible AI principles, including data compliance, transparency, and inclusivity. In this work, we present Apertus, a fully open suite of large language models (LLMs) designed to address responsibility shortcomings in today’s open model ecosystem, namely data responsibility and global representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting robots.txt exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of data memorization, we also adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. Apertus also drastically expands multilingual coverage, training on 15T tokens from over 1800 languages, with about 40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivaling or surpassing open-weight counterparts.
2025
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
Bettina Messmer | Vinko Sabolčec | Martin Jaggi
Proceedings of the 10th edition of the Swiss Text Analytics Conference
2023
SIMSUM: Document-level Text Simplification via Simultaneous Summarization
Sofia Blinova | Xinyu Zhou | Martin Jaggi | Carsten Eickhoff | Seyed Ali Bahrainian
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Document-level text simplification is a specific type of simplification which involves simplifying documents consisting of several sentences by rewriting them into fewer or more sentences. In this paper, we propose a new two-stage framework SIMSUM for automated document-level text simplification. Our model is designed with explicit summarization and simplification models and guides the generation using the main keywords of a source text. In order to evaluate our new model, we use two existing benchmark datasets for simplification, namely D-Wikipedia and Wiki-Doc. We compare our model’s performance with state of the art and show that SIMSUM achieves top results on the D-Wikipedia dataset SARI (+1.20), D-SARI (+1.64), and FKGL (-0.35) scores, improving over the best baseline models. In order to evaluate the quality of the generated text, we analyze the outputs from different models qualitatively and demonstrate the merit of our new model. Our code and datasets are available.
2022
SKILL: Structured Knowledge Infusion for Large Language Models
Fedor Moiseev | Zhe Dong | Enrique Alfonseca | Martin Jaggi
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Large language models (LLMs) have demonstrated human-level performance on a vast spectrum of natural language tasks. However, it is largely unexplored whether they can better internalize knowledge from structured data, such as a knowledge graph, or from text. In this work, we propose a method to infuse structured knowledge into LLMs by directly training T5 models on factual triples of knowledge graphs (KGs). We show that models pre-trained on the Wikidata KG with our method outperform the T5 baselines on FreebaseQA and WikiHop, as well as on the Wikidata-answerable subsets of TriviaQA and NaturalQuestions. The models pre-trained on factual triples compare competitively with those pre-trained on natural language sentences that contain the same knowledge. Trained on a smaller KG, WikiMovies, our models achieve a 3x improvement in exact match score on the MetaQA task. An advantage of the proposed method is that no alignment between the knowledge graph and the text corpus is required when curating training data. This makes our method particularly useful when working with industry-scale knowledge graphs.
2021
Obtaining Better Static Word Embeddings Using Contextual Embedding Models
Prakhar Gupta | Martin Jaggi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
The advent of contextual word embeddings (representations of words which incorporate semantic and syntactic information from their context) has led to tremendous improvements on a wide variety of NLP tasks. However, recent contextual models have prohibitively high computational cost in many use cases and are often hard to interpret. In this work, we demonstrate that our proposed distillation method, which is a simple extension of CBOW-based training, significantly improves the computational efficiency of NLP applications while outperforming existing static embeddings trained from scratch as well as those distilled with previously proposed methods. As a side effect, our approach also allows a fair comparison of both contextual and static embeddings via standard lexical evaluation tasks.
Self-Supervised Neural Topic Modeling
Seyed Ali Bahrainian | Martin Jaggi | Carsten Eickhoff
Findings of the Association for Computational Linguistics: EMNLP 2021
Topic models are useful tools for analyzing and interpreting the main underlying themes of large corpora of text. Most topic models rely on word co-occurrence for computing a topic, i.e., a weighted set of words that together represent a high-level semantic concept. In this paper, we propose a new light-weight Self-Supervised Neural Topic Model (SNTM) that learns a rich context by learning a topic representation jointly from three co-occurring words and a document that the triple originates from. Our experimental results indicate that our proposed neural topic model, SNTM, outperforms previously existing topic models in coherence metrics as well as document clustering accuracy. Moreover, apart from the topic coherence and clustering performance, the proposed neural topic model has a number of advantages, namely, being computationally efficient and easy to train.
Lightweight Cross-Lingual Sentence Representation Learning
Zhuoyuan Mao | Prakhar Gupta | Chenhui Chu | Martin Jaggi | Sadao Kurohashi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Large-scale models for learning fixed-dimensional cross-lingual sentence representations like LASER (Artetxe and Schwenk, 2019b) lead to significant improvement in performance on downstream tasks. However, further increases and modifications based on such large-scale models are usually impractical due to memory limitations. In this work, we introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations. We explore different training tasks and observe that current cross-lingual training tasks leave a lot to be desired for this shallow architecture. To ameliorate this, we propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task. We further augment the training task by the introduction of two computationally-lite sentence-level contrastive learning tasks to enhance the alignment of cross-lingual sentence representation space, which compensates for the learning bottleneck of the lightweight transformer for generative tasks. Our comparisons with competing models on cross-lingual sentence retrieval and multilingual document classification confirm the effectiveness of the newly proposed training tasks for a shallow model.
2020
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
Mengjie Zhao | Tao Lin | Fei Mi | Martin Jaggi | Hinrich Schütze
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT, RoBERTa, and DistilBERT on eleven diverse NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred. Intrinsic evaluations show that representations computed by our binary masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
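The masking idea in the abstract above can be illustrated with a minimal sketch. It only shows how a binary mask is applied on top of frozen pretrained weights; the thresholding rule, the toy weight values, and the names `binarize`, `apply_mask`, and `task_scores` are assumptions made for illustration, not the paper's implementation, and the procedure for learning the mask scores is omitted.

```python
from typing import Dict, List


def binarize(scores: List[float], threshold: float = 0.0) -> List[int]:
    """Turn real-valued mask scores into a 0/1 mask (1 keeps the weight).

    The fixed threshold is an assumption made for this sketch.
    """
    return [1 if s >= threshold else 0 for s in scores]


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Elementwise product: masked-out weights become zero, the rest are unchanged."""
    return [w * m for w, m in zip(weights, mask)]


# Shared pretrained weights (toy values). They are never modified.
pretrained: List[float] = [0.5, -1.2, 0.8, 0.3]

# Hypothetical mask scores for two downstream tasks.
task_scores: Dict[str, List[float]] = {
    "task_a": [0.9, -0.4, 0.1, -0.7],
    "task_b": [-0.2, 0.6, 0.3, 0.5],
}

# Only one bit per weight has to be stored for each task.
task_masks = {name: binarize(scores) for name, scores in task_scores.items()}

# Task-specific effective weights are derived on the fly from the shared copy.
effective = {name: apply_mask(pretrained, mask) for name, mask in task_masks.items()}

print(task_masks["task_a"])
print(effective["task_b"])
```

Because every task reuses the same `pretrained` list and adds only a 0/1 mask, serving several tasks needs one copy of the weights plus one bit per weight per task, which is the memory argument the abstract makes.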
2019
Better Word Embeddings by Disentangling Contextual n-Gram Information
Prakhar Gupta | Matteo Pagliardini | Martin Jaggi
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Pre-trained word vectors are ubiquitous in Natural Language Processing applications. In this paper, we show how training word embeddings jointly with bigram and even trigram embeddings results in improved unigram embeddings. We claim that training word embeddings along with higher n-gram embeddings helps remove contextual information from the unigrams, resulting in better stand-alone word embeddings. We empirically show the validity of our hypothesis by outperforming other competing word representation models by a significant margin on a wide variety of tasks. We make our models publicly available.
Correlating Twitter Language with Community-Level Health Outcomes
Arno Schneuwly | Ralf Grubenmann | Séverine Rion Logean | Mark Cieliebak | Martin Jaggi
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task
We study how language on social media is linked to mortal diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need for additional labelled data. It allows us to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with lifestyle aspects and other socioeconomic risk factors.
2018
Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features
Matteo Pagliardini | Prakhar Gupta | Martin Jaggi
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question if similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
Simple Unsupervised Keyphrase Extraction using Sentence Embeddings
Kamil Bennani-Smires | Claudiu Musat | Andreea Hossmann | Michael Baeriswyl | Martin Jaggi
Proceedings of the 22nd Conference on Computational Natural Language Learning
Keyphrase extraction is the task of automatically selecting a small set of phrases that best describe a given free text document. Supervised keyphrase extraction requires large amounts of labeled training data and generalizes very poorly outside the domain of the training data. At the same time, unsupervised systems have poor accuracy, and often do not generalize well, as they require the input document to belong to a larger corpus also given as input. Addressing these drawbacks, in this paper, we tackle keyphrase extraction from single documents with EmbedRank: a novel unsupervised method, that leverages sentence embeddings. EmbedRank achieves higher F-scores than graph-based state of the art systems on standard datasets and is suitable for real-time processing of large amounts of Web data. With EmbedRank, we also explicitly increase coverage and diversity among the selected keyphrases by introducing an embedding-based maximal marginal relevance (MMR) for new phrases. A user study including over 200 votes showed that, although reducing the phrases’ semantic overlap leads to no gains in F-score, our high diversity selection is preferred by humans.
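The embedding-based maximal marginal relevance (MMR) step mentioned in the abstract above can be sketched as follows. This is a generic MMR selection over toy 2-dimensional vectors, not the EmbedRank implementation: the function names, the trade-off parameter `lam`, the candidate phrases, and their vectors are all assumptions made for illustration, and any similarity normalization the paper may apply is left out.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(
    doc_vec: List[float],
    candidates: Dict[str, List[float]],
    top_n: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy MMR: trade relevance to the document against redundancy
    with the phrases that were already selected."""
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < top_n:

        def score(name: str) -> float:
            relevance = cosine(remaining[name], doc_vec)
            redundancy = max(
                (cosine(remaining[name], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy embeddings (hypothetical): two near-duplicate phrases and one distinct phrase.
doc = [1.0, 1.0]
phrases = {
    "neural network": [1.0, 0.9],
    "neural networks": [1.0, 0.8],
    "training data": [0.2, 1.0],
}

print(mmr_select(doc, phrases, top_n=2, lam=0.5))
print(mmr_select(doc, phrases, top_n=2, lam=1.0))
```

With `lam=1.0` the redundancy term vanishes and selection is by relevance alone, so both near-duplicate phrases are returned; with `lam=0.5` the second pick shifts to the more distinct phrase, which is the diversity effect the abstract describes.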
2017
Generating Steganographic Text with LSTMs
Tina Fang | Martin Jaggi | Katerina Argyraki
Proceedings of ACL 2017, Student Research Workshop
2016
SwissCheese at SemEval-2016 Task 4: Sentiment Classification Using an Ensemble of Convolutional Neural Networks with Distant Supervision
Jan Deriu | Maurice Gonzenbach | Fatih Uzdilli | Aurelien Lucchi | Valeria De Luca | Martin Jaggi
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
2015
Swiss-Chocolate: Combining Flipout Regularization and Random Forests with Artificially Built Subsystems to Boost Text-Classification for Sentiment
Fatih Uzdilli | Martin Jaggi | Dominic Egger | Pascal Julmy | Leon Derczynski | Mark Cieliebak
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
2014
Co-authors
- Prakhar Gupta 4
- Mark Cieliebak 3
- Matteo Pagliardini 3
- Fatih Uzdilli 3
- Seyed Ali Bahrainian 2
- Jan Milan Deriu 2
- Carsten Eickhoff 2
- Bettina Messmer 2
- Michael Aerni 1
- Badr AlKhamissi 1
- Enrique Alfonseca 1
- Mohammad Hossein Amani 1
- Matin Ansaripour 1
- Katerina Argyraki 1
- Elliott Ash 1
- Fabian Bösch 1
- Maximilian Böther 1
- Ilia Badanin 1
- Michael Baeriswyl 1
- Kamil Bennani-Smires 1
- Harold Benoit 1
- Sofia Blinova 1
- Emanuela Boroş 1
- Antoine Bosselut 1
- Nicholas John Browning 1
- Niklas Canova 1
- Camille Challier 1
- Clément Charmillot 1
- Tiancheng Chen 1
- Chenhui Chu 1
- Jonathan Coles 1
- Valeria De Luca 1
- Leon Derczynski 1
- Arnout Devos 1
- Zhe Dong 1
- Lukas Drescher 1
- Daniil Dzenhaliou 1
- Dominic Egger 1
- Maud Ehrmann 1
- Dongyang Fan 1
- Simin Fan 1
- Tina Fang 1
- Negar Foroutan 1
- Silin Gao 1
- Dhia Garbaya 1
- Miguel Gila 1
- Juan Garcia Giraldo 1
- Maurice Gonzenbach 1
- María Grandury 1
- Ralf Grubenmann 1
- Çağlar Gülçehre 1
- Alexander Hägele 1
- Ido Hakimi 1
- Diba Hashemi 1
- Alejandro Hernández-Cano 1
- Torsten Hoefler 1
- Andreea Hossmann 1
- Alexander Miserlis Hoyle 1
- Allen Hao Huang 1
- Alexander Ilic 1
- Mete Ismayilzada 1
- Jiaming Jiang 1
- Pascal Julmy 1
- Mark Klein 1
- Ana Klimovic 1
- Andreas Krause 1
- Andrei Kucharavy 1
- Anastasiia Kucherenko 1
- Sadao Kurohashi 1
- Frederike Lübeck 1
- Tao Lin 1
- Aurelien Lucchi 1
- Roman Machacek 1
- Theofilos Ioannis Manitaras 1
- Zhuoyuan Mao 1
- Andreas Marfurt 1
- Inés Altemir Marinas 1
- Kyle Matoba 1
- Simon Matrenok 1
- Henrique Mendonça 1
- Fei Mi 1
- Skander Moalla 1
- Fawzi Roberto Mohamed 1
- Fedor Moiseev 1
- Syrielle Montariol 1
- Luca Mouchel 1
- Claudiu Musat 1
- Sven Najem-Meyer 1
- Jingwei Ni 1
- Gennaro Oliva 1
- Barna Pásztor 1
- Elia Palme 1
- Andrei Panferov 1
- Léo Paoletti 1
- Marco Passerini 1
- Ivan Pavlov 1
- Auguste Poiroux 1
- Kaustubh Ponkshe 1
- Martin Rajman 1
- Nathan Ranchin 1
- Javier Rando 1
- Séverine Rion Logean 1
- Angelika Romanou 1
- David Rosenthal 1
- Vinko Sabolčec 2
- Mathieu Sauser 1
- Jakhongir Saydaliev 1
- Mukhammadali Sayfiddinov 1
- Imanol Schlag 1
- Marian Schneider 1
- Arno Schneuwly 1
- Thomas C. Schulthess 1
- Stefano Schuppli 1
- Hinrich Schütze 1
- Marco Scialanga 1
- Andrei Semenov 1
- Kumar Shridhar 1
- Raghav Singhal 1
- Antoni-Joan Solergibert 1
- Anna Sotnikova 1
- Alexander Sternfeld 1
- Ayush Kumar Tarun 1
- Paul Teiletche 1
- Florian Tramèr 1
- Jannis Vamvas 1
- Joost VandeVondele 1
- Livio Veraldi 1
- Yixuan Xu 1
- Xiaozhe Yao 1
- Mengjie Zhao 1
- Hao Zhao 1
- Xinyu Zhou 1
- Eduard Frank Ďurech 1