Max Lübbering
2025
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
Mehdi Ali | Manuel Brack | Max Lübbering | Elias Wendt | Abbas Goher Khan | Richard Rutmann | Alex Jude | Maurice Kraus | Alexander Arno Weber | Felix Stollenwerk | David Kaczér | Florian Mai | Lucie Flek | Rafet Sifa | Nicolas Flores-Herr | Joachim Koehler | Patrick Schramowski | Michael Fromm | Kristian Kersting
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
2024
Tokenizer Choice For LLM Training: Negligible or Crucial?
Mehdi Ali | Michael Fromm | Klaudia Thellmann | Richard Rutmann | Max Lübbering | Johannes Leveling | Katrin Klug | Jan Ebert | Niclas Doll | Jasper Buschhoff | Charvi Jain | Alexander Weber | Lena Jurkschat | Hammam Abdelwahab | Chelsea John | Pedro Ortiz Suarez | Malte Ostendorff | Samuel Weinbach | Rafet Sifa | Stefan Kesselheim | Nicolas Flores-Herr
Findings of the Association for Computational Linguistics: NAACL 2024
Co-authors
- Mehdi Ali 2
- Nicolas Flores-Herr 2
- Michael Fromm 2
- Richard Rutmann 2
- Rafet Sifa 2
- Hammam Abdelwahab 1
- Manuel Brack 1
- Jasper Buschhoff 1
- Niclas Doll 1
- Jan Ebert 1
- Lucie Flek 1
- Charvi Jain 1
- Chelsea John 1
- Alex Jude 1
- Lena Jurkschat 1
- David Kaczér 1
- Kristian Kersting 1
- Stefan Kesselheim 1
- Abbas Goher Khan 1
- Katrin Klug 1
- Maurice Kraus 1
- Joachim Köhler 1
- Johannes Leveling 1
- Florian Mai 1
- Pedro Ortiz Suarez 1
- Malte Ostendorff 1
- Patrick Schramowski 1
- Felix Stollenwerk 1
- Klaudia Thellmann 1
- Alexander Weber 1
- Alexander Arno Weber 1
- Samuel Weinbach 1
- Elias Wendt 1