TextLens & LeTTuce: Automated Corpus Annotation and Multilingual Tagging as a Service

Cynthia Van Hee, Jonas Doumen, Vincent Prins, Pranaydeep Singh, Vincent Vandeghinste, Els Lefever


Abstract
We present TextLens, a web-based platform for automated linguistic annotation designed to lower technical barriers for researchers in digital humanities, linguistics and translation studies. Hosted by the Dutch Language Institute (INT), TextLens allows users to upload and annotate corpora in a variety of formats (.txt, .tsv, CoNLL-U, FoLiA, TEI, and NAF) using state-of-the-art NLP tools, without the need for local installation or computational resources. The platform supports multilingual data processing and provides a persistent dashboard for managing, monitoring and sharing annotation projects. Alongside this service, we introduce the LeTTuce-PoS Dataset, a new multilingual, manually annotated dataset for part-of-speech tagging in English, French, Dutch and German, covering multiple genres and offering a valuable resource to the research community. This paper also reports benchmark results for different PoS taggers (LeTs Preprocess, LeTTuce, spaCy and Stanza) on the dataset. Together, TextLens and the LeTTuce-PoS Dataset provide an accessible, scalable platform for high-quality annotation and a robust multilingual dataset, both of which support comparable and reproducible research in multilingual contexts.
Anthology ID:
2026.lrec-main.906
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
11574–11584
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.906/
Cite (ACL):
Cynthia Van Hee, Jonas Doumen, Vincent Prins, Pranaydeep Singh, Vincent Vandeghinste, and Els Lefever. 2026. TextLens & LeTTuce: Automated Corpus Annotation and Multilingual Tagging as a Service. International Conference on Language Resources and Evaluation, main:11574–11584.
Cite (Informal):
TextLens & LeTTuce: Automated Corpus Annotation and Multilingual Tagging as a Service (Van Hee et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.906.pdf