Jaromir Savelka
Also published as: Jaromír Šavelka
2026
Semantic Span Annotation: An Exploratory Study of LLM Annotation
Tejas Goyal | Dhriti Krishnan | Anuj Gupta | Jaromir Savelka
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Structured span extraction research is siloed by context length, annotation task, and domain, making it difficult to assess how well large language models (LLMs) generalize across realistic extraction settings. We introduce SSA (Structured Span Annotation), a unified evaluation framework bringing together five datasets across four domains: finance, biomedicine, affective analysis, and privacy, under a common JSONL format with character-level offsets. We conduct an exploratory study evaluating seven models (three closed, four open-weight) under three prompting configurations: zero-shot, definition-augmented, and few-shot, formulating extraction as inline XML generation where models reproduce the document with tagged spans. Our results reveal two distinct performance regimes: on tasks requiring complex ontology reasoning, zero-shot performance is near zero (e.g., 0.00% F1 on FiNER-139) but improves substantially with label definitions (e.g., Claude Opus 4.6 rises from 8.8% to 57.5% F1); on pattern-based tasks like PII detection, definitions consistently hurt performance across all models. These findings suggest that prompting strategy must be matched to task structure, and that unified evaluation frameworks spanning varied domains and input lengths are essential for understanding LLM extraction capabilities.
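The abstract above describes extraction as inline XML generation, with gold data stored as JSONL with character-level offsets. A minimal sketch of how tagged model output might be mapped back to offset-based spans is shown here. The tag syntax, the `tagged_to_spans` helper, and the `text`/`spans`/`start`/`end`/`label` field names are assumptions made for illustration and are not taken from the paper.

```python
import json
import re

# Matches an opening or closing tag such as <NAME> or </NAME>.
# The tag syntax is an assumption; the paper's exact format is not given here.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def tagged_to_spans(tagged):
    """Strip inline tags and return (plain_text, spans) with character offsets."""
    pieces = []   # untagged text chunks, joined at the end
    pos = 0       # current offset in the untagged text
    last = 0      # index in `tagged` just past the previous tag
    stack = []    # currently open tags as (label, start_offset)
    spans = []
    for m in TAG.finditer(tagged):
        chunk = tagged[last:m.start()]
        pieces.append(chunk)
        pos += len(chunk)
        last = m.end()
        closing, label = m.group(1), m.group(2)
        if not closing:
            stack.append((label, pos))
        elif stack and stack[-1][0] == label:
            _, start = stack.pop()
            spans.append({"start": start, "end": pos, "label": label})
        # Unmatched closing tags are silently dropped in this sketch.
    pieces.append(tagged[last:])
    return "".join(pieces), spans


output = "Contact <NAME>Ada Lovelace</NAME> at <EMAIL>ada@example.org</EMAIL>."
text, spans = tagged_to_spans(output)
# One JSONL record per document (hypothetical schema).
print(json.dumps({"text": text, "spans": spans}))
```

Because offsets are computed against the untagged text, each span can be checked with `text[span["start"]:span["end"]]`. A real pipeline would also need to handle cases where the model fails to reproduce the document verbatim, which this sketch does not attempt.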
2021
Discovering Explanatory Sentences in Legal Case Decisions Using Pre-trained Language Models
Jaromir Savelka | Kevin Ashley
Findings of the Association for Computational Linguistics: EMNLP 2021
Legal texts routinely use concepts that are difficult to understand. Lawyers elaborate on the meaning of such concepts by, among other things, carefully investigating how they have been used in the past. Finding text snippets that mention a particular concept in a useful way is tedious, time-consuming, and hence expensive. We assembled a data set of 26,959 sentences, coming from legal case decisions, and labeled them in terms of their usefulness for explaining selected legal concepts. Using the dataset we study the effectiveness of transformer models pre-trained on large language corpora to detect which of the sentences are useful. In light of models’ predictions, we analyze various linguistic properties of the explanatory sentences as well as their relationship to the legal concept that needs to be explained. We show that the transformer-based models are capable of learning surprisingly sophisticated features and outperform the prior approaches to the task.
2020
ECHR: Legal Corpus for Argument Mining
Prakash Poudyal | Jaromir Savelka | Aagje Ieven | Marie Francine Moens | Teresa Goncalves | Paulo Quaresma
Proceedings of the 7th Workshop on Argument Mining
In this paper, we publicly release an annotated corpus of 42 decisions of the European Court of Human Rights (ECHR). The corpus is annotated in terms of three types of clauses useful in argument mining: premise, conclusion, and non-argument parts of the text. Furthermore, relationships among the premises and conclusions are mapped. We present baselines for three tasks that lead from unstructured texts to structured arguments. The tasks are argument clause recognition, clause relation prediction, and premise/conclusion recognition. Despite a straightforward application of the bidirectional encoders from Transformers (BERT), we obtained very promising results (F1 0.765 on argument recognition, 0.511 on relation prediction, and 0.859/0.628 on premise/conclusion recognition). The results suggest the usefulness of pre-trained language models based on deep neural network architectures in argument mining. Because of the simplicity of the baselines, there is ample space for improvement in future work based on the released corpus.
2017
Sentence Boundary Detection in Adjudicatory Decisions in the United States
Jaromir Savelka | Vern R. Walker | Matthias Grabmair | Kevin D. Ashley
Traitement Automatique des Langues, Volume 58, Numéro 2 : Traitement automatique de la langue juridique [Legal Natural Language Processing]