Samuel Havran
2025
When the Dictionary Strikes Back: A Case Study on Slovak Migration Location Term Extraction and NER via Rule-Based vs. LLM Methods
Miroslav Blšták | Jaroslav Kopčan | Marek Suppa | Samuel Havran | Andrej Findor | Martin Takáč | Marián Šimko
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
This study explores the task of automatically extracting migration-related locations (source and destination) from media articles, focusing on the challenges posed by Slovak, a low-resource and morphologically complex language. We present the first comparative analysis of rule-based dictionary approaches (NLP4SK) versus Large Language Models (LLMs, e.g., SlovakBERT, GPT-4o) for both geographical relevance classification (Slovakia-focused migration) and specific source/target location extraction. To facilitate this research and future work, we introduce the first manually annotated Slovak dataset tailored for migration-focused locality detection. Our results show that while a fine-tuned SlovakBERT model achieves high accuracy for classification, specialized rule-based methods still have the potential to outperform LLMs for specific extraction tasks. However, LLM performance improves with few-shot examples, which suggests that LLMs may become competitive as research in this area continues to evolve.
Co-authors
- Miroslav Blšták 1
- Andrej Findor 1
- Jaroslav Kopčan 1
- Martin Takáč 1
- Marián Šimko 1