When the Dictionary Strikes Back: A Case Study on Slovak Migration Location Term Extraction and NER via Rule-Based vs. LLM Methods
Miroslav Blšták, Jaroslav Kopčan, Marek Suppa, Samuel Havran, Andrej Findor, Martin Takac, Marian Simko
Abstract
This study explores the task of automatically extracting migration-related locations (source and destination) from media articles, focusing on the challenges posed by Slovak, a low-resource and morphologically complex language. We present the first comparative analysis of rule-based dictionary approaches (NLP4SK) versus Large Language Models (LLMs, e.g. SlovakBERT, GPT-4o) for both geographical relevance classification (Slovakia-focused migration) and specific source/target location extraction. To facilitate this research and future work, we introduce the first manually annotated Slovak dataset tailored for migration-focused locality detection. Our results show that while a fine-tuned SlovakBERT model achieves high accuracy for classification, specialized rule-based methods still have the potential to outperform LLMs for specific extraction tasks, though improved LLM performance with few-shot examples suggests future competitiveness as research in this area continues to evolve.
- Anthology ID:
- 2025.bsnlp-1.11
- Volume:
- Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Jakub Piskorski, Pavel Přibáň, Preslav Nakov, Roman Yangarber, Michal Marcinczuk
- Venues:
- BSNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 91–100
- URL:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.11/
- Cite (ACL):
- Miroslav Blšták, Jaroslav Kopčan, Marek Suppa, Samuel Havran, Andrej Findor, Martin Takac, and Marian Simko. 2025. When the Dictionary Strikes Back: A Case Study on Slovak Migration Location Term Extraction and NER via Rule-Based vs. LLM Methods. In Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025), pages 91–100, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- When the Dictionary Strikes Back: A Case Study on Slovak Migration Location Term Extraction and NER via Rule-Based vs. LLM Methods (Blšták et al., BSNLP 2025)
- PDF:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.11.pdf