Maksim Savkin
2025
SPY: Enhancing Privacy with Synthetic PII Detection Dataset
Maksim Savkin
|
Timur Ionov
|
Vasily Konovalov
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
We introduce the **SPY Dataset**, a novel synthetic dataset for the task of **Personally Identifiable Information (PII) detection**, underscoring the significance of protecting PII in modern data processing. Our research leverages Large Language Models (LLMs) to generate a dataset that emulates real-world PII scenarios. Through evaluation, we validate the dataset’s quality, providing a benchmark for PII detection. Comparative analyses reveal that while PII detection and Named Entity Recognition (NER) share similarities, **dedicated NER models exhibit limitations** when applied to PII-specific contexts. This work contributes to the field by making the generation methodology and the generated dataset publicly available, thereby enabling further research and development.
2024
DeepPavlov 1.0: Your Gateway to Advanced NLP Models Backed by Transformers and Transfer Learning
Maksim Savkin
|
Anastasia Voznyuk
|
Fedor Ignatov
|
Anna Korzanova
|
Dmitry Karpov
|
Alexander Popov
|
Vasily Konovalov
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We present DeepPavlov 1.0, an open-source framework for using Natural Language Processing (NLP) models by leveraging transfer learning techniques. DeepPavlov 1.0 is created for modular and configuration-driven development of state-of-the-art NLP models and supports a wide range of NLP model applications. DeepPavlov 1.0 is designed for practitioners with limited knowledge of NLP/ML. DeepPavlov is based on PyTorch and supports HuggingFace transformers. DeepPavlov is publicly released under the Apache 2.0 license and provides access to an online demo.
Co-authors
- Vasily Konovalov 2
- Fedor Ignatov 1
- Timur Ionov 1
- Dmitry Karpov 1
- Anna Korzanova 1