Comparison of String Similarity Measures for Obscenity Filtering

Ekaterina Chernyak

doi:10.18653/v1/W17-1415

Abstract

In this paper, we address the problem of filtering obscene lexis in Russian texts. We use string similarity measures to find words similar or identical to words from a stop list, and we establish both a test collection and a baseline for the task. Our experiments show that a novel string similarity measure based on the notion of an annotated suffix tree outperforms some of the other well-known measures.
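
The general idea of matching words against a stop list by string similarity can be illustrated with a minimal Python sketch. It is not the paper's method: it substitutes a normalized Levenshtein similarity for the annotated suffix tree measure, and the function names, the 0.75 threshold, and the placeholder words are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) by dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def flag_words(tokens, stop_list, threshold=0.75):
    """Return (token, closest stop word, score) for tokens at or above the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    if not stop_list:
        return []
    flagged = []
    for token in tokens:
        score, closest = max((similarity(token.lower(), w), w) for w in stop_list)
        if score >= threshold:
            flagged.append((token, closest, score))
    return flagged
```

Calling `flag_words(["badword", "b4dword", "hello"], ["badword"])` flags the first two tokens, since a single substituted character leaves the similarity above the threshold, while the unrelated token is ignored.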

Anthology ID: W17-1415
Volume: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing
Month: April
Year: 2017
Address: Valencia, Spain
Editors: Tomaž Erjavec, Jakub Piskorski, Lidia Pivovarova, Jan Šnajder, Josef Steinberger, Roman Yangarber
Venue: BSNLP
SIG: SIGSLAV
Publisher: Association for Computational Linguistics
Pages: 97–101
URL: https://aclanthology.org/W17-1415
DOI: 10.18653/v1/W17-1415
Cite (ACL): Ekaterina Chernyak. 2017. Comparison of String Similarity Measures for Obscenity Filtering. In Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, pages 97–101, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): Comparison of String Similarity Measures for Obscenity Filtering (Chernyak, BSNLP 2017)
PDF: https://preview.aclanthology.org/teach-a-man-to-fish/W17-1415.pdf