Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking
Yifei Li, Pratheeksha Nair, Kellin Pelrine, Reihaneh Rabbany
Abstract
Online escort advertisement websites are widely used for advertising victims of human trafficking. Domain experts agree that advertising multiple people in the same ad is a strong indicator of trafficking. Thus, extracting person names from the text of these ads can provide valuable clues for further analysis. However, Named-Entity Recognition (NER) on escort ads is challenging because the text can be noisy and colloquial, and it often lacks proper grammar and punctuation. Most existing state-of-the-art NER models fail to demonstrate satisfactory performance on this task. In this paper, we propose NEAT (Name Extraction Against Trafficking) for extracting person names. It effectively combines classic rule-based and dictionary extractors with a contextualized language model to capture ambiguous names (e.g., penny, hazel) and adapts to adversarial changes in the text by expanding its dictionary. NEAT shows a 19% improvement on average in the F1 classification score for name extraction compared to the previous state of the art on two domain-specific datasets.
- Anthology ID:
- 2022.findings-acl.225
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2022
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Smaranda Muresan, Preslav Nakov, Aline Villavicencio
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2854–2868
- URL:
- https://aclanthology.org/2022.findings-acl.225
- DOI:
- 10.18653/v1/2022.findings-acl.225
- Cite (ACL):
- Yifei Li, Pratheeksha Nair, Kellin Pelrine, and Reihaneh Rabbany. 2022. Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2854–2868, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking (Li et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2022.findings-acl.225.pdf
- Data
- WNUT 2017