MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data

Vageesh Kumar Saxena, Benjamin Ashpole, Gijs Van Dijck, Gerasimos Spanakis


Abstract
Human trafficking (HT) remains a critical issue, with traffickers increasingly using online escort advertisements to advertise victims anonymously. Existing detection methods, including text-based Authorship Attribution (AA), overlook the multimodal nature of these ads, which combine text and images. To bridge this gap, we introduce MATCHED, a multimodal AA dataset comprising 27,619 unique text descriptions and 55,115 unique images sourced from Backpage across seven U.S. cities in four geographic regions. This study extensively benchmarks text-only, vision-only, and multimodal baselines for vendor identification and verification tasks, employing multitask (joint) training objectives that achieve superior classification and retrieval performance on both in-sample and out-of-distribution datasets. The results demonstrate that while text remains the dominant modality, integrating visual features adds stylistic cues that improve model performance. Moreover, text-image alignment strategies like CLIP and BLIP2 struggle because the text and images of escort ads have low semantic overlap and only vague connections; end-to-end multimodal training proves more robust. Our findings emphasize the potential of multimodal AA to combat HT, providing Law Enforcement Agencies with robust tools to link advertisements and disrupt trafficking networks.
Anthology ID:
2025.findings-acl.225
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4334–4373
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.225/
DOI:
10.18653/v1/2025.findings-acl.225
Cite (ACL):
Vageesh Kumar Saxena, Benjamin Ashpole, Gijs Van Dijck, and Gerasimos Spanakis. 2025. MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data. In Findings of the Association for Computational Linguistics: ACL 2025, pages 4334–4373, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data (Saxena et al., Findings 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.225.pdf