Abstract
This paper outlines our approach to Tasks A & B for the English Language track of SemEval-2020 Task 12: OffensEval 2: Multilingual Offensive Language Identification in Social Media. We use a Linear SVM with document vectors computed from pre-trained word embeddings, and we explore the effectiveness of lexical, part of speech, dependency, and named entity (NE) features. We manually annotate a subset of the training data, which we use for error analysis and to tune a threshold for mapping training confidence values to labels. While document vectors are consistently the most informative features for both tasks, testing on the development set suggests that dependency features are an effective addition for Task A, and NE features for Task B.
- Anthology ID:
- 2020.semeval-1.294
- Volume:
- Proceedings of the Fourteenth Workshop on Semantic Evaluation
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona (online)
- Editors:
- Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- International Committee for Computational Linguistics
- Pages:
- 2209–2215
- URL:
- https://aclanthology.org/2020.semeval-1.294
- DOI:
- 10.18653/v1/2020.semeval-1.294
- Cite (ACL):
- Jared Fromknecht and Alexis Palmer. 2020. UNT Linguistics at SemEval-2020 Task 12: Linear SVC with Pre-trained Word Embeddings as Document Vectors and Targeted Linguistic Features. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 2209–2215, Barcelona (online). International Committee for Computational Linguistics.
- Cite (Informal):
- UNT Linguistics at SemEval-2020 Task 12: Linear SVC with Pre-trained Word Embeddings as Document Vectors and Targeted Linguistic Features (Fromknecht & Palmer, SemEval 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2020.semeval-1.294.pdf
- Data
- OLID
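
Two steps named in the abstract, building a document vector from word embeddings and mapping training confidence values to labels with a threshold, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings, the choice of averaging as the way to combine word vectors, the function names, and the threshold value of 0.5 are all assumptions made for illustration. The `OFF`/`NOT` labels are the Task A labels used in OLID.

```python
from statistics import fmean

# Toy 3-dimensional embeddings; purely illustrative, not real pre-trained vectors.
TOY_EMBEDDINGS = {
    "you": [0.1, 0.3, 0.0],
    "are": [0.0, 0.2, 0.1],
    "awful": [0.9, 0.1, 0.8],
}


def document_vector(tokens, embeddings, dim=3):
    """Average the embeddings of the known tokens (assumed pooling strategy).

    Tokens missing from the embedding table are skipped; if no token is
    known, a zero vector of length `dim` is returned.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over the vectors dimension by dimension.
    return [fmean(column) for column in zip(*known)]


def confidence_to_label(confidence, threshold=0.5):
    """Map a confidence value to a Task A label.

    The default threshold is hypothetical; the paper tunes its threshold
    on a manually annotated subset of the training data.
    """
    return "OFF" if confidence >= threshold else "NOT"


if __name__ == "__main__":
    print(document_vector(["you", "are", "awful"], TOY_EMBEDDINGS))
    print(confidence_to_label(0.72))
```

In a full pipeline, vectors like these would be passed as features to a linear SVM, optionally concatenated with the lexical, part-of-speech, dependency, and NE features the abstract mentions.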