Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia

Yiping Jin, Vishakha Kadam, Dittaya Wanvarie


Abstract
Contextual advertising provides advertisers with the opportunity to target the context which is most relevant to their ads. The large variety of potential topics makes it very challenging to collect training documents to build a supervised classification model or to compose expert-written rules for a rule-based classification system. Moreover, in fine-grained classification, different categories often overlap or co-occur, making it harder to classify accurately. In this work, we propose wiki2cat, a method to tackle large-scale fine-grained text classification by tapping into the Wikipedia category graph. The categories in the IAB taxonomy are first mapped to category nodes in the graph. The labels are then propagated across the graph to obtain a list of labeled Wikipedia documents from which text classifiers are induced. The method is ideal for large-scale classification problems since it does not require any manually labeled documents, hand-curated rules, or keywords. The proposed method is benchmarked against various learning-based and keyword-based baselines and yields competitive performance on publicly available datasets and a new dataset containing more than 300 fine-grained categories.
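The two steps named in the abstract (mapping taxonomy categories to graph nodes, then propagating labels across the graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the propagation rule, so the sketch assumes a plain breadth-first walk over subcategory edges with a depth cutoff. The function name `propagate_labels`, the `max_depth` parameter, and the toy graph are all hypothetical.

```python
from collections import deque


def propagate_labels(subcats, pages, seeds, max_depth=2):
    """Breadth-first label propagation over a category graph (illustrative only).

    subcats: dict mapping a category to its child categories
    pages:   dict mapping a category to the articles filed under it
    seeds:   dict mapping a taxonomy label to its seed categories
    Returns a dict: article -> {label: smallest depth at which it was reached}.
    """
    labeled = {}
    for label, roots in seeds.items():
        queue = deque((root, 0) for root in roots)
        seen = set(roots)
        while queue:
            cat, depth = queue.popleft()
            # Every article under a reached category inherits the label.
            for page in pages.get(cat, []):
                hits = labeled.setdefault(page, {})
                hits[label] = min(depth, hits.get(label, depth))
            # Assumption: stop expanding once the depth cutoff is reached.
            if depth == max_depth:
                continue
            for child in subcats.get(cat, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return labeled


# Toy graph (made up for illustration, not taken from the paper's data).
subcats = {
    "Category:Automobiles": ["Category:Electric cars"],
    "Category:Electric cars": ["Category:Tesla vehicles"],
}
pages = {
    "Category:Automobiles": ["Car"],
    "Category:Electric cars": ["Electric car"],
    "Category:Tesla vehicles": ["Tesla Model 3"],
    "Category:Cooking": ["Recipe"],
}
seeds = {
    "Automotive": ["Category:Automobiles"],
    "Food & Drink": ["Category:Cooking"],
}

result = propagate_labels(subcats, pages, seeds, max_depth=1)
for article, labels in sorted(result.items()):
    print(article, labels)
```

With `max_depth=1` the walk stops one level below each seed, so "Tesla Model 3" stays unlabeled. The recorded depth could serve as a rough confidence signal when the labeled articles are later used to train a classifier, although how the paper weights or filters propagated labels is not stated in the abstract.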
Anthology ID:
2021.textgraphs-1.1
Volume:
Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15)
Month:
June
Year:
2021
Address:
Mexico City, Mexico
Editors:
Alexander Panchenko, Fragkiskos D. Malliaros, Varvara Logacheva, Abhik Jana, Dmitry Ustalov, Peter Jansen
Venue:
TextGraphs
Publisher:
Association for Computational Linguistics
Pages:
1–9
URL:
https://aclanthology.org/2021.textgraphs-1.1
DOI:
10.18653/v1/2021.textgraphs-1.1
Cite (ACL):
Yiping Jin, Vishakha Kadam, and Dittaya Wanvarie. 2021. Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia. In Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15), pages 1–9, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia (Jin et al., TextGraphs 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2021.textgraphs-1.1.pdf
Code:
YipingNUS/contextual-eval-dataset