Robust Text Classification for Sparsely Labelled Data Using Multi-level Embeddings

Simon Baker, Douwe Kiela, Anna Korhonen


Abstract
The conventional solution for handling sparsely labelled data is extensive feature engineering. This is time-consuming and both task- and domain-specific. We present a novel approach for learning embedded features that aims to alleviate this problem. Our approach jointly learns embeddings at different levels of granularity (word, sentence and document) along with the class labels. The intuition is that representing topic semantics with embeddings at multiple levels results in better classification. We evaluate this approach in unsupervised and semi-supervised settings on two sparsely labelled classification tasks, outperforming the handcrafted models and several embedding baselines.
Anthology ID:
C16-1220
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
2333–2343
URL:
https://aclanthology.org/C16-1220
Cite (ACL):
Simon Baker, Douwe Kiela, and Anna Korhonen. 2016. Robust Text Classification for Sparsely Labelled Data Using Multi-level Embeddings. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2333–2343, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Robust Text Classification for Sparsely Labelled Data Using Multi-level Embeddings (Baker et al., COLING 2016)
PDF:
https://preview.aclanthology.org/ingestion-script-update/C16-1220.pdf
Data
HOC