Text Quantification

Fabrizio Sebastiani

Abstract

In recent years it has been pointed out that, in a number of applications involving (text) classification, the final goal is not to determine which class (or classes) individual unlabelled data items belong to, but to determine the prevalence (or "relative frequency") of each class in the unlabelled data. The latter task is known as quantification. Suppose a market research agency runs a poll in which it asks the question "What do you think of the recent ad campaign for product X?" Once the poll is complete, the agency may want to classify the resulting textual answers according to whether or not they belong to the class LovedTheCampaign. The agency is likely not interested in whether a specific individual belongs to the class LovedTheCampaign, but in knowing how many respondents belong to it, i.e., in knowing the prevalence of the class. In other words, the agency is interested not in classification, but in quantification. Essentially, quantification is classification tackled at the aggregate (rather than at the individual) level. The research community has recently shown a growing interest in tackling quantification as a task in its own right. One reason is that, since the goal of quantification differs from that of classification, quantification requires evaluation measures different from those used for classification. A second, related reason is that using a method optimized for classification accuracy is suboptimal when quantification accuracy is the real goal. A third reason is the growing awareness that quantification is going to become more and more important: with the advent of big data, more and more application contexts are going to arise in which it will be enough to analyze data at the aggregate (rather than at the individual) level.
The goal of this tutorial is to introduce the audience to the problem of quantification, to the techniques that have been proposed for solving it, to the metrics used to evaluate them, and to the problems that are still open in the area.
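
To make the distinction between classification and quantification concrete, here is a minimal Python sketch of two baseline estimators often discussed in the quantification literature: "classify and count" and its adjusted variant. The abstract does not name or endorse these methods; the function names, the toy predictions, and the true-positive and false-positive rates below are illustrative assumptions, not material from the tutorial.

```python
from typing import Sequence


def classify_and_count(predictions: Sequence[int]) -> float:
    """Estimate class prevalence as the fraction of items predicted positive."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    return sum(predictions) / len(predictions)


def adjusted_count(predictions: Sequence[int], tpr: float, fpr: float) -> float:
    """Correct the classify-and-count estimate using the classifier's
    true-positive rate (tpr) and false-positive rate (fpr).

    Both rates are assumed to have been estimated on held-out labelled data.
    """
    if tpr <= fpr:
        raise ValueError("tpr must exceed fpr for the correction to be defined")
    raw = classify_and_count(predictions)
    corrected = (raw - fpr) / (tpr - fpr)
    # Clip, since the correction can fall outside the valid range [0, 1].
    return min(1.0, max(0.0, corrected))


# Hypothetical poll: 100 answers, of which 20 truly belong to LovedTheCampaign.
# An assumed classifier with tpr = 0.8 and fpr = 0.1 would flag
# 0.8 * 20 + 0.1 * 80 = 24 answers as positive.
predictions = [1] * 24 + [0] * 76

cc_estimate = classify_and_count(predictions)
acc_estimate = adjusted_count(predictions, tpr=0.8, fpr=0.1)

print(f"classify and count: {cc_estimate:.2f}")
print(f"adjusted count:     {acc_estimate:.2f}")
```

In this toy setting the raw count overestimates the true prevalence of 0.20, while the adjusted estimate recovers it (up to floating-point rounding). This illustrates the abstract's point that a classifier tuned for per-item accuracy does not automatically yield accurate aggregate estimates.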

Anthology ID: D14-2008
Volume: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Month: October
Year: 2014
Address: Doha, Qatar
Editors: Lucia Specia, Xavier Carreras
Venue: EMNLP
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/D14-2008
Cite (ACL): Fabrizio Sebastiani. 2014. Text Quantification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, Doha, Qatar. Association for Computational Linguistics.
Cite (Informal): Text Quantification (Sebastiani, EMNLP 2014)