The topics of mass and count have been studied for many decades in philosophy (e.g., Quine, 1960; Pelletier, 1975), linguistics (e.g., McCawley, 1975; Allen, 1980; Krifka, 1991) and psychology (e.g., Middleton et al., 2004; Barner et al., 2009). More recently, researchers in computational linguistics have also taken up the issues involved (e.g., Pustejovsky, 1991; Bond, 2005; Schmidtke & Kuperman, 2016). As these works point out, the study of this contrast raises many difficult conceptual issues. In this article we study one of these issues, the “Dual-Life” of being simultaneously +mass and +count, by means of an unusual combination of human annotation, online lexical resources, and online corpora.
The present paper describes the current release of the Bochum English Countability Lexicon (BECL 2.1), a large empirical database consisting of lemmata from Open ANC (http://www.anc.org) with added senses from WordNet (Fellbaum 1998). BECL 2.1 contains ≈ 11,800 annotated noun-sense pairs, divided into four major countability classes and 18 fine-grained subclasses. In the current version, BECL also provides information on nouns whose senses occur in more than one class, allowing a closer look at polysemy and homonymy with regard to countability. Also included are sets of similar senses based on the Leacock and Chodorow (LCH) score for semantic similarity (Leacock & Chodorow 1998), information on orthographic variation and on the completeness of all WordNet senses in the database, and an annotated representation of different types of proper names. Further development of BECL will investigate the different countability classes of proper names, the general relation between semantic similarity and countability, and recurring syntactic patterns for noun-sense pairs. The BECL 2.1 database is publicly available via http://count-and-mass.org.
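As a rough illustration of the LCH score mentioned above, the following minimal sketch computes $-\log(N_p / 2D)$, where $N_p$ is the number of nodes on the shortest path between two senses and $D$ is the maximum depth of the taxonomy. The toy hypernym tree and all function names are hypothetical and are not taken from WordNet or BECL; depth is counted in nodes here, and implementations may differ on that convention.

```python
import math

# Hypothetical toy hypernym taxonomy (child -> parent); not WordNet or BECL data.
PARENT = {
    "substance": "entity",
    "object": "entity",
    "liquid": "substance",
    "water": "liquid",
    "wine": "liquid",
    "car": "object",
}


def ancestors(node):
    """Return the chain from node up to the root, starting with node itself."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def shortest_path_edges(a, b):
    """Number of edges on the path between a and b via their lowest common ancestor."""
    chain_b = ancestors(b)
    steps_from_b = {n: i for i, n in enumerate(chain_b)}
    for steps_from_a, n in enumerate(ancestors(a)):
        if n in steps_from_b:
            return steps_from_a + steps_from_b[n]
    raise ValueError("no common ancestor")


def max_depth():
    """Maximum taxonomy depth, counted in nodes from the root to the deepest node."""
    return max(len(ancestors(n)) for n in PARENT)


def lch_similarity(a, b):
    """LCH score: -log(nodes on shortest path / (2 * maximum depth))."""
    nodes_on_path = shortest_path_edges(a, b) + 1
    return -math.log(nodes_on_path / (2.0 * max_depth()))


if __name__ == "__main__":
    print(lch_similarity("water", "wine"))
    print(lch_similarity("water", "car"))
```

In this toy tree, senses that share a close hypernym ("water" and "wine") score higher than senses that only meet at the root ("water" and "car").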
The present paper describes the construction of a resource to determine the lexical preference class of a large number of English noun-senses ($\approx$ 14,000) with respect to the distinction between mass and count interpretations. In constructing the lexicon, we have employed a questionnaire-based approach that draws on existing resources such as the Open ANC (\url{http://www.anc.org}) and WordNet \cite{Miller95}. The questionnaire requires annotators to answer six questions about a noun-sense pair. Depending on the answers, a given noun-sense pair is assigned to one of several fine-grained noun classes spanning the area between count and mass. An initial data set of 1,000 noun-sense pairs has been annotated jointly by four native speakers, while the remaining 12,800 noun-sense pairs have been annotated in parallel by two annotators each. Inter-annotator agreement, measured with Krippendorff’s $\alpha$, lies between 0.694 and 0.755, values that we consider satisfactory and that confirm the general feasibility of the approach.
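To make the agreement measure concrete, here is a minimal sketch of Krippendorff's $\alpha$ computed from a coincidence matrix. It assumes the nominal variant of the metric, which the text above does not specify; the labels and the function name are hypothetical, and the data are invented for illustration only.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one list per annotated item, holding the labels assigned by
    the annotators who rated that item (missing ratings are simply omitted).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # items with a single rating cannot be paired
        # Each ordered pair of ratings within an item contributes 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical annotations: two annotators, four noun-sense pairs.
    example = [
        ["count", "count"],
        ["count", "mass"],
        ["mass", "mass"],
        ["mass", "mass"],
    ]
    print(krippendorff_alpha_nominal(example))
```

The coincidence-matrix formulation handles items rated by different numbers of annotators, which fits a setup where some items are rated by four annotators and others by two.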