Although studied for several decades, the syntactic properties of experiencer-object (EO) verbs are still under discussion, while most analyses are not supported by substantial corpus data. With GerEO, we intend to fill this lacuna for German EO-verbs by presenting a large-scale database of more than 10,000 examples for 64 verbs (up to 200 per verb) from a newspaper corpus annotated for several syntactic and semantic features relevant for their analysis, including the overall syntactic construction, the semantic stimulus type, and the form of a possible stimulus preposition, i.e. a preposition heading a PP that indicates (a part/aspect of) the stimulus. Non-psych occurrences of the verbs are not excluded from the database but marked as such to make a comparison possible. Data of this kind can be used to develop and test theoretical hypotheses on the properties of EO-verbs, aid in the construction of experiments as well as provide training and test data for AI systems.
The topics of mass and count have been studied for many decades in philosophy (e.g., Quine, 1960; Pelletier, 1975), linguistics (e.g., McCawley, 1975; Allen, 1980; Krifka, 1991) and psychology (e.g., Middleton et al, 2004; Barner et al, 2009). More recently, interest from within computational linguistics has studied the issues involved (e.g., Pustejovsky, 1991; Bond, 2005; Schmidtke & Kuperman, 2016), to name just a few. As is pointed out in these works, there are many difficult conceptual issues involved in the study of this contrast. In this article we study one of these issues – the “Dual-Life” of being simultaneously +mass and +count – by means of an unusual combination of human annotation, online lexical resources, and online corpora.
The present paper describes the current release of the Bochum English Countability Lexicon (BECL 2.1), a large empirical database consisting of lemmata from Open ANC (http://www.anc.org) with added senses from WordNet (Fellbaum 1998). BECL 2.1 contains ≈ 11,800 annotated noun-sense pairs, divided in four major countability classes and 18 fine-grained subclasses. In the current version, BECL also provides information on nouns whose senses occur in more than one class allowing a closer look on polysemy and homonymy with regard to countability. Further included are sets of similar senses using the Leacock and Chodorow (LCH) score for semantic similarity (Leacock & Chodorow 1998), information on orthographic variation, on the completeness of all WordNet senses in the database and an annotated representation of different types of proper names. The further development of BECL will investigate the different countability classes of proper names and the general relation between semantic similarity and countability as well as recurring syntactic patterns for noun-sense pairs. The BECL 2.1 database is also publicly available via http://count-and-mass.org.
The present paper describes the construction of a resource to determine the lexical preference class of a large number of English noun-senses ($\approx$ 14,000) with respect to the distinction between mass and count interpretations. In constructing the lexicon, we have employed a questionnaire-based approach based on existing resources such as the Open ANC (\url{http://www.anc.org}) and WordNet \cite{Miller95}. The questionnaire requires annotators to answer six questions about a noun-sense pair. Depending on the answers, a given noun-sense pair can be assigned to fine-grained noun classes, spanning the area between count and mass. The reference lexicon contains almost 14,000 noun-sense pairs. An initial data set of 1,000 has been annotated together by four native speakers, while the remaining 12,800 noun-sense pairs have been annotated in parallel by two annotators each. We can confirm the general feasibility of the approach by reporting satisfactory values between 0.694 and 0.755 in inter-annotator agreement using Krippendorff’s $\alpha$.