This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
DirkGeeraerts
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
Semantic similarity is a key issue in many computational tasks. This paper goes into the development and evaluation of two common ways of automatically calculating the semantic similarity between two words. On the one hand, such methods may depend on a manually constructed thesaurus like (Euro)WordNet. Their performance is often evaluated on the basis of a very restricted set of human similarity ratings. On the other hand, corpus-based methods rely on the distribution of two words in a corpus to determine their similarity. Their performance is generally quantified through a comparison with the judgements of the first type of approach. This paper introduces a new Gold Standard of more than 5,000 human intra-category similarity judgements. We show that corpus-based methods often outperform (Euro)WordNet on this data set, and that the use of the latter as a Gold Standard for the former, is thus often far from ideal.
Vector-based models of lexical semantics retrieve semantically related words automatically from large corpora by exploiting the property that words with a similar meaning tend to occur in similar contexts. Despite their increasing popularity, it is unclear which kind of semantic similarity they actually capture and for which kind of words. In this paper, we use three vector-based models to retrieve semantically related words for a set of Dutch nouns and we analyse whether three linguistic properties of the nouns influence the results. In particular, we compare results from a dependency-based model with those from a 1st and 2nd order bag-of-words model and we examine the effect of the nouns frequency, semantic speficity and semantic class. We find that all three models find more synonyms for high-frequency nouns and those belonging to abstract semantic classses. Semantic specificty does not have a clear influence.
Cet article présente la méthodologie et les résultats d’une analyse sémantique quantitative d’environ 5000 spécificités dans le domaine technique des machines-outils pour l’usinage des métaux. Les spécificités seront identifiées avec la méthode des mots-clés (KeyWords Method). Ensuite, elles seront soumises à une analyse sémantique quantitative, à partir du recouvrement des cooccurrences des cooccurrences, permettant de déterminer le degré de monosémie des spécificités. Finalement, les données quantitatives de spécificité et de monosémie feront l’objet d’analyses de régression. Nous avançons l’hypothèse que les mots (les plus) spécifiques du corpus technique ne sont pas (les plus) monosémiques. Nous présenterons ici les résultats statistiques, ainsi qu’une interprétation linguistique. Le but de cette étude est donc de vérifier si et dans quelle mesure les spécificités du corpus technique sont monosémiques ou polysémiques et quels sont les facteurs déterminants.