Michael Zock

Language production is largely a matter of words which, in the case of access problems, can be searched for in an external resource (lexicon, thesaurus). In this kind of dialogue the user provides the momentarily available knowledge concerning the target and the system responds with the best guess(es) it can make given this input. As tip-of-the-tongue (ToT) studies have shown, people always have some knowledge concerning the target (meaning fragments, number of syllables, ...) even if its complete form is eluding them. We will show here how to tap into this knowledge to build a resource likely to help authors (speakers/writers) to overcome the ToT problem. Yet, before doing so we need a better understanding of the various kinds of knowledge people have when looking for a word. To this end, we asked crowdworkers to provide some cues to describe a given target and then to specify how each of them relates to the target, in the hope that this could help others to find the elusive word. Next, we checked how well a given search strategy worked when applied to differently built lexical networks. The results showed quite dramatic differences, which is not really surprising. After all, different networks are built for different purposes; hence each one of them is more or less suited for a given task. More surprising, though, was the fact that the relational information given by the users did not allow us to find the elusive word in WordNet any better than without it.

pdf abs
WordNet and beyond: the case of lexical access
Michael Zock | Didier Schwab
Proceedings of the 8th Global WordNet Conference (GWC)

For humans, the main function of a dictionary is to store information concerning words and to reveal it when needed. While readers are interested in the meaning of words, writers look for answers concerning usage, spelling, grammar or word forms (lemma). We will focus here on this latter task: helping authors to find the word they are looking for, a word they may know but whose form is eluding them. Put differently, we try to build a resource helping authors to overcome the tip-of-the-tongue (ToT) problem. Obviously, in order to access a word, it must be stored somewhere (brain, resource). Yet this is by no means sufficient. We will illustrate this here by comparing WordNet (WN) to an equivalent lexical resource bootstrapped from Wikipedia (WiPi). Both may contain a given word, but ease and success of access may differ depending on other factors such as quality of the query, proximity, type of connections, etc. Next we will show under what conditions WN is suitable for word access, and finally we will present a roadmap showing the obstacles to be overcome to build a resource allowing the text producer to find the word s/he is looking for.

2014

pdf abs
A Graph-Based Approach for Computing Free Word Associations
Gemma Bel Enguix | Reinhard Rapp | Michael Zock
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

A graph-based algorithm is used to analyze the co-occurrences of words in the British National Corpus. It is shown that the statistical regularities detected can be exploited to predict human word associations. The corpus-derived associations are evaluated using a large test set comprising several thousand stimulus/response pairs as collected from humans. The finding is that there is a high agreement between the two types of data. The considerable size of the test set allows us to split the stimulus words into a number of classes relating to particular word properties. For example, we construct six saliency classes, and for the words in each of these classes we compare the simulation results with the human data. It turns out that for each class there is a close relationship between the performance of our system and human performance. This is also the case for classes based on two other properties of words, namely syntactic and semantic word ambiguity. We interpret these findings as evidence for the claim that human association acquisition must be based on the statistical analysis of perceived language and that when producing associations the detected statistical regularities are replicated.

pdf
How well can a corpus-derived co-occurrence network simulate human associative behavior?
Gemma Bel Enguix | Reinhard Rapp | Michael Zock
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)

pdf bib
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)
Michael Zock | Reinhard Rapp | Chu-Ren Huang
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

pdf bib
The CogALex-IV Shared Task on the Lexical Access Problem
Reinhard Rapp | Michael Zock
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

pdf
Wordfinding Problems and How to Overcome them Ultimately With the Help of a Computer
Michael Zock
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

pdf bib
TALN-RECITAL 2014 Workshop RLTLN 2014 : Réseaux Lexicaux pour le TAL (RLTLN 2014 : Lexical Networks for NLP)
Michael Zock | Gemma Bel-Enguix | Reinhard Rapp
TALN-RECITAL 2014 Workshop RLTLN 2014 : Réseaux Lexicaux pour le TAL (RLTLN 2014 : Lexical Networks for NLP)

pdf
Word storage does not guarantee accessibility (Stocker des Mots ne Garantit nullement leur Accès) [in French]
Michael Zock | Didier Schwab
TALN-RECITAL 2014 Workshop RLTLN 2014 : Réseaux Lexicaux pour le TAL (RLTLN 2014 : Lexical Networks for NLP)

2013

pdf
A Generic Cognitively Motivated Web-Environment to Help People to Become Quickly Fluent in a New Language
Michael Zock | Guy Lapalme | Lih-Juang Fang
PACLIC 27 Workshop on Computer-Assisted Language Learning

pdf
Lexical access via a simple co-occurrence network (Trouver les mots dans un simple réseau de co-occurrences) [in French]
Gemma Bel-Enguix | Michael Zock
Proceedings of TALN 2013 (Volume 2: Short Papers)

2012

pdf bib
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon
Michael Zock | Reinhard Rapp
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon

pdf
Automatic index creation to support navigation in lexical graphs encoding part_of relations
Michael Zock | Debela Tesfaye
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon

2011

pdf bib abs
Patrons de phrase, raccourcis pour apprendre rapidement à parler une nouvelle langue (Sentence patterns, shortcuts to quickly learn to speak a new language)
Michael Zock | Guy Lapalme
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

We describe the creation of a web environment to help learners (teenagers or adults) acquire the automatisms needed to produce the fundamental structures of a language at a "normal" rate. Our starting point is a database of sentences gleaned from the web or taken from school textbooks or phrasebooks. These sentences have been generalized (words replaced by variables) and indexed in terms of goals to form a tree of patterns. These two devices make it possible to motivate the use of the patterns and to create sentences that are structurally identical to those encountered while being semantically different. While the notions of 'patterns' or 'implicitly typed fill-in-the-blank sentences' are not new, implementing them on a computer for language learning is. Since the system is designed to be open, it allows users, whether designers or learners, to change many important points: the names of the variables, their values, the time lapse between a question and its answer, etc. The initial version was developed for English and Japanese. To test the genericity of our approach, we added French and Chinese with relative ease.

pdf abs
Évaluation et consolidation d’un réseau lexical via un outil pour retrouver le mot sur le bout de la langue (Evaluation and consolidation of a lexical network via a tool to find the word on the tip of the tongue)
Alain Joubert | Mathieu Lafourcade | Didier Schwab | Michael Zock
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Since September 2007, a large lexical network for French has been under construction using methods based on forms of popular consensus obtained through games (the JeuxDeMots project). The intervention of human experts is marginal: it accounts for less than 0.5% of the relations in the network and is limited to corrections, adjustments and the validation of term senses. To evaluate the quality of this resource built by game participants (non-expert users), we adopt an approach similar to the one used for its construction, namely, the resource must be validated on an open-class vocabulary, by non-experts, in a stable way (persistent over time). To this end, we propose to check whether our resource can serve as a support for solving the problem known as the 'Tip of the Tongue' ('Mot sur le Bout de la Langue', MBL). Like JeuxDeMots, the tool developed can be seen as an online game. Just like the latter, it makes it possible to acquire new relations, thereby enriching our lexical network.

2010

pdf abs
Du TAL au TIL (From NLP to interactive language processing)
Michael Zock | Guy Lapalme
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Historically, two types of language processing have been studied: processing by the brain (the psycholinguistic approach) and processing by the machine (the NLP approach, TAL in French). We believe there is room for a third type: interactive language processing (TIL, for "traitement interactif de la langue"), with the computer assisting the brain. This corresponds to a real need insofar as people often have only partial knowledge of the problem to be solved. The goal of TIL is to build bridges between a user's momentary knowledge and the solution sought. With the help of a few examples, we try to show that this is not only feasible and desirable, but also very reasonable in cost.

pdf bib
Proceedings of the 2nd Workshop on Cognitive Aspects of the Lexicon
Michael Zock | Reinhard Rapp
Proceedings of the 2nd Workshop on Cognitive Aspects of the Lexicon

pdf
Lexical Access, a Search-Problem
Michael Zock | Didier Schwab | Nirina Rakotonanahary
Proceedings of the 2nd Workshop on Cognitive Aspects of the Lexicon

pdf
Utilizing Citations of Foreign Words in Corpus-Based Dictionary Generation
Reinhard Rapp | Michael Zock
Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010)

pdf
The Noisier the Better: Identifying Multilingual Word Translations Using a Single Monolingual Corpus
Reinhard Rapp | Michael Zock | Andrew Trotman | Yue Xu
Proceedings of the 4th Workshop on Cross Lingual Information Access

pdf abs
A Tool for Linking Stems and Conceptual Fragments to Enhance word Access
Nuria Gala | Véronique Rey | Michael Zock
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Electronic dictionaries offer many possibilities unavailable in paper dictionaries to view, display or access information. However, even these resources fall short when it comes to accessing words that share semantic features and certain aspects of form: few applications offer the possibility to access a word via a morphologically or semantically related word. In this paper, we present such an application, Polymots, a lexical database for contemporary French containing 20,000 words grouped in 2,000 families. The purpose of this resource is to group words into families on the basis of shared morpho-phonological and semantic information. Words with a common stem form a family; words in a family also share a set of common conceptual fragments (in some families there is a continuity of meaning, in others meaning is distributed). With this approach, we capitalize on the bidirectional link between semantics and morpho-phonology: the user can thus access words not only on the basis of ideas, but also on the basis of formal characteristics of the word, i.e. its morphological features. The resulting lexical database should help people learn French vocabulary and help them find the words they are looking for, thus going beyond other existing lexical resources.

2008

pdf bib
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)
Michael Zock | Chu-Ren Huang
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)

pdf bib
Lexical access based on underspecified input
Michael Zock | Didier Schwab
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)

pdf
Looking up phrase rephrasings via a pivot language
Aurélien Max | Michael Zock
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)

pdf abs
How to Evaluate and Raise the Quality in a Collaborative Lexicographic Approach
Dan Cristea | Corina Forăscu | Marius Răschip | Michael Zock
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper focuses on different aspects of the collaborative work used to create the electronic version of a paper dictionary edited and printed by the Romanian Academy during the last century. In order to ensure accuracy in a reasonable amount of time, collaborative proofreading of the scanned material through an online interface has been initiated. The paper details the activities and the heuristics used to maximize accuracy and to evaluate the work of anonymous contributors with diverse backgrounds. Observing the behaviour of the enterprise over a period of 6 months makes it possible to estimate the feasibility of the approach through to the end of the project.

2006

pdf
Enhancing Electronic Dictionaries with an Index Based on Associations
Olivier Ferret | Michael Zock
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2004

pdf abs
Système d’aide à l’accès lexical : trouver le mot qu’on a sur le bout de la langue (A lexical access aid: finding the word on the tip of one's tongue)
Gaëlle Lortal | Brigitte Grau | Michael Zock
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

The Tip of the Tongue (TOT) phenomenon ('Mot sur le Bout de la Langue' in French), much studied by psycholinguists, has yielded a great deal of information about the organization of the mental lexicon. A speaker in a TOT state instantly recognizes the target word when it is presented in a list. They know its meaning, its form, its links with other words... We present here a study on the development of a tool that takes these specifics into account in order to help a speaker/writer find the word on the tip of their tongue. It consists in recreating the TOT phenomenon, where, in a production context, a word known to the system is momentarily inaccessible. Access to the word is achieved progressively thanks to information drawn from linguistic databases. These consist essentially of paradigmatic and syntagmatic relations. It turns out that a tool such as SVETLAN, capable of automatically structuring a dictionary by domain, can be advantageously combined with a database rich in paradigmatic links such as EuroWordNet, considerably increasing the chances of finding the word that cannot be accessed.

pdf
Word Lookup on the Basis of Associations : from an Idea to a Roadmap
Michael Zock | Slaven Bilac
Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries