2010
Acquisition de connaissances lexicales à partir de corpus : la sous-catégorisation verbale en français [Lexical acquisition from corpora: the case of subcategorization frames in French]
Cédric Messiant | Kata Gábor | Thierry Poibeau
Traitement Automatique des Langues, Volume 51, Numéro 1 : Varia [Varia]
Investigating the cross-linguistic potential of VerbNet-style classification
Lin Sun | Thierry Poibeau | Anna Korhonen | Cédric Messiant
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
2009
La complémentarité des approches manuelle et automatique en acquisition lexicale [The complementarity of manual and automatic approaches in lexical acquisition]
Cédric Messiant | Takuya Nakamura | Stavroula Voyatzi
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
Lexical resources are essential for building high-performing language processing systems. Such resources can be either built by hand or acquired automatically from large corpora. In this article, we show that the two approaches are complementary. To do so, we take the example of verb subcategorization, comparing a lexicon acquired by automatic methods (LexSchem) with a manually built lexicon (Le Lexique-Grammaire). We show that the information acquired by these two methods is clearly distinct and that each can enrich the other.
2008
Do we Still Need Gold Standards for Evaluation?
Thierry Poibeau | Cédric Messiant
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The availability of a huge mass of textual data in electronic format has increased the need for fast and accurate techniques for textual data processing. Machine learning and statistical approaches have been increasingly used in NLP over the past decade, mainly because they are quick, versatile and efficient. However, despite this evolution of the field, evaluation still relies (most of the time) on a comparison between the output of a probabilistic or statistical system on the one hand and a non-statistical, usually hand-crafted, gold standard on the other. In this paper, we take the acquisition of subcategorization frames from corpora as a practical example. Our study is motivated by the fact that, even though a gold standard is an invaluable resource for evaluation, it is always partial and does not really show how accurate and useful results are.
LexSchem: a Large Subcategorization Lexicon for French Verbs
Cédric Messiant | Thierry Poibeau | Anna Korhonen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper presents LexSchem - the first large, fully automatically acquired subcategorization lexicon for French verbs. The lexicon includes subcategorization frame and frequency information for 3297 French verbs. When evaluated on a set of 20 test verbs against a gold standard dictionary, it shows 0.79 precision, 0.55 recall and 0.65 F-measure. We have made this resource freely available to the research community on the web.
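The three figures quoted in this abstract can be checked against each other. The sketch below assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is this balanced variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the 20 test verbs.
score = f_measure(0.79, 0.55)
print(round(score, 2))  # prints 0.65
```

Under that assumption the reported precision and recall reproduce the reported F-measure of 0.65 after rounding.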
A Subcategorization Acquisition System for French Verbs
Cédric Messiant
Proceedings of the ACL-08: HLT Student Research Workshop