Eugene Charniak

While both spoken and written language processing stand to benefit from parsing, the standard Parseval metrics (Black et al., 1991) and their canonical implementation (Sekine and Collins, 1997) are only useful for text. The Parseval metrics are undefined when the words input to the parser do not exactly match the words in the gold-standard parse tree, and word errors are unavoidable with automatic speech recognition (ASR) systems. To fill this gap, we have developed a publicly available tool for scoring parses that implements a variety of metrics that can handle mismatches in words and segmentations, including alignment-based bracket evaluation, alignment-based dependency evaluation, and a dependency evaluation that does not require alignment. We describe the different metrics, how to use the tool, and the outcome of an extensive set of experiments on the sensitivity of the metrics.

Effective Self-Training for Parsing
David McClosky | Eugene Charniak | Mark Johnson
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

Reranking and Self-Training for Parser Adaptation
David McClosky | Eugene Charniak | Mark Johnson
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Learning Phrasal Categories
William P. Headden III | Eugene Charniak | Mark Johnson
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

Effective Use of Prosody in Parsing Conversational Speech
Jeremy G. Kahn | Matthew Lease | Eugene Charniak | Mark Johnson | Mari Ostendorf
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Parsing Biomedical Literature
Matthew Lease | Eugene Charniak
Second International Joint Conference on Natural Language Processing: Full Papers

Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking
Eugene Charniak | Mark Johnson
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Supervised and Unsupervised Learning for Sentence Compression
Jenine Turner | Eugene Charniak
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Using the Penn Treebank to Evaluate Non-Treebank Parsers
Eric K. Ringger | Robert C. Moore | Eugene Charniak | Lucy Vanderwende | Hisami Suzuki
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Sentence-Internal Prosody Does not Help Parsing the Way Punctuation Does
Michelle Gregory | Mark Johnson | Eugene Charniak
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

A TAG-based noisy-channel model of speech repairs
Mark Johnson | Eugene Charniak
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Syntax-based language models for statistical machine translation
Eugene Charniak | Kevin Knight | Kenji Yamada
Proceedings of Machine Translation Summit IX: Papers

We present a syntax-based language model for use in noisy-channel machine translation. In particular, a language model based upon that described in (Cha01) is combined with the syntax-based translation model described in (YK01). The resulting system was used to translate 347 sentences from Chinese to English and compared with the results of an IBM-model-4-based system, as well as that of (YK02), all trained on the same data. The translations were sorted into four groups: good/bad syntax crossed with good/bad meaning. While the total number of translations that preserved meaning was the same for (YK02) and the syntax-based system (and both higher than the IBM-model-4-based system), the syntax-based system had 45% more translations that also had good syntax than did (YK02) (and approximately 70% more than IBM Model 4). The number of translations that did not preserve meaning, but at least had good grammar, also increased, though to less avail.

Variation of Entropy and Parse Trees of Sentences as a Function of the Sentence Number
Dmitriy Genzel | Eugene Charniak
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing