Mitch Marcus

Also published as: M. Marcus, Mitchell Marcus, Mitchell P. Marcus


2024

pdf
Annotating Chinese Word Senses with English WordNet: A Practice on OntoNotes Chinese Sense Inventories
Hongzhi Xu | Jingxia Lin | Sameer Pradhan | Mitchell Marcus | Ming Liu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we present our exploration of annotating Chinese word senses using English WordNet synsets, with examples extracted from OntoNotes Chinese sense inventories. Given a target word along with the example that contains it, the annotators select a WordNet synset that best describes the meaning of the target word in the context. The result demonstrates an inter-annotator agreement of 38% between two annotators. We delve into the instances of disagreement by comparing the two annotated synsets, including their positions within the WordNet hierarchy. The examination reveals intriguing patterns among closely related synsets, shedding light on similar concepts represented within the WordNet structure. The data offers an indirect linking of Chinese word senses defined in OntoNotes Chinese sense inventories to WordNet synsets, and thus promotes the value of the OntoNotes corpus. Compared to a direct linking of Chinese word senses to WordNet synsets, the example-based annotation has the merit of not being affected by inaccurate sense definitions and thus offers a new way of mapping WordNets of different languages. At the same time, the annotated data also serves as a valuable linguistic resource for exploring potential lexical differences between English and Chinese, with potential contributions to the broader understanding of cross-linguistic semantic mapping.

2020

pdf
Morphological Segmentation for Low Resource Languages
Justin Mott | Ann Bies | Stephanie Strassel | Jordan Kodner | Caitlin Richter | Hongzhi Xu | Mitchell Marcus
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper describes a new morphology resource created by Linguistic Data Consortium and the University of Pennsylvania for the DARPA LORELEI Program. The data consists of approximately 2000 tokens annotated for morphological segmentation in each of 9 low resource languages, along with root information for 7 of the languages. The languages annotated show a broad diversity of typological features. A minimal annotation scheme for segmentation was developed such that it could capture the patterns of a wide range of languages and also be performed reliably by non-linguist annotators. The basic annotation guidelines were designed to be language-independent, but included language-specific morphological paradigms and other specifications. The resulting annotated corpus is designed to support and stimulate the development of unsupervised morphological segmenters and analyzers by providing a gold standard for their evaluation on a more typologically diverse set of languages than has previously been available. By providing root annotation, this corpus is also a step toward supporting research in identifying richer morphological structures than simple morpheme boundaries.

pdf
Modeling Morphological Typology for Unsupervised Learning of Language Morphology
Hongzhi Xu | Jordan Kodner | Mitchell Marcus | Charles Yang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper describes a language-independent model for fully unsupervised morphological analysis that exploits a universal framework leveraging morphological typology. By modeling morphological processes including suffixation, prefixation, infixation, and full and partial reduplication with constrained stem change rules, our system effectively constrains the search space and offers a wide coverage in terms of morphological typology. The system is tested on nine typologically and genetically diverse languages, and shows superior performance over leading systems. We also investigate the effect of an oracle that provides only a handful of bits per language to signal morphological type.

2018

pdf
Unsupervised Morphology Learning with Statistical Paradigms
Hongzhi Xu | Mitchell Marcus | Charles Yang | Lyle Ungar
Proceedings of the 27th International Conference on Computational Linguistics

This paper describes an unsupervised model for morphological segmentation that exploits the notion of paradigms, which are sets of morphological categories (e.g., suffixes) that can be applied to a homogeneous set of words (e.g., nouns or verbs). Our algorithm identifies statistically reliable paradigms from the morphological segmentation result of a probabilistic model, and chooses reliable suffixes from them. The new suffixes can be fed back iteratively to improve the accuracy of the probabilistic model. Finally, the unreliable paradigms are subjected to pruning to eliminate unreliable morphological relations between words. The paradigm-based algorithm significantly improves segmentation accuracy. Our method achieves state-of-the-art results on experiments using the Morpho-Challenge data, including English, Turkish, and Finnish.

pdf
Low-resource Post Processing of Noisy OCR Output for Historical Corpus Digitisation
Caitlin Richter | Matthew Wickes | Deniz Beser | Mitch Marcus
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf
Case Studies in the Automatic Characterization of Grammars from Small Wordlists
Jordan Kodner | Spencer Caplan | Hongzhi Xu | Mitchell P. Marcus | Charles Yang
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

2015

pdf
System Combination for Multi-document Summarization
Kai Hong | Mitchell Marcus | Ani Nenkova
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2013

pdf bib
Finding Optimal 1-Endpoint-Crossing Trees
Emily Pitler | Sampath Kannan | Mitchell Marcus
Transactions of the Association for Computational Linguistics, Volume 1

Dependency parsing algorithms capable of producing the types of crossing dependencies seen in natural language sentences have traditionally been orders of magnitude slower than algorithms for projective trees. For 95.8–99.8% of dependency parses in various natural language treebanks, whenever an edge is crossed, the edges that cross it all have a common vertex. The optimal dependency tree that satisfies this 1-Endpoint-Crossing property can be found with an O(n^4) parsing algorithm that recursively combines forests over intervals with one exterior point. 1-Endpoint-Crossing trees also have natural connections to linguistics and another class of graphs that has been studied in NLP.
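
The 1-Endpoint-Crossing property stated in the abstract above can be checked directly on a set of dependency edges. The following is a minimal sketch, not the paper's parsing algorithm: it only tests the property, in quadratic time over the edges. The function names and the representation of edges as pairs of integer word positions are assumptions made for illustration.

```python
def crosses(e, f):
    """Return True if edges e and f cross when drawn above the sentence.

    Each edge is a pair of integer word positions (an assumed representation).
    Two edges cross when exactly one endpoint of one lies strictly inside
    the span of the other.
    """
    a, b = sorted(e)
    c, d = sorted(f)
    return a < c < b < d or c < a < d < b


def is_one_endpoint_crossing(edges):
    """Check that, for every edge, all edges crossing it share a common vertex."""
    for e in edges:
        crossing = [f for f in edges if crosses(e, f)]
        if not crossing:
            continue  # an uncrossed edge satisfies the property trivially
        common = set(crossing[0])
        for f in crossing[1:]:
            common &= set(f)
        if not common:
            return False
    return True


# Edge (1, 4) is crossed by (0, 3) and (2, 5), which share no vertex.
violating = [(0, 3), (1, 4), (2, 5)]
# Edge (0, 2) is crossed only by (1, 3), so the property holds.
satisfying = [(0, 2), (1, 3)]
```

A projective tree has no crossing edges at all, so it passes this check trivially.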

2012

pdf
Exploring Deterministic Constraints: from a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation
Qiuye Zhao | Mitch Marcus
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Long-Tail Distributions and Unsupervised Learning of Morphology
Qiuye Zhao | Mitch Marcus
Proceedings of COLING 2012

pdf
Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
Emily Pitler | Sampath Kannan | Mitchell Marcus
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
Sameer Pradhan | Lance Ramshaw | Mitchell Marcus | Martha Palmer | Ralph Weischedel | Nianwen Xue
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf
Functional Elements and POS Categories
Qiuye Zhao | Mitch Marcus
Proceedings of 5th International Joint Conference on Natural Language Processing

2009

pdf
A Simple Unsupervised Learner for POS Disambiguation Rules Given Only a Minimal Lexicon
Qiuye Zhao | Mitch Marcus
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2007

pdf
Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features
Nizar Habash | Ryan Gabbard | Owen Rambow | Seth Kulick | Mitch Marcus
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf
Fully Parsing the Penn Treebank
Ryan Gabbard | Seth Kulick | Mitchell Marcus
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

pdf
OntoNotes: The 90% Solution
Eduard Hovy | Mitchell Marcus | Martha Palmer | Lance Ramshaw | Ralph Weischedel
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Doctoral Consortium
Matt Huenerfauth | Bo Pang | Mitch Marcus
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Doctoral Consortium

pdf
Issues in Synchronizing the English Treebank and PropBank
Olga Babko-Malaya | Ann Bies | Ann Taylor | Szuting Yi | Martha Palmer | Mitch Marcus | Seth Kulick | Libin Shen
Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006

2000

pdf
Developing Guidelines and Ensuring Consistency for Chinese Text Annotation
Fei Xia | Martha Palmer | Nianwen Xue | Mary Ellen Okurowski | John Kovarik | Fu-Dong Chiou | Shizhe Huang | Tony Kroch | Mitch Marcus
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

pdf
Towards Unsupervised Extraction of Verb Paradigms from Large Corpora
Cornelia H. Parkes | Alexander M. Malek | Mitchell P. Marcus
Sixth Workshop on Very Large Corpora

1997

pdf bib
Summary of Invited Speech
Mitch Marcus
Fifth Workshop on Very Large Corpora

1995

pdf
Text Chunking using Transformation-Based Learning
Lance Ramshaw | Mitch Marcus
Third Workshop on Very Large Corpora

1994

pdf
Exploring the Statistical Derivation of Transformational Rule Sequences for Part-of-Speech Tagging
Lance A. Ramshaw | Mitchell P. Marcus
The Balancing Act: Combining Symbolic and Statistical Approaches to Language

pdf
The Penn Treebank: Annotating Predicate Argument Structure
Mitchell Marcus | Grace Kim | Mary Ann Marcinkiewicz | Robert MacIntyre | Ann Bies | Mark Ferguson | Karen Katz | Britta Schasberger
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

pdf
Research in Natural Language Processing
A. Joshi | M. Marcus | M. Steedman | B. Webber
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

1993

pdf
Building a Large Annotated Corpus of English: The Penn Treebank
Mitchell P. Marcus | Beatrice Santorini | Mary Ann Marcinkiewicz
Computational Linguistics, Volume 19, Number 2, June 1993, Special Issue on Using Large Corpora: II

pdf
Session 8: Statistical Language Modeling
Mitchell Marcus
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

pdf
Natural Language Research
Aravind Joshi | Mitch Marcus | Mark Steedman | Bonnie Webber
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

1992

pdf bib
Overview of the Fifth DARPA Speech and Natural Language Workshop
Mitchell P. Marcus
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

pdf
Automatically Acquiring Phrase Structure Using Distributional Analysis
Eric Brill | Mitchell Marcus
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

pdf
Natural Language Research
Aravind Joshi | Mitch Marcus | Mark Steedman | Bonnie Webber
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

1991

pdf
Pearl: A Probabilistic Chart Parser
David M. Magerman | Mitchell P. Marcus
Proceedings of the Second International Workshop on Parsing Technologies

This paper describes a natural language parsing algorithm for unrestricted text which uses a probability-based scoring function to select the “best” parse of a sentence. The parser, Pearl, is a time-asynchronous bottom-up chart parser with Earley-type top-down prediction which pursues the highest-scoring theory in the chart, where the score of a theory represents the extent to which the context of the sentence predicts that interpretation. This parser differs from previous attempts at stochastic parsers in that it uses a richer form of conditional probabilities based on context to predict likelihood. Pearl also provides a framework for incorporating the results of previous work in part-of-speech assignment, unknown word models, and other probabilistic models of linguistic features into one parsing tool, interleaving these techniques instead of using the traditional pipeline architecture. In preliminary tests, Pearl has been successful at resolving part-of-speech and word (in speech processing) ambiguity, determining categories for unknown words, and selecting correct parses first using a very loosely fitting covering grammar.

pdf
Pearl: A Probabilistic Chart Parser
David M. Magerman | Mitchell P. Marcus
Fifth Conference of the European Chapter of the Association for Computational Linguistics

pdf
Parsing the Voyager Domain Using Pearl
David M. Magerman | Mitchell P. Marcus
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

pdf
A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars
E. Black | S. Abney | D. Flickenger | C. Gdaniec | R. Grishman | P. Harrison | D. Hindle | R. Ingria | F. Jelinek | J. Klavans | M. Liberman | M. Marcus | S. Roukos | B. Santorini | T. Strzalkowski
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

pdf
Session 11 - Natural Language III
Mitch Marcus
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

pdf
Natural Language Research
Aravind K. Joshi | Mitch Marcus | Mark Steedman | Bonnie Webber
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

pdf
Very Large Annotated Database of American English
Mitch Marcus
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

1990

pdf
Session 9: Automatic Acquisition of Linguistic Structure
Mitchell Marcus
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990

pdf
Deducing Linguistic Structure from the Statistics of Large Corpora
Eric Brill | David Magerman | Mitchell Marcus | Beatrice Santorini
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990

pdf
Natural Language Research
Aravind Joshi | Mitch Marcus | Mark Steedman | Bonnie Webber
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990

pdf
Very Large Annotated Database of American English
Mitch Marcus
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990

1989

pdf
Automatic Acquisition of the Lexical Semantics of Verbs From Sentence Frames
Mort Webster | Mitch Marcus
27th Annual Meeting of the Association for Computational Linguistics

pdf
Natural Language Research
Aravind Joshi | Mitch Marcus | Mark Steedman | Bonnie Webber
Speech and Natural Language: Proceedings of a Workshop Held at Philadelphia, Pennsylvania, February 21-23, 1989

pdf
White Paper on Natural Language Processing
Ralph Weischedel | Jaime Carbonell | Barbara Grosz | Wendy Lehnert | Mitchell Marcus | Raymond Perrault | Robert Wilensky
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989

1987

pdf
Generation Systems Should Choose Their Words
Mitchell Marcus
Theoretical Issues in Natural Language Processing 3

1983

pdf
D-Theory: Talking about Talking about Trees
Mitchell P. Marcus | Donald Hindle | Margaret M. Fleck
21st Annual Meeting of the Association for Computational Linguistics

1982

pdf
Building Non-Normative Systems - The Search for Robustness: An Overview
Mitchell P. Marcus
20th Annual Meeting of the Association for Computational Linguistics

1978

pdf
A Computational Account of Some Constraints on Language
Mitchell Marcus
Theoretical Issues in Natural Language Processing-2

pdf
A Computational Account of Some Constraints on Language
Mitchell Marcus
American Journal of Computational Linguistics (December 1978)

1975

pdf bib
Diagnosis as a Notion of Grammar
Mitchell Marcus
Theoretical Issues in Natural Language Processing