Mark Johnson - ACL Anthology

Mark Johnson

2025

Large language models (LLMs) have shown impressive performance in code understanding and generation, making coding tasks a key focus for researchers due to their practical applications and value as a testbed for LLM evaluation. Data synthesis and filtering techniques have been widely adopted and shown to be highly effective in this context. In this paper, we present a focused survey and taxonomy of these techniques, emphasizing recent advancements. We highlight key challenges, explore future research directions, and offer practical guidance for new researchers entering the field.

2023

Sources of Hallucination by Large Language Models on Inference Tasks
Nick McKenna | Tianyi Li | Liang Cheng | Mohammad Hosseini | Mark Johnson | Mark Steedman
Findings of the Association for Computational Linguistics: EMNLP 2023

Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization. We present a series of behavioral studies on several LLM families (LLaMA, GPT-3.5, and PaLM) which probe their behavior using controlled experiments. We establish two biases originating from pretraining which predict much of their behavior, and show that these are major sources of hallucination in generative LLMs. First, memorization at the level of sentences: we show that, regardless of the premise, models falsely label NLI test samples as entailing when the hypothesis is attested in training data, and that entities are used as “indices” to access the memorized data. Second, statistical patterns of usage learned at the level of corpora: we further show a similar effect when the premise predicate is less frequent than that of the hypothesis in the training data, a bias following from previous studies. We demonstrate that LLMs perform significantly worse on NLI test samples which do not conform to these biases than those which do, and we offer these as valuable controls for future LLM evaluation.

Smoothing Entailment Graphs with Language Models
Nick McKenna | Tianyi Li | Mark Johnson | Mark Steedman
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2021

Mention Flags (MF): Constraining Transformer-based Text Generators
Yufei Wang | Ian Wood | Stephen Wan | Mark Dras | Mark Johnson
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper focuses on Seq2Seq (S2S) constrained text generation where the text generator is constrained to mention specific words which are inputs to the encoder in the generated outputs. Pre-trained S2S models or a Copy Mechanism are trained to copy the surface tokens from encoders to decoders, but they cannot guarantee constraint satisfaction. Constrained decoding algorithms always produce hypotheses satisfying all constraints. However, they are computationally expensive and can lower the generated text quality. In this paper, we propose Mention Flags (MF), which traces whether lexical constraints are satisfied in the generated outputs in an S2S decoder. The MF models can be trained to generate tokens in a hypothesis until all constraints are satisfied, guaranteeing high constraint satisfaction. Our experiments on the Common Sense Generation task (CommonGen) (Lin et al., 2020), End2end Restaurant Dialog task (E2ENLG) (Dušek et al., 2020) and Novel Object Captioning task (nocaps) (Agrawal et al., 2019) show that the MF models maintain higher constraint satisfaction and text quality than the baseline models and other constrained decoding algorithms, achieving state-of-the-art performance on all three tasks. These results are achieved with a much lower run-time than constrained decoding algorithms. We also show that the MF models work well in the low-resource setting.

ECOL-R: Encouraging Copying in Novel Object Captioning with Reinforcement Learning
Yufei Wang | Ian Wood | Stephen Wan | Mark Johnson
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Novel Object Captioning is a zero-shot Image Captioning task requiring describing objects not seen in the training captions, but for which information is available from external object detectors. The key challenge is to select and describe all salient detected novel objects in the input images. In this paper, we focus on this challenge and propose the ECOL-R model (Encouraging Copying of Object Labels with Reinforced Learning), a copy-augmented transformer model that is encouraged to accurately describe the novel object labels. This is achieved via a specialised reward function in the SCST reinforcement learning framework (Rennie et al., 2017) that encourages novel object mentions while maintaining the caption quality. We further restrict the SCST training to the images where detected objects are mentioned in reference captions to train the ECOL-R model. We additionally improve our copy mechanism via Abstract Labels, which transfer knowledge from known to novel object types, and a Morphological Selector, which determines the appropriate inflected forms of novel object labels. The resulting model sets new state-of-the-art on the nocaps (Agrawal et al., 2019) and held-out COCO (Hendricks et al., 2016) benchmarks.

Multivalent Entailment Graphs for Question Answering
Nick McKenna | Liane Guillou | Mohammad Javad Hosseini | Sander Bijl de Vroe | Mark Johnson | Mark Steedman
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Drawing inferences between open-domain natural language predicates is a necessity for true language understanding. There has been much progress in unsupervised learning of entailment graphs for this purpose. We make three contributions: (1) we reinterpret the Distributional Inclusion Hypothesis to model entailment between predicates of different valencies, like DEFEAT(Biden, Trump) entails WIN(Biden); (2) we actualize this theory by learning unsupervised Multivalent Entailment Graphs of open-domain predicates; and (3) we demonstrate the capabilities of these graphs on a novel question answering task. We show that directional entailment is more helpful for inference than non-directional similarity on questions of fine-grained semantics. We also show that drawing on evidence across valencies answers more questions than by using only the same valency evidence.
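As a rough illustration of why a directional score can separate entailment from mere similarity, the sketch below computes a simple set-inclusion ratio over toy argument sets. It is not the scoring function used in the paper: the function name `inclusion_score`, the toy entity sets, and the choice to compare a binary predicate with a unary one through their first argument only are all assumptions made for this example.

```python
def inclusion_score(premise_feats, hypothesis_feats):
    """Fraction of the premise's observed contexts that also occur with the hypothesis.

    Under the Distributional Inclusion Hypothesis, a score near 1.0 is weak
    evidence that the premise entails the hypothesis. The score is directional:
    swapping the arguments generally gives a different value.
    """
    if not premise_feats:
        return 0.0
    return len(premise_feats & hypothesis_feats) / len(premise_feats)


# Hypothetical toy data: entities seen as the first argument of each predicate.
# DEFEAT is binary and WIN is unary, so they are compared via the shared argument.
defeat_first_args = {"Biden", "Arsenal", "Federer"}
win_args = {"Biden", "Arsenal", "Federer", "Hamilton", "Nadal"}

defeat_to_win = inclusion_score(defeat_first_args, win_args)  # every defeater also wins
win_to_defeat = inclusion_score(win_args, defeat_first_args)  # not every winner defeats

print(f"DEFEAT -> WIN: {defeat_to_win:.2f}")
print(f"WIN -> DEFEAT: {win_to_defeat:.2f}")
```

A symmetric similarity measure would assign the same value to both directions, which is the distinction the abstract draws between directional entailment and non-directional similarity.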

Open-Domain Contextual Link Prediction and its Complementarity with Entailment Graphs
Mohammad Javad Hosseini | Shay B. Cohen | Mark Johnson | Mark Steedman
Findings of the Association for Computational Linguistics: EMNLP 2021

An open-domain knowledge graph (KG) has entities as nodes and natural language relations as edges, and is constructed by extracting (subject, relation, object) triples from text. The task of open-domain link prediction is to infer missing relations in the KG. Previous work has used standard link prediction for the task. Since triples are extracted from text, we can ground them in the larger textual context in which they were originally found. However, standard link prediction methods only rely on the KG structure and ignore the textual context that each triple was extracted from. In this paper, we introduce the new task of open-domain contextual link prediction which has access to both the textual context and the KG structure to perform link prediction. We build a dataset for the task and propose a model for it. Our experiments show that context is crucial in predicting missing relations. We also demonstrate the utility of contextual link prediction in discovering context-independent entailments between relations, in the form of entailment graphs (EG), in which the nodes are the relations. The reverse holds too: context-independent EGs assist in predicting relations in context.

Blindness to Modality Helps Entailment Graph Mining
Liane Guillou | Sander Bijl de Vroe | Mark Johnson | Mark Steedman
Proceedings of the Second Workshop on Insights from Negative Results in NLP

Understanding linguistic modality is widely seen as important for downstream tasks such as Question Answering and Knowledge Graph Population. Entailment Graph learning might also be expected to benefit from attention to modality. We build Entailment Graphs using a news corpus filtered with a modality parser, and show that stripping modal modifiers from predicates in fact increases performance. This suggests that for some tasks, the pragmatics of modal modification of predicates allows them to contribute as evidence of entailment.

Integrating Lexical Information into Entity Neighbourhood Representations for Relation Prediction
Ian Wood | Mark Johnson | Stephen Wan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Relation prediction informed by a combination of text corpora and curated knowledge bases, combining knowledge graph completion with relation extraction, is a relatively little-studied task. A system that can perform this task has the ability to extend an arbitrary set of relational database tables with information extracted from a document corpus. OpenKi addresses this task through extraction of named entities and predicates via OpenIE tools, then learning relation embeddings from the resulting entity-relation graph for relation prediction, outperforming previous approaches. We present an extension of OpenKi that incorporates embeddings of text-based representations of the entities and the relations. We demonstrate that this results in a substantial performance increase over a system without this information.

2020

Improving Disfluency Detection by Self-Training a Self-Attentive Model
Paria Jamshid Lou | Mark Johnson
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Self-attentive neural syntactic parsers using contextualized word embeddings (e.g. ELMo or BERT) currently produce state-of-the-art results in joint parsing and disfluency detection in speech transcripts. Since the contextualized word embeddings are pre-trained on a large amount of unlabeled data, using additional unlabeled data to train a neural model might seem redundant. However, we show that self-training — a semi-supervised technique for incorporating unlabeled data — sets a new state-of-the-art for the self-attentive parser on disfluency detection, demonstrating that self-training provides benefits orthogonal to the pre-trained contextualized word representations. We also show that ensembling self-trained parsers provides further gains for disfluency detection.

End-to-End Speech Recognition and Disfluency Removal
Paria Jamshid Lou | Mark Johnson
Findings of the Association for Computational Linguistics: EMNLP 2020

Disfluency detection is usually an intermediate step between an automatic speech recognition (ASR) system and a downstream task. By contrast, this paper aims to investigate the task of end-to-end speech recognition and disfluency removal. We specifically explore whether it is possible to train an ASR model to directly map disfluent speech into fluent transcripts, without relying on a separate disfluency detection model. We show that end-to-end models do learn to directly generate fluent transcripts; however, their performance is slightly worse than a baseline pipeline approach consisting of an ASR system and a specialized disfluency detection model. We also propose two new metrics for evaluating integrated ASR and disfluency removal models. The findings of this paper can serve as a benchmark for further research on the task of end-to-end speech recognition and disfluency removal in the future.

Transactions of the Association for Computational Linguistics, Volume 8
Mark Johnson | Brian Roark | Ani Nenkova
Transactions of the Association for Computational Linguistics, Volume 8

Incorporating Temporal Information in Entailment Graph Mining
Liane Guillou | Sander Bijl de Vroe | Mohammad Javad Hosseini | Mark Johnson | Mark Steedman
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

We present a novel method for injecting temporality into entailment graphs to address the problem of spurious entailments, which may arise from similar but temporally distinct events involving the same pair of entities. We focus on the sports domain in which the same pairs of teams play on different occasions, with different outcomes. We present an unsupervised model that aims to learn entailments such as win/lose → play, while avoiding the pitfall of learning non-entailments such as win ↛ lose. We evaluate our model on a manually constructed dataset, showing that incorporating time intervals and applying a temporal window around them are effective strategies.

2019

Neural Constituency Parsing of Speech Transcripts
Paria Jamshid Lou | Yufei Wang | Mark Johnson
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

This paper studies the performance of a neural self-attentive parser on transcribed speech. Speech presents parsing challenges that do not appear in written text, such as the lack of punctuation and the presence of speech disfluencies (including filled pauses, repetitions, corrections, etc.). Disfluencies are especially problematic for conventional syntactic parsers, which typically fail to find any EDITED disfluency nodes at all. This motivated the development of special disfluency detection systems, and special mechanisms added to parsers specifically to handle disfluencies. However, we show here that neural parsers can find EDITED disfluency nodes, and the best neural parsers find them with an accuracy surpassing that of specialized disfluency detection systems, thus making these specialized mechanisms unnecessary. This paper also investigates a modified loss function that puts more weight on EDITED nodes. It also describes tree-transformations that simplify the disfluency detection task by providing alternative encodings of disfluencies and syntactic information.

Duality of Link Prediction and Entailment Graph Induction
Mohammad Javad Hosseini | Shay B. Cohen | Mark Johnson | Mark Steedman
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Link prediction and entailment graph induction are often treated as different problems. In this paper, we show that these two problems are actually complementary. We train a link prediction model on a knowledge graph of assertions extracted from raw text. We propose an entailment score that exploits the new facts discovered by the link prediction model, and then form entailment graphs between relations. We further use the learned entailments to predict improved link prediction scores. Our results show that the two tasks can benefit from each other. The new entailment score outperforms prior state-of-the-art results on a standard entailment dataset and the new link prediction scores show improvements over the raw link prediction scores.

How to Best Use Syntax in Semantic Role Labelling
Yufei Wang | Mark Johnson | Stephen Wan | Yifang Sun | Wei Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

There are many different ways in which external information might be used in an NLP task. This paper investigates how external syntactic information can be used most effectively in the Semantic Role Labeling (SRL) task. We evaluate three different ways of encoding syntactic parses and three different ways of injecting them into a state-of-the-art neural ELMo-based SRL sequence labelling model. We show that using a constituency representation as input features improves performance the most, achieving a new state-of-the-art for non-ensemble SRL models on the in-domain CoNLL’05 and CoNLL’12 benchmarks.

An adaptable task-oriented dialog system for stand-alone embedded devices
Long Duong | Vu Cong Duy Hoang | Tuyen Quang Pham | Yu-Heng Hong | Vladislavs Dovgalecs | Guy Bashkansky | Jason Black | Andrew Bleeker | Serge Le Huitouze | Mark Johnson
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

This paper describes a spoken-language end-to-end task-oriented dialogue system for small embedded devices such as home appliances. While the current system implements a smart alarm clock with advanced calendar scheduling functionality, the system is designed to make it easy to port to other application domains (e.g., the dialogue component factors out domain-specific execution from domain-general actions such as requesting and updating slot values). The system does not require internet connectivity because all components, including speech recognition, natural language understanding, dialogue management, execution and text-to-speech, run locally on the embedded device (our demo uses a Raspberry Pi). This simplifies deployment, minimizes server costs and, most importantly, eliminates user privacy risks. A demo video for the alarm domain is available at youtu.be/N3IBMGocvHU

Transactions of the Association for Computational Linguistics, Volume 7
Lillian Lee | Mark Johnson | Brian Roark | Ani Nenkova
Transactions of the Association for Computational Linguistics, Volume 7

2018

Disfluency Detection using Auto-Correlational Neural Networks
Paria Jamshid Lou | Peter Anderson | Mark Johnson
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In recent years, the natural language processing community has moved away from task-specific feature engineering, i.e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves. However, state-of-the-art approaches to disfluency detection in spontaneous speech transcripts currently still depend on an array of hand-crafted features, and other representations derived from the output of pre-existing systems such as language models or dependency parsers. As an alternative, this paper proposes a simple yet effective model for automatic disfluency detection, called an auto-correlational neural network (ACNN). The model uses a convolutional neural network (CNN) and augments it with a new auto-correlation operator at the lowest layer that can capture the kinds of “rough copy” dependencies that are characteristic of repair disfluencies in speech. In experiments, the ACNN model outperforms the baseline CNN on a disfluency detection task with a 5% increase in f-score, which is close to the previous best result on this task.

A Fast and Accurate Vietnamese Word Segmenter
Dat Quoc Nguyen | Dai Quoc Nguyen | Thanh Vu | Mark Dras | Mark Johnson
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

VnCoreNLP: A Vietnamese Natural Language Processing Toolkit
Thanh Vu | Dat Quoc Nguyen | Dai Quoc Nguyen | Mark Dras | Mark Johnson
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

We present an easy-to-use and fast toolkit, namely VnCoreNLP—a Java NLP annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language processing (NLP) tasks including word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing, and obtains state-of-the-art (SOTA) results for these tasks. We release VnCoreNLP to provide rich linguistic annotations to facilitate research work on Vietnamese NLP. Our VnCoreNLP is open-source and available at: https://github.com/vncorenlp/VnCoreNLP

AMR dependency parsing with a typed semantic algebra
Jonas Groschwitz | Matthias Lindemann | Meaghan Fowlie | Mark Johnson | Alexander Koller
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a semantic parser for Abstract Meaning Representations which learns to parse strings into tree representations of the compositional structure of an AMR graph. This allows us to use standard neural techniques for supertagging and dependency tree parsing, constrained by a linguistically principled type system. We present two approximative decoding algorithms, which achieve state-of-the-art accuracy and outperform strong baselines.

Active learning for deep semantic parsing
Long Duong | Hadi Afshar | Dominique Estival | Glen Pink | Philip Cohen | Mark Johnson
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Semantic parsing requires training data that is expensive and slow to collect. We apply active learning to both traditional and “overnight” data collection approaches. We show that it is possible to obtain good training hyperparameters from seed data which is only a small fraction of the full dataset. We show that uncertainty sampling based on least confidence score is competitive in traditional data collection but not applicable for overnight collection. We propose several active learning strategies for overnight data collection and show that different example selection strategies per domain perform best.

Predicting accuracy on large datasets from smaller pilot data
Mark Johnson | Peter Anderson | Mark Dras | Mark Steedman
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Because obtaining training data is often the most difficult part of an NLP or ML project, we develop methods for predicting how much data is required to achieve a desired test accuracy by extrapolating results from models trained on a small pilot training dataset. We model how accuracy varies as a function of training size on subsets of the pilot data, and use that model to predict how much training data would be required to achieve the desired accuracy. We introduce a new performance extrapolation task to evaluate how well different extrapolations predict accuracy on larger training sets. We show that details of hyperparameter optimisation and the extrapolation models can have dramatic effects in a document classification task. We believe this is an important first step in developing methods for estimating the resources required to meet specific engineering performance targets.
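The extrapolation idea in this abstract can be sketched in a few lines. The example below assumes an inverse power-law error curve, error(n) ≈ b · n^(−c), fitted by least squares in log-log space. The abstract does not say which extrapolation models the paper uses, so this functional form, the function names, and the pilot numbers are all assumptions made for illustration.

```python
import math


def fit_power_law(sizes, errors):
    """Fit log(error) = log(b) - c * log(n) by ordinary least squares; return (b, c)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def required_size(b, c, target_error):
    """Smallest integer n for which the fitted curve b * n**(-c) reaches target_error."""
    return math.ceil((b / target_error) ** (1.0 / c))


# Hypothetical pilot results: error rates of models trained on subsets of the pilot data.
pilot_sizes = [100, 200, 400, 800]
pilot_errors = [0.40, 0.32, 0.256, 0.2048]

b, c = fit_power_law(pilot_sizes, pilot_errors)
n_needed = required_size(b, c, target_error=0.10)
print(f"fitted b={b:.3f}, c={c:.3f}; estimated training size for 10% error: {n_needed}")
```

The estimate extrapolates well beyond the largest pilot subset, so it is only as reliable as the assumed curve shape; the abstract notes that the choice of extrapolation model and hyperparameter optimisation can change the prediction dramatically.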

Transactions of the Association for Computational Linguistics, Volume 6
Lillian Lee | Mark Johnson | Kristina Toutanova | Brian Roark
Transactions of the Association for Computational Linguistics, Volume 6

Learning Typed Entailment Graphs with Global Soft Constraints
Mohammad Javad Hosseini | Nathanael Chambers | Siva Reddy | Xavier R. Holt | Shay B. Cohen | Mark Johnson | Mark Steedman
Transactions of the Association for Computational Linguistics, Volume 6

This paper presents a new method for learning typed entailment graphs from text. We extract predicate-argument structures from multiple-source news corpora, and compute local distributional similarity scores to learn entailments between predicates with typed arguments (e.g., person contracted disease). Previous work has used transitivity constraints to improve local decisions, but these constraints are intractable on large graphs. We instead propose a scalable method that learns globally consistent similarity scores based on new soft constraints that consider both the structures across typed entailment graphs and inside each graph. Learning takes only a few hours to run over 100K predicates and our results show large improvements over local similarity scores on two entailment data sets. We further show improvements over paraphrases and entailments from the Paraphrase Database, and prior state-of-the-art entailment graphs. We show that the entailment graphs improve performance in a downstream task.

2017

Guided Open Vocabulary Image Captioning with Constrained Beam Search
Peter Anderson | Basura Fernando | Mark Johnson | Stephen Gould
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-training. Our method uses constrained beam search to force the inclusion of selected tag words in the output, and fixed, pretrained word embeddings to facilitate vocabulary expansion to previously unseen tag words. Using this approach we achieve state of the art results for out-of-domain captioning on MSCOCO (and improved results for in-domain captioning). Perhaps surprisingly, our results significantly outperform approaches that incorporate the same tag predictions into the learning algorithm. We also show that we can significantly improve the quality of generated ImageNet captions by leveraging ground-truth labels.

Idea density for predicting Alzheimer’s disease from transcribed speech
Kairit Sirts | Olivier Piguet | Mark Johnson
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Idea Density (ID) measures the rate at which ideas or elementary predications are expressed in an utterance or in a text. Lower ID is found to be associated with an increased risk of developing Alzheimer’s disease (AD) (Snowdon et al., 1996; Engelman et al., 2010). ID has been used in two different versions: propositional idea density (PID) counts the expressed ideas and can be applied to any text while semantic idea density (SID) counts pre-defined information content units and is naturally more applicable to normative domains, such as picture description tasks. In this paper, we develop DEPID, a novel dependency-based method for computing PID, and its version DEPID-R, which excludes repeated ideas, a feature characteristic of AD speech. We conduct the first comparison of automatically extracted PID and SID in the diagnostic classification task on two different AD datasets covering both closed-topic and free-recall domains. While SID performs better on the normative dataset, adding PID leads to a small but significant improvement (+1.7 F-score). On the free-topic dataset, PID performs better than SID as expected (77.6 vs 72.3 in F-score) but adding the features derived from the word embedding clustering underlying the automatic SID increases the results considerably, leading to an F-score of 84.8.

Multilingual Semantic Parsing And Code-Switching
Long Duong | Hadi Afshar | Dominique Estival | Glen Pink | Philip Cohen | Mark Johnson
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Extending semantic parsing systems to new domains and languages is a highly expensive, time-consuming process, so making effective use of existing resources is critical. In this paper, we describe a transfer learning method using crosslingual word embeddings in a sequence-to-sequence model. On the NLmaps corpus, our approach achieves state-of-the-art accuracy of 85.7% for English. Most importantly, we observed a consistent improvement for German compared with several baseline domain adaptation techniques. As a by-product of this approach, our models that are trained on a combination of English and German utterances perform reasonably well on code-switching utterances which contain a mixture of English and German, even though the training data does not contain any such. As far as we know, this is the first study of code-switching in semantic parsing. We manually constructed the set of code-switching test utterances for the NLmaps corpus and achieve 78.3% accuracy on this dataset.

A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing
Dat Quoc Nguyen | Mark Dras | Mark Johnson
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We present a novel neural network model that learns POS tagging and graph-based dependency parsing jointly. Our model uses bidirectional LSTMs to learn feature representations shared for both POS tagging and dependency parsing tasks, thus handling the feature-engineering problem. Our extensive experiments, on 19 languages from the Universal Dependencies project, show that our model outperforms the state-of-the-art neural network-based Stack-propagation model for joint POS tagging and transition-based dependency parsing, resulting in a new state of the art. Our code is open-source and available together with pre-trained models at: https://github.com/datquocnguyen/jPTDP

Unsupervised Text Segmentation Based on Native Language Characteristics
Shervin Malmasi | Mark Dras | Mark Johnson | Lan Du | Magdalena Wolska
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian model that incorporates appropriately compact language models and alternating asymmetric priors can achieve scores on the standard metrics around halfway to perfect segmentation.

Disfluency Detection using a Noisy Channel Model and a Deep Neural Language Model
Paria Jamshid Lou | Mark Johnson
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper presents a model for disfluency detection in spontaneous speech transcripts called LSTM Noisy Channel Model. The model uses a Noisy Channel Model (NCM) to generate n-best candidate disfluency analyses and a Long Short-Term Memory (LSTM) language model to score the underlying fluent sentences of each analysis. The LSTM language model scores, along with other features, are used in a MaxEnt reranker to identify the most plausible analysis. We show that using an LSTM language model in the reranking process of a noisy channel disfluency model improves the state-of-the-art in disfluency detection.

Transactions of the Association for Computational Linguistics, Volume 5
Lillian Lee | Mark Johnson | Kristina Toutanova
Transactions of the Association for Computational Linguistics, Volume 5

From Word Segmentation to POS Tagging for Vietnamese
Dat Quoc Nguyen | Thanh Vu | Dai Quoc Nguyen | Mark Dras | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2017

A constrained graph algebra for semantic parsing with AMRs
Jonas Groschwitz | Meaghan Fowlie | Mark Johnson | Alexander Koller
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Long papers

2016

Grammar induction from (lots of) words alone
John K Pate | Mark Johnson
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Grammar induction is the task of learning syntactic structure in a setting where that structure is hidden. Grammar induction from words alone is interesting because it is similar to the problem that a child learning a language faces. Previous work has typically assumed richer but cognitively implausible input, such as POS tag annotated data, which makes that work less relevant to human language acquisition. We show that grammar induction from words alone is in fact feasible when the model is provided with sufficient training data, and present two new streaming or mini-batch algorithms for PCFG inference that can learn from millions of words of training data. We compare the performance of these algorithms to a batch algorithm that learns from less data. The minibatch algorithms outperform the batch algorithm, showing that cheap inference with more data is better than intensive inference with less data. Additionally, we show that the harmonic initialiser, which previous work identified as essential when learning from small POS-tag annotated corpora (Klein and Manning, 2004), is not superior to a uniform initialisation.

Using Left-corner Parsing to Encode Universal Structural Constraints in Grammar Induction
Hiroshi Noji | Yusuke Miyao | Mark Johnson
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Neighborhood Mixture Model for Knowledge Base Completion
Dat Quoc Nguyen | Kairit Sirts | Lizhen Qu | Mark Johnson
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

STransE: a novel embedding model of entities and relationships in knowledge bases
Dat Quoc Nguyen | Kairit Sirts | Lizhen Qu | Mark Johnson
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Efficient techniques for parsing with tree automata
Jonas Groschwitz | Alexander Koller | Mark Johnson
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Transactions of the Association for Computational Linguistics, Volume 4
Lillian Lee | Mark Johnson | Kristina Toutanova
Transactions of the Association for Computational Linguistics, Volume 4

Unsupervised Pre-training With Seq2Seq Reconstruction Loss for Deep Relation Extraction Models
Zhuang Li | Lizhen Qu | Qiongkai Xu | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2016

An empirical study for Vietnamese dependency parsing
Dat Quoc Nguyen | Mark Dras | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2016

2015

An Improved Non-monotonic Transition System for Dependency Parsing
Matthew Honnibal | Mark Johnson
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

An Incremental Algorithm for Transition-based CCG Parsing
Bharat Ram Ambati | Tejaswini Deoskar | Mark Johnson | Mark Steedman
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Sign constraints on feature weights improve a joint model of word segmentation and phonology
Mark Johnson | Joe Pater | Robert Staubs | Emmanuel Dupoux
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A Computationally Efficient Algorithm for Learning Topical Collocation Models
Zhendong Zhao | Lan Du | Benjamin Börschinger | John K Pate | Massimiliano Ciaramita | Mark Steedman | Mark Johnson
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Improving Topic Models with Latent Feature Word Representations
Dat Quoc Nguyen | Richard Billingsley | Lan Du | Mark Johnson
Transactions of the Association for Computational Linguistics, Volume 3

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.

Using Entity Information from a Knowledge Base to Improve Relation Extraction
Lan Du | Anish Kumar | Mark Johnson | Massimiliano Ciaramita
Proceedings of the Australasian Language Technology Association Workshop 2015

Do POS Tags Help to Learn Better Morphological Segmentations?
Kairit Sirts | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2015

More Efficient Topic Modelling Through a Noun Only Approach
Fiona Martin | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2015

Improving Topic Coherence with Latent Feature Word Representations in MAP Estimation for Topic Modeling
Dat Quoc Nguyen | Kairit Sirts | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2015

2014

Unsupervised Word Segmentation in Context
Gabriel Synnaeve | Isabelle Dautriche | Benjamin Börschinger | Mark Johnson | Emmanuel Dupoux
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Syllable weight encodes mostly the same information for English word segmentation as dictionary stress
John K Pate | Mark Johnson
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems
Bogdan Ludusan | Maarten Versteegh | Aren Jansen | Guillaume Gravier | Xuan-Nga Cao | Mark Johnson | Emmanuel Dupoux
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or from raw speech has attracted increasing interest in recent years, both from a theoretical and a practical standpoint. Yet, there exists no commonly accepted evaluation method for the systems performing term discovery. Here, we propose such an evaluation toolbox, drawing ideas from both speech technology and natural language processing. We first transform the speech-based output into a symbolic representation and compute five types of evaluation metrics on this representation: the quality of acoustic matching, the quality of the clusters found, and the quality of the alignment with real words (type, token, and boundary scores). We tested our approach on two term discovery systems taking speech as input, and one using symbolic input. The latter was run using both the gold transcription and a transcription obtained from an automatic speech recognizer, in order to simulate the case when only imperfect symbolic information is available. The results obtained are analysed through the use of the proposed evaluation metrics and the implications of these metrics are discussed.

Modelling function words improves unsupervised word segmentation
Mark Johnson | Anne Christophe | Emmanuel Dupoux | Katherine Demuth
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Exploring the Role of Stress in Bayesian Word Segmentation using Adaptor Grammars
Benjamin Börschinger | Mark Johnson
Transactions of the Association for Computational Linguistics, Volume 2

Stress has long been established as a major cue in word segmentation for English infants. We show that enabling a current state-of-the-art Bayesian word segmentation model to take advantage of stress cues noticeably improves its performance. We find that the improvements range from 10 to 4%, depending on both the use of phonotactic cues and, to a lesser extent, the amount of evidence available to the learner. We also find that in particular early on, stress cues are much more useful for our model than phonotactic cues by themselves, consistent with the finding that children do seem to use stress cues before they use phonotactic cues. Finally, we study how the model’s knowledge about stress patterns evolves over time. We not only find that our model correctly acquires the most frequent patterns relatively quickly but also that the Unique Stress Constraint that is at the heart of a previously proposed model does not need to be built in but can be acquired jointly with word segmentation.

Joint Incremental Disfluency Detection and Dependency Parsing
Matthew Honnibal | Mark Johnson
Transactions of the Association for Computational Linguistics, Volume 2

We present an incremental dependency parsing model that jointly performs disfluency detection. The model handles speech repairs using a novel non-monotonic transition system, and includes several novel classes of features. For comparison, we evaluated two pipeline systems, using state-of-the-art disfluency detectors. The joint model performed better on both tasks, with a parse accuracy of 90.5% and 84.0% accuracy at disfluency detection. The model runs in expected linear time, and processes over 550 tokens a second.

The Effect of Dependency Representation Scheme on Syntactic Language Modelling
Sunghwan Kim | John Pate | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2014

2013

Topic Segmentation with a Structured Topic Model
Lan Du | Wray Buntine | Mark Johnson
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The effect of non-tightness on Bayesian estimation of PCFGs
Shay B. Cohen | Mark Johnson
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A joint model of word segmentation and phonological variation for English word-final /t/-deletion
Benjamin Börschinger | Mark Johnson | Katherine Demuth
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Parsing entire discourses as very long strings: Capturing topic continuity in grounded language learning
Minh-Thang Luong | Michael C. Frank | Mark Johnson
Transactions of the Association for Computational Linguistics, Volume 1

Grounded language learning, the task of mapping from natural language to a representation of meaning, has attracted more and more interest in recent years. In most work on this topic, however, utterances in a conversation are treated independently and discourse structure information is largely ignored. In the context of language acquisition, this independence assumption discards cues that are important to the learner, e.g., the fact that consecutive utterances are likely to share the same referent (Frank et al., 2013). The current paper describes an approach to the problem of simultaneously modeling grounded language at the sentence and discourse levels. We combine ideas from parsing and grammar induction to produce a parser that can handle long input strings with thousands of tokens, creating parse trees that represent full discourses. By casting grounded language learning as a grammatical inference task, we use our parser to extend the work of Johnson et al. (2012), investigating the importance of discourse continuity in children’s language acquisition and its interaction with social cues. Our model boosts performance in a language acquisition task and yields good discourse segmentations compared with human annotators.

Modeling Graph Languages with Grammars Extracted via Tree Decompositions
Bevan Keeley Jones | Sharon Goldwater | Mark Johnson
Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing

Why is English so easy to segment?
Abdellah Fourtassi | Benjamin Börschinger | Mark Johnson | Emmanuel Dupoux
Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL)

Grammars and Topic Models
Mark Johnson
Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13)

A Non-Monotonic Arc-Eager Transition System for Dependency Parsing
Matthew Honnibal | Yoav Goldberg | Mark Johnson
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

2012

Studying the Effect of Input Size for Bayesian Word Segmentation on the Providence Corpus
Benjamin Börschinger | Katherine Demuth | Mark Johnson
Proceedings of COLING 2012

Improving Combinatory Categorial Grammar Parse Reranking with Dependency Grammar Features
Sunghwan Mac Kim | Dominick Ng | Mark Johnson | James Curran
Proceedings of COLING 2012

Exploring Adaptor Grammars for Native Language Identification
Sze-Meng Jojo Wong | Mark Dras | Mark Johnson
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Semantic Parsing with Bayesian Tree Transducers
Bevan Jones | Mark Johnson | Sharon Goldwater
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Exploiting Social Information in Grounded Language Learning via Grammatical Reduction
Mark Johnson | Katherine Demuth | Michael Frank
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation
Benjamin Börschinger | Mark Johnson
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology
Pushpak Bhattacharyya | Asif Ekbal | Sriparna Saha | Mark Johnson | Diego Molla-Aliod | Mark Dras
Proceedings of the First International Workshop on Optimization Techniques for Human Language Technology

2011

Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
Regina Barzilay | Mark Johnson
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Reducing Grounded Learning Tasks To Grammatical Inference
Benjamin Börschinger | Bevan K. Jones | Mark Johnson
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

The impact of language models and loss functions on repair disfluency detection
Simon Zwarts | Mark Johnson
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

A Particle Filter algorithm for Bayesian Wordsegmentation
Benjamin Börschinger | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2011

Formalizing Semantic Parsing with Tree Transducers
Bevan Jones | Mark Johnson | Sharon Goldwater
Proceedings of the Australasian Language Technology Association Workshop 2011

Parsing in Parallel on Multiple Cores and GPUs
Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2011

Using Language Models and Latent Semantic Analysis to Characterise the N400m Neural Response
Mehdi Parviz | Mark Johnson | Blake Johnson | Jon Brock
Proceedings of the Australasian Language Technology Association Workshop 2011

Topic Modeling for Native Language Identification
Sze-Meng Jojo Wong | Mark Dras | Mark Johnson
Proceedings of the Australasian Language Technology Association Workshop 2011

2010

Unsupervised phonemic Chinese word segmentation using Adaptor Grammars
Mark Johnson | Katherine Demuth
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Detecting Speech Repairs Incrementally Using a Noisy Channel Approach
Simon Zwarts | Mark Johnson | Robert Dale
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Using Universal Linguistic Knowledge to Guide Grammar Induction
Tahira Naseem | Harr Chen | Regina Barzilay | Mark Johnson
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Automatic Domain Adaptation for Parsing
David McClosky | Eugene Charniak | Mark Johnson
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Learning Words and Their Meanings from Unsegmented Child-directed Speech
Bevan K. Jones | Mark Johnson | Michael C. Frank
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Reranking the Berkeley and Brown Parsers
Mark Johnson | Ahmet Engin Ural
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names
Mark Johnson
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

SVD and Clustering for Unsupervised POS Tagging
Michael Lamar | Yariv Maron | Mark Johnson | Elie Bienenstock
Proceedings of the ACL 2010 Conference Short Papers

Repurposing Corpora for Speech Repair Detection: Two Experiments
Simon Zwarts | Mark Johnson | Robert Dale
Proceedings of the Australasian Language Technology Association Workshop 2010

2009

Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing
William P. Headden III | Mark Johnson | David McClosky
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Structured Generative Models for Unsupervised Named-Entity Clustering
Micha Elsner | Eugene Charniak | Mark Johnson
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars
Mark Johnson | Sharon Goldwater
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

A Note on the Implementation of Hierarchical Dirichlet Processes
Phil Blunsom | Trevor Cohn | Sharon Goldwater | Mark Johnson
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

How the Statistical Revolution Changes (Computational) Linguistics
Mark Johnson
Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?

2008

When is Self-Training Effective for Parsing?
David McClosky | Eugene Charniak | Mark Johnson
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers
Jianfeng Gao | Mark Johnson
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure
Mark Johnson
Proceedings of ACL-08: HLT

Unsupervised Word Segmentation for Sesotho Using Adaptor Grammars
Mark Johnson
Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology

2007

Why Doesn’t EM Find Good HMM POS-Taggers?
Mark Johnson
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Weighted and Probabilistic Context-Free Grammars Are Equally Expressive
Noah A. Smith | Mark Johnson
Computational Linguistics, Volume 33, Number 4, December 2007

Bayesian Inference for PCFGs via Markov Chain Monte Carlo
Mark Johnson | Thomas Griffiths | Sharon Goldwater
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

A Comparative Study of Parameter Estimation Methods for Statistical Natural Language Processing
Jianfeng Gao | Galen Andrew | Mark Johnson | Kristina Toutanova
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold
Mark Johnson
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

While both spoken and written language processing stand to benefit from parsing, the standard Parseval metrics (Black et al., 1991) and their canonical implementation (Sekine and Collins, 1997) are only useful for text. The Parseval metrics are undefined when the words input to the parser do not match the words in the gold standard parse tree exactly, and word errors are unavoidable with automatic speech recognition (ASR) systems. To fill this gap, we have developed a publicly available tool for scoring parses that implements a variety of metrics which can handle mismatches in words and segmentations, including: alignment-based bracket evaluation, alignment-based dependency evaluation, and a dependency evaluation that does not require alignment. We describe the different metrics, how to use the tool, and the outcome of an extensive set of experiments on the sensitivity.

Effective Self-Training for Parsing
David McClosky | Eugene Charniak | Mark Johnson
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

Multilevel Coarse-to-Fine PCFG Parsing
Eugene Charniak | Mark Johnson | Micha Elsner | Joseph Austerweil | David Ellis | Isaac Haxton | Catherine Hill | R. Shrivaths | Jeremy Moore | Michael Pozar | Theresa Vu
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

Early Deletion of Fillers In Processing Conversational Speech
Matthew Lease | Mark Johnson
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Reranking and Self-Training for Parser Adaptation
David McClosky | Eugene Charniak | Mark Johnson
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Contextual Dependencies in Unsupervised Word Segmentation
Sharon Goldwater | Thomas L. Griffiths | Mark Johnson
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Learning Phrasal Categories
William P. Headden III | Eugene Charniak | Mark Johnson
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

Effective Use of Prosody in Parsing Conversational Speech
Jeremy G. Kahn | Matthew Lease | Eugene Charniak | Mark Johnson | Mari Ostendorf
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking
Eugene Charniak | Mark Johnson
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Representational Bias in Unsupervised Learning of Syllable Structure
Sharon Goldwater | Mark Johnson
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

2004

Sentence-Internal Prosody Does not Help Parsing the Way Punctuation Does
Michelle Gregory | Mark Johnson | Eugene Charniak
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

A TAG-based noisy-channel model of speech repairs
Mark Johnson | Eugene Charniak
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Attention Shifting for Parsing Speech
Keith B. Hall | Mark Johnson
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm
Brian Roark | Murat Saraclar | Michael Collins | Mark Johnson
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Priors in Bayesian Learning of Phonological Rules
Sharon Goldwater | Mark Johnson
Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology

Multi-component Word Sense Disambiguation
Massimiliano Ciaramita | Mark Johnson
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

2003

Investigating Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences
Yasemin Altun | Mark Johnson | Thomas Hofmann
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

Supersense Tagging of Unknown Nouns in WordNet
Massimiliano Ciaramita | Mark Johnson
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

2002

Squibs and Discussions: The DOP Estimation Method is Biased and Inconsistent
Mark Johnson
Computational Linguistics, Volume 28, Number 1, March 2002

A Simple Pattern-matching Algorithm for Recovering Empty Nodes and their Antecedents
Mark Johnson
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques
Stefan Riezler | Tracy H. King | Ronald M. Kaplan | Richard Crouch | John T. Maxwell III | Mark Johnson
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Dynamic programming for parsing and estimation of stochastic unification-based grammars
Stuart Geman | Mark Johnson
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Parsing and Disfluency Placement
Donald Engel | Eugene Charniak | Mark Johnson
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

2001

Edit Detection and Parsing for Transcribed Speech
Eugene Charniak | Mark Johnson
Second Meeting of the North American Chapter of the Association for Computational Linguistics

Joint and Conditional Estimation of Tagging and Parsing Models
Mark Johnson
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

Exploiting auxiliary distributions in stochastic unification-based grammars
Mark Johnson | Stefan Riezler
1st Meeting of the North American Chapter of the Association for Computational Linguistics

Explaining away ambiguity: Learning verb selectional preference with Bayesian networks
Massimiliano Ciaramita | Mark Johnson
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Compact non-left-recursive grammars using the selective left-corner transform and factoring
Mark Johnson | Brian Roark
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training
Stefan Riezler | Detlef Prescher | Jonas Kuhn | Mark Johnson
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1999

Efficient probabilistic top-down and left-corner parsing
Brian Roark | Mark Johnson
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

Estimators for Stochastic “Unification-Based” Grammars
Mark Johnson | Stuart Geman | Stephen Canon | Zhiyi Chi | Stefan Riezler
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1998

Finite-state Approximation of Constraint-based Grammars using Left-corner Grammar Transforms
Mark Johnson
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

PCFG Models of Linguistic Tree Representations
Mark Johnson
Computational Linguistics, Volume 24, Number 4, December 1998

Finite-state Approximation of Constraint-based Grammars using Left-corner Grammar Transforms
Mark Johnson
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Edge-Based Best-First Chart Parsing
Eugene Charniak | Sharon Goldwater | Mark Johnson
Sixth Workshop on Very Large Corpora

The effect of alternative tree representations on tree bank grammars
Mark Johnson
New Methods in Language Processing and Computational Natural Language Learning

1995

Squibs and Discussions: Memoization in Top-Down Parsing
Mark Johnson
Computational Linguistics, Volume 21, Number 3, September 1995

Features and Agreement
Sam Bayer | Mark Johnson
33rd Annual Meeting of the Association for Computational Linguistics

Memoization of Coroutined Constraints
Mark Johnson | Jochen Dorre
33rd Annual Meeting of the Association for Computational Linguistics

1994

Computing with Features as Formulae
Mark Johnson
Computational Linguistics, Volume 20, Number 1, March 1994

Parsing and empty nodes
Mark Johnson | Martin Kay
Computational Linguistics, Volume 20, Number 2, June 1994

1991

Features and Formulae
Mark Johnson
Computational Linguistics, Volume 17, Number 2, June 1991

1990

Semantic Abstraction and Anaphora
Mark Johnson | Martin Kay
COLING 1990 Volume 1: Papers presented to the 13th International Conference on Computational Linguistics

Expressing Disjunctive and Negative Feature Constraints With Classical First-Order Logic.
Mark Johnson
28th Annual Meeting of the Association for Computational Linguistics

1989

The Computational Complexity of Tomita’s Algorithm
Mark Johnson
Proceedings of the First International Workshop on Parsing Technologies

1988

Deductive Parsing With Multiple Levels of Representation.
Mark Johnson
26th Annual Meeting of the Association for Computational Linguistics

1986

Discourse, anaphora and parsing
Mark Johnson | Ewan Klein
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics

1985

Parsing with Discontinuous Constituents
Mark Johnson
23rd Annual Meeting of the Association for Computational Linguistics

1984

A Discovery Procedure for Certain Phonological Rules
Mark Johnson
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics

Co-authors

Dat Quoc Nguyen 9

Massimiliano Ciaramita 5

Katherine Demuth 5

Emmanuel Dupoux 5

Mohammad Javad Hosseini 5

Paria Jamshid Lou 5

David McClosky 5

Shay B. Cohen 4

Stefan Riezler 4

Kristina Toutanova 4

Peter Anderson 3

Sander Bijl de Vroe 3

Michael C. Frank 3

Jonas Groschwitz 3

Liane Guillou 3

Matthew Honnibal 3

Alexander Koller 3

Matthew Lease 3

Dai Quoc Nguyen 3

Regina Barzilay 2

Philip R. Cohen 2

Dominique Estival 2

Meaghan Fowlie 2

Thomas L. Griffiths 2

William P. Headden III 2

Jeremy G. Kahn 2

Sunghwan Mac Kim 2

Mari Ostendorf 2

Sze-Meng Jojo Wong 2

Yasemin Altun 1

Bharat Ram Ambati 1

Philip Arthur 1

Joseph Austerweil 1

Guy Bashkansky 1

Pushpak Bhattacharyya 1

Elie Bienenstock 1

Richard Billingsley 1

Andrew Bleeker 1

Stephen Canon 1

Nathanael Chambers 1

Anne Christophe 1

Michael Collins 1

Richard Crouch 1

James R. Curran 1

Isabelle Dautriche 1

Tejaswini Deoskar 1

Don Dharmasiri 1

Vladislavs Dovgalecs 1

Basura Fernando 1

Abdellah Fourtassi 1

Yoav Goldberg 1

Stephen Gould 1

Guillaume Gravier 1

Michelle Gregory 1

Catherine Hill 1

Cong Duy Vu Hoang 1

Vu Cong Duy Hoang 1

Thomas Hofmann 1

Xavier R. Holt 1

Mohammad Hosseini 1

Serge Le Huitouze 1

Blake Johnson 1

Ronald M. Kaplan 1

Krishnaram Kenthapadi 1

Tracy Holloway King 1

Anna Krasnyanskaya 1

Michael Lamar 1

Matthias Lindemann 1

Yang Liu (刘扬) 1

Bogdan Ludusan 1

Minh-Thang Luong 1

Shervin Malmasi 1

John T. Maxwell III 1

Mahdi Kazemi Moghaddam 1

Tahira Naseem 1

Duc Thien Nguyen 1

Tuyen Quang Pham 1

Olivier Piguet 1

Michael Pozar 1

Detlef Prescher 1

Sriparna Saha 1

Murat Saraclar 1

Izhak Shafran 1

Noah A. Smith 1

Matthew Snover 1

Robert Staubs 1

Robin Stewart 1

Gabriel Synnaeve 1

Gioacchino Tangari 1

Ahmet Engin Ural 1

Maarten Versteegh 1

Magdalena Wolska 1

Zhendong Zhao 1

Venues