Michael Wayne Goodman

Also published as: Michael Goodman

2021

The Global Wordnet Formats have been introduced to enable wordnets to have a common representation that can be integrated through the Global WordNet Grid. As a result of their adoption, a number of shortcomings of the format were identified, and in this paper we describe the extensions to the formats that address these issues. These include: ordering of senses, dependencies between wordnets, pronunciation, syntactic modelling, relations, sense keys, metadata and RDF support. Furthermore, we provide some perspectives on how these changes help in the integration of wordnets.

pdf abs
Intrinsically Interlingual: The Wn Python Library for Wordnets
Michael Wayne Goodman | Francis Bond
Proceedings of the 11th Global Wordnet Conference

This paper introduces Wn, a new Python library for working with wordnets. Unlike previous libraries, Wn is built from the beginning to accommodate multiple wordnets — for multiple languages or multiple versions of the same wordnet — while retaining the ability to query and traverse them independently. It is also able to download and incorporate wordnets published online. These features are made possible through Wn’s adoption of standard formats and methods for interoperability, namely the WN-LMF schema (Vossen et al., 2013; Bond et al., 2020) and the Collaborative Interlingual Index (Bond et al., 2016). Wn is open-source, easily available, and well-documented.

2020

pdf abs
Some Issues with Building a Multilingual Wordnet
Francis Bond | Luis Morgado da Costa | Michael Wayne Goodman | John Philip McCrae | Ahti Lohk
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper we discuss the experience of bringing together over 40 different wordnets. We introduce some extensions to the GWA wordnet LMF format proposed in Vossen et al. (2016) and look at how this new information can be displayed. Notable extensions include: confidence, corpus frequency, orthographic variants, lexicalized and non-lexicalized synsets and lemmas, new parts of speech, and more. Many of these extensions already exist in multiple wordnets – the challenge was to find a compatible representation. To this end, we introduce a new version of the Open Multilingual Wordnet (Bond and Foster, 2013), that integrates a new set of tools that tests the extensions introduced by this new format, while also ensuring the integrity of the Collaborative Interlingual Index (CILI: Bond et al., 2016), avoiding the same new concept to be introduced through multiple projects.

pdf abs
Penman: An Open-Source Library and Tool for AMR Graphs
Michael Wayne Goodman
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

Abstract Meaning Representation (AMR) (Banarescu et al., 2013) is a framework for semantic dependencies that encodes its rooted and directed acyclic graphs in a format called PENMAN notation. The format is simple enough that users of AMR data often write small scripts or libraries for parsing it into an internal graph representation, but there is enough complexity that these users could benefit from a more sophisticated and well-tested solution. The open-source Python library Penman provides a robust parser, functions for graph inspection and manipulation, and functions for formatting graphs into PENMAN notation. Many functions are also available in a command-line tool, thus extending its utility to non-Python setups.

2019

pdf abs
Neural Text Generation from Rich Semantic Representations
Valerie Hajdik | Jan Buys | Michael Wayne Goodman | Emily M. Bender
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose neural models to generate high-quality text from structured representations based on Minimal Recursion Semantics (MRS). MRS is a rich semantic representation that encodes more precise semantic detail than other representations such as Abstract Meaning Representation (AMR). We show that a sequence-to-sequence model that maps a linearization of Dependency MRS, a graph-based representation of MRS, to text can achieve a BLEU score of 66.11 when trained on gold data. The performance of the model can be improved further using a high-precision, broad coverage grammar-based parser to generate a large silver training corpus, achieving a final BLEU score of 77.17 on the full test set, and 83.37 on the subset of test data most closely matching the silver data domain. Our results suggest that MRS-based representations are a good choice for applications that need both structured semantics and the ability to produce natural language text as output.

2018

pdf
PDF-to-Text Reanalysis for Linguistic Data Mining
Michael Wayne Goodman | Ryan Georgi | Fei Xia
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf
UW-Stanford System Description for AESW 2016 Shared Task on Grammatical Error Detection
Dan Flickinger | Michael Goodman | Woodley Packard
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

pdf
A Web-framework for ODIN Annotation
Ryan Georgi | Michael Wayne Goodman | Fei Xia
Proceedings of ACL-2016 System Demonstrations

pdf abs
Resources for building applications with Dependency Minimal Recursion Semantics
Ann Copestake | Guy Emerson | Michael Wayne Goodman | Matic Horvat | Alexander Kuhnle | Ewa Muszyńska
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe resources aimed at increasing the usability of the semantic representations utilized within the DELPH-IN (Deep Linguistic Processing with HPSG) consortium. We concentrate in particular on the Dependency Minimal Recursion Semantics (DMRS) formalism, a graph-based representation designed for compositional semantic representation with deep grammars. Our main focus is on English, and specifically English Resource Semantics (ERS) as used in the English Resource Grammar. We first give an introduction to ERS and DMRS and a brief overview of some existing resources and then describe in detail a new repository which has been developed to simplify the use of ERS/DMRS. We explain a number of operations on DMRS graphs which our repository supports, with sketches of the algorithms, and illustrate how these operations can be exploited in application building. We believe that this work will aid researchers to exploit the rich and effective but complex DELPH-IN resources.

2014

pdf
Learning Grammar Specifications from IGT: A Case Study of Chintang
Emily M. Bender | Joshua Crowgey | Michael Wayne Goodman | Fei Xia
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages

pdf abs
Enriching ODIN
Fei Xia | William Lewis | Michael Wayne Goodman | Joshua Crowgey | Emily M. Bender
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we describe the expansion of the ODIN resource, a database containing many thousands of instances of Interlinear Glossed Text (IGT) for over a thousand languages harvested from scholarly linguistic papers posted to the Web. A database containing a large number of instances of IGT, which are effectively richly annotated and heuristically aligned bitexts, provides a unique resource for bootstrapping NLP tools for resource-poor languages. To make the data in ODIN more readily consumable by tool developers and NLP researchers, we propose a new XML format for IGT, called Xigt. We call the updated release ODIN-II.