Adam Pease
Concordancers are an accepted and valuable part of the tool set of linguists and lexicographers. They allow the user to see the context of use of a word or phrase in a corpus. A large enough corpus, such as the Corpus of Contemporary American English, provides the data needed to enumerate all common uses or meanings. One challenge is that there may be too many results for short search phrases or common words when only a specific context is desired. However, finding meaningful groupings of usage may be impractical if it entails enumerating long lists of possible values, such as city names. If a tool existed that could create some semantic abstractions, it would free the lexicographer from the need to resort to customized development of analysis software. To address this need, we have developed a Semantic Concordancer that uses dependency parsing and the Suggested Upper Merged Ontology (SUMO) to support linguistic analysis at a level of semantic abstraction above the original textual elements. We show how this facility can be employed to analyze the use of English prepositions by non-native speakers. We briefly introduce concordancers and then describe the corpora to which we applied this work. Next we provide a detailed description of the NLP pipeline, followed by how it captures detailed semantics. We show how the semantics can be used to analyze errors in the use of English prepositions by non-native speakers of English. Then we describe a tool that allows users to build semantic search specifications from a set of English examples, and how those results can be employed to build rules that translate sentences into logical forms. Finally, we summarize our conclusions and mention future work.
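As a rough illustration of the kind of semantic abstraction described above, the sketch below uses spaCy for dependency parsing and a toy lemma-to-SUMO mapping so that a query can match by semantic class (e.g., any City) rather than by enumerating surface forms. spaCy, the SUMO_MAP table, and the matches() helper are illustrative assumptions, not the system's actual pipeline or lexicon.

    # A minimal sketch of semantic-concordance matching, assuming spaCy and a
    # toy lemma->SUMO mapping; the real system uses a full WordNet-SUMO lexicon.
    import spacy

    nlp = spacy.load("en_core_web_sm")

    # Hypothetical lexicon fragment: surface lemma -> SUMO class.
    SUMO_MAP = {
        "paris": "City",
        "london": "City",
        "tokyo": "City",
        "drive": "Driving",
        "fly": "Flying",
    }

    def matches(sentence, verb_class, oblique_class, preposition):
        """Return True if the sentence has a verb of `verb_class` with a
        prepositional dependent of `oblique_class` headed by `preposition`,
        e.g. (Driving, City, 'to') matches 'She drove to Paris.'"""
        doc = nlp(sentence)
        for tok in doc:
            if SUMO_MAP.get(tok.lemma_.lower()) != verb_class:
                continue
            for prep in tok.children:
                if prep.dep_ == "prep" and prep.lemma_.lower() == preposition:
                    for obj in prep.children:
                        if obj.dep_ == "pobj" and SUMO_MAP.get(obj.lemma_.lower()) == oblique_class:
                            return True
        return False

    print(matches("She drove to Paris last spring.", "Driving", "City", "to"))  # True
    print(matches("She drove past the museum.", "Driving", "City", "to"))       # False

The point of the abstraction is visible in the query: the lexicographer asks for "a Driving verb with a to-phrase whose object is a City" once, instead of listing every city name.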
We describe the implementation of a short answer extraction system. It consists of a simple sentence selection front-end and a two-phase approach to answer extraction from a sentence. In the first phase, sentence classification is performed with a classifier trained with the passive-aggressive algorithm on the UIUC dataset and taxonomy, using a feature set that includes word vectors. This phase outperforms the current best published results on that dataset. In the second phase, a sieve algorithm consisting of a series of increasingly general extraction rules is applied, using WordNet to find word types aligned with the UIUC classifications determined in the first phase. Some very preliminary performance metrics are presented.
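A minimal sketch of the two-phase idea follows, assuming scikit-learn for the passive-aggressive classifier and NLTK's WordNet for the sieve's type check. The tiny training sample, the TF-IDF features (standing in for the richer feature set with word vectors), and the single sieve rule are illustrative only, not the system's actual configuration.

    # Rough two-phase sketch: question classification, then a WordNet type sieve.
    # Requires scikit-learn and NLTK with the 'wordnet' corpus downloaded.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import PassiveAggressiveClassifier
    from sklearn.pipeline import make_pipeline
    from nltk.corpus import wordnet as wn

    # Phase 1: classify the question into a UIUC-style coarse class.
    train_questions = ["Who wrote Hamlet ?", "Where is the Eiffel Tower ?"]
    train_labels = ["HUM", "LOC"]  # tiny illustrative sample, not the UIUC data
    clf = make_pipeline(TfidfVectorizer(), PassiveAggressiveClassifier())
    clf.fit(train_questions, train_labels)

    # Phase 2: one sieve rule -- accept a candidate answer word whose WordNet
    # hypernym chain reaches a type aligned with the predicted class.
    CLASS_TO_SYNSET = {"HUM": wn.synset("person.n.01"), "LOC": wn.synset("location.n.01")}

    def sieve(candidate, qclass):
        target = CLASS_TO_SYNSET[qclass]
        for syn in wn.synsets(candidate, pos=wn.NOUN):
            if target in syn.closure(lambda s: s.hypernyms()):
                return True  # candidate is subsumed by the target type
        return False

    qclass = clf.predict(["Where was Mozart born ?"])[0]
    print(qclass, sieve("Salzburg", qclass))

In the actual system the sieve applies increasingly general rules in order, so a candidate rejected by a strict rule can still be accepted by a looser one later.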
Arabic WordNet is a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Arabic WordNet (AWN) follows the design and contents of the universally accepted Princeton WordNet (PWN) and will be mappable straightforwardly onto PWN 2.0 and EuroWordNet (EWN), enabling translation on the lexical level to English and dozens of other languages. We have developed and linked AWN with the Suggested Upper Merged Ontology (SUMO), in which concepts are defined with machine-interpretable semantics in first-order logic (Niles and Pease, 2001). We have greatly extended the ontology and its set of mappings to provide formal terms and definitions for each synset. The end product will be a linguistic resource with a deep formal semantic foundation, able to capture the richness of Arabic as described in Elkateb (2005). Tools we have developed as part of this effort include a lexicographer's interface modeled on that used for EuroWordNet, with added facilities for Arabic script, following Black and Elkateb's earlier work (2004). In this paper we describe our methodology for building a lexical resource in Arabic and the challenges Arabic poses for lexical resources.
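As a small, hedged illustration of how a synset-level SUMO mapping sits alongside the cross-lingual links described above, the snippet below uses NLTK's Open Multilingual WordNet (which distributes Arabic WordNet data under language code 'arb') together with a one-entry toy mapping table; the project's real mappings cover every synset and are maintained separately.

    # Illustrative only: cross-lingual lemma lookup plus a toy SUMO mapping.
    # Requires nltk.download('wordnet') and nltk.download('omw-1.4').
    from nltk.corpus import wordnet as wn

    synset = wn.synset("book.n.01")

    # Hypothetical fragment of a WordNet-to-SUMO mapping; an '='-style mapping
    # marks the synset as equivalent to the named SUMO class.
    SUMO_MAP = {"book.n.01": "Book"}

    print("English lemmas:", synset.lemma_names("eng"))
    print("Arabic lemmas: ", synset.lemma_names("arb"))
    print("SUMO term:     ", SUMO_MAP[synset.name()])

Because English and Arabic lemmas hang off the same synset index, translation at the lexical level reduces to looking up the other language's lemma list for the shared synset.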
This paper introduces a recently initiated project that focuses on building a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Our aim is to develop a linguistic resource with a deep formal semantic foundation in order to capture the richness of Arabic as described in Elkateb (2005). Arabic WordNet is being constructed following methods developed for EuroWordNet (Vossen, 1998). In addition to the standard wordnet representation of senses, word meanings are also being defined with a machine-understandable semantics in first-order logic. The basis for this semantics is the Suggested Upper Merged Ontology and its associated domain ontologies (Niles and Pease, 2001). We will greatly extend the ontology and its set of mappings to provide formal terms and definitions for each synset. Tools to be developed as part of this effort include a lexicographer's interface modeled on that used for EuroWordNet, with added facilities for Arabic script, following Black and Elkateb's earlier work (2004).
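To make concrete what "machine-understandable semantics in first-order logic" might look like when attached to a synset, the record below pairs a synset with an Arabic lemma, a SUMO term, and a simplified SUO-KIF axiom. The lemma, the mapping, and the axiom are hedged examples for illustration, not entries copied from the actual resource files.

    # A hedged, illustrative record of the kind of entry the resource associates
    # with each synset; the axiom is simplified SUO-KIF, not taken from SUMO itself.
    entry = {
        "synset": "river.n.01",
        "arabic_lemmas": ["نهر"],  # Arabic script handled by the lexicographer's interface
        "sumo_term": "River",
        "axiom": "(=> (instance ?X River) (instance ?X WaterArea))",
    }
    print(entry["synset"], "->", entry["sumo_term"])
    print(entry["axiom"])

The axiom is what makes the sense machine-interpretable: a reasoner can conclude from it that anything classified as a River is also a WaterArea, without consulting the gloss text.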