2024
Bootstrapping UMR Annotations for Arapaho from Language Documentation Resources
Matthew J. Buchholz | Julia Bonn | Claire Benet Post | Andrew Cowell | Alexis Palmer
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Uniform Meaning Representation (UMR) is a semantic labeling system in the AMR family designed to be uniformly applicable to typologically diverse languages. The UMR labeling system is quite thorough and can be time-consuming to execute, especially if annotators are starting from scratch. In this paper, we focus on methods for bootstrapping UMR annotations for a given language from existing resources, and specifically from typical products of language documentation work, such as lexical databases and interlinear glossed text (IGT). Using Arapaho as our test case, we present and evaluate a bootstrapping process that automatically generates UMR subgraphs from IGT. Additionally, we describe and evaluate a method for bootstrapping valency lexicon entries from lexical databases for both the target language and English. We are able to generate enough basic structure in UMR graphs from the existing Arapaho interlinearized texts to automate UMR labeling to a significant extent. Our method thus has the potential to streamline the process of building meaning representations for new languages without existing large-scale computational resources.
Building a Broad Infrastructure for Uniform Meaning Representations
Julia Bonn | Matthew J. Buchholz | Jayeol Chun | Andrew Cowell | William Croft | Lukas Denk | Sijia Ge | Jan Hajič | Kenneth Lai | James H. Martin | Skatje Myers | Alexis Palmer | Martha Palmer | Claire Benet Post | James Pustejovsky | Kristine Stenzel | Haibo Sun | Zdeňka Urešová | Rosa Vallejos | Jens E. L. Van Gysel | Meagan Vigus | Nianwen Xue | Jin Zhao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper reports the first release of the UMR (Uniform Meaning Representation) data set. UMR is a graph-based meaning representation formalism consisting of a sentence-level graph and a document-level graph. The sentence-level graph represents predicate-argument structures, named entities, word senses, aspectuality of events, as well as person and number information for entities. The document-level graph represents coreferential, temporal, and modal relations that go beyond sentence boundaries. UMR is designed to capture the commonalities and variations across languages through a common set of abstract concepts, relations, and attributes, as well as concrete concepts derived from the words of individual languages. This UMR release includes annotations for six languages (Arapaho, Chinese, English, Kukama, Navajo, Sanapana) that vary greatly in their linguistic properties and resource availability. We also describe ongoing efforts to enlarge this data set and extend it to other genres and modalities, and briefly describe the available infrastructure (UMR annotation guidelines and tools) that others can use to create similar data sets.
2023
Mapping AMR to UMR: Resources for Adapting Existing Corpora for Cross-Lingual Compatibility
Julia Bonn | Skatje Myers | Jens E. L. Van Gysel | Lukas Denk | Meagan Vigus | Jin Zhao | Andrew Cowell | William Croft | Jan Hajič | James H. Martin | Alexis Palmer | Martha Palmer | James Pustejovsky | Zdenka Urešová | Rosa Vallejos | Nianwen Xue
Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)
This paper presents detailed mappings between the structures used in Abstract Meaning Representation (AMR) and those used in Uniform Meaning Representation (UMR). These structures include general semantic roles, rolesets, and concepts that are largely shared between AMR and UMR, but with crucial differences. While UMR annotation of new low-resource languages is ongoing, AMR-annotated corpora already exist for many languages, and these AMR corpora are ripe for conversion to UMR format. Rather than focusing on semantic coverage that is new to UMR (which will likely need to be dealt with manually), this paper serves as a resource (with illustrated mappings) for users looking to understand the fine-grained adjustments that have been made to the representation techniques for semantic categories present in both AMR and UMR.
UMR Annotation of Multiword Expressions
Julia Bonn | Andrew Cowell | Jan Hajič | Alexis Palmer | Martha Palmer | James Pustejovsky | Haibo Sun | Zdenka Uresova | Shira Wein | Nianwen Xue | Jin Zhao
Proceedings of the Fourth International Workshop on Designing Meaning Representations
Rooted in AMR, Uniform Meaning Representation (UMR) is a graph-based formalism with nodes as concepts and edges as relations between them. When used to represent natural language semantics, UMR maps words in a sentence to concepts in the UMR graph. Multiword expressions (MWEs) pose a particular challenge to UMR annotation because they deviate from the default one-to-one mapping between words and concepts. There are different types of MWEs which require different kinds of annotation that must be specified in guidelines. This paper discusses the specific treatment for each type of MWE in UMR.
2021
Theoretical and Practical Issues in the Semantic Annotation of Four Indigenous Languages
Jens E. L. Van Gysel | Meagan Vigus | Lukas Denk | Andrew Cowell | Rosa Vallejos | Tim O’Gorman | William Croft
Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop
Computational resources such as semantically annotated corpora can play an important role in enabling speakers of indigenous minority languages to participate in government, education, and other domains of public life in their own language. However, many languages – mainly those with small native speaker populations and without written traditions – have little to no digital support. One hurdle in creating such resources is that for many languages, few speakers would be capable of annotating texts – a task which requires literacy and some linguistic training – and that these experts’ time is typically in high demand for language planning work. This paper assesses whether typologically trained non-speakers of an indigenous language can feasibly perform semantic annotation using Uniform Meaning Representations, thus allowing for the creation of computational materials without putting further strain on community resources.
2020
Cross-lingual annotation: a road map for low- and no-resource languages
Meagan Vigus | Jens E. L. Van Gysel | Tim O’Gorman | Andrew Cowell | Rosa Vallejos | William Croft
Proceedings of the Second International Workshop on Designing Meaning Representations
This paper presents a “road map” for the annotation of semantic categories in typologically diverse languages, with potentially few linguistic resources, and often no existing computational resources. Past semantic annotation efforts have focused largely on high-resource languages, or relatively low-resource languages with a large number of native speakers. However, there are certain typological traits, namely the synthesis of multiple concepts into a single word, that are more common in languages with a smaller speech community. For example, what is expressed as a sentence in a more analytic language like English may be expressed as a single word in a more synthetic language like Arapaho. This paper proposes solutions for annotating analytic and synthetic languages in a comparable way based on existing typological research, and introduces a road map for the annotation of languages with a dearth of resources.
2019
Improving Low-Resource Morphological Learning with Intermediate Forms from Finite State Transducers
Sarah Moeller | Ghazaleh Kazeminejad | Andrew Cowell | Mans Hulden
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)
2018
A Neural Morphological Analyzer for Arapaho Verbs Learned from a Finite State Transducer
Sarah Moeller | Ghazaleh Kazeminejad | Andrew Cowell | Mans Hulden
Proceedings of the Workshop on Computational Modeling of Polysynthetic Languages
We experiment with training an encoder-decoder neural model to mimic the behavior of an existing hand-written finite-state morphological grammar for verbs in Arapaho, a polysynthetic language with a highly complex verbal inflection system. After adjusting for ambiguous parses, we find that the system is able to generalize to unseen forms with accuracies of 98.68% (unambiguous verbs) and 92.90% (all verbs).
2017
Creating lexical resources for polysynthetic languages—the case of Arapaho
Ghazaleh Kazeminejad | Andrew Cowell | Mans Hulden
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages
2016
Applying Universal Dependency to the Arapaho Language
Irina Wagner | Andrew Cowell | Jena D. Hwang
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)
2006
Retrospective Analysis of Communication Events - Understanding the Dynamics of Collaborative Multi-Party Discourse
Andrew Cowell | Jereme Haack | Adrienne Andrew
Proceedings of the Analyzing Conversations in Text and Speech