Proceedings of the 19th Joint ACL-ISO Workshop on Interoperable Semantics (ISA-19)
Harry Bunt
The DARPA Wikidata Overlay: Wikidata as an ontology for natural language processing
Elizabeth Spaulding | Kathryn Conger | Anatole Gershman | Rosario Uceda-Sosa | Susan Windisch Brown | James Pustejovsky | Peter Anick | Martha Palmer
With 102,530,067 items currently in its crowd-sourced knowledge base, Wikidata provides NLP practitioners a unique and powerful resource for inference and reasoning over real-world entities. However, because Wikidata is very entity focused, events and actions are often labeled with eventive nouns (e.g., the process of diagnosing a person’s illness is labeled “diagnosis”), and the typical participants in an event are not described or linked to that event concept (e.g., the medical professional or patient). Motivated by a need for an adaptable, comprehensive, domain-flexible ontology for information extraction, including identifying the roles entities are playing in an event, we present a curated subset of Wikidata in which events have been enriched with PropBank roles. To enable richer narrative understanding between events from Wikidata concepts, we have also provided a comprehensive mapping from temporal Qnodes and Pnodes to the Allen Interval Temporal Logic relations.
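As a rough illustration of the Allen Interval Temporal Logic relations mentioned in this abstract, the sketch below classifies the relation between two intervals given as numeric start and end points. The function name `allen_relation` and the numeric interval representation are assumptions made for illustration only; they are not taken from the paper, and the sketch does not reproduce the paper's mapping from Wikidata Qnodes and Pnodes.

```python
def allen_relation(a_start, a_end, b_start, b_end):
    """Return the Allen relation that interval A holds to interval B.

    Intervals are hypothetical (start, end) pairs with start < end.
    """
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    # A lies entirely before or after B, with or without a gap.
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    # From here on the intervals share more than one point.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if a_start < b_start else "overlapped-by"
```

The thirteen return values correspond to the thirteen basic Allen relations; an event whose interval ends exactly when another begins, for example, is classified as "meets".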
Semantic annotation of Common Lexis Verbs of Contact in Bulgarian
Maria Todorova
The paper presents work on the selection, semantic annotation and classification of a group of verbs from WordNet that are characterized by the semantic primitive ‘verbs of contact’ and belong to the common Bulgarian lexis. We describe the selection of the verb set using several criteria: statistical information from corpora, WordNet Base Concepts, and age of acquisition (AoA). The focus of the work is on the semantic annotation of the verbs of contact using the combined information from two language resources, WordNet and FrameNet. The verbs of contact from WordNet are assigned semantic frames from FrameNet and then grouped into semantic subclasses using their place in the WordNet hierarchy, the semantic restrictions on their frame elements, and the corresponding syntactic realization. Finally, we offer some conclusions on the classification of ‘verbs of contact’ into semantic subtypes.
Appraisal Theory and the Annotation of Speaker-Writer Engagement
Min Dong | Alex Fang
In this work, we address the annotation of language resources through the application of the engagement network in appraisal theory. This work represents an attempt to extend the advances in studies of speech and dialogue acts to encompass the more recent notion of stance negotiation in discourse between the writer and other sources. This type of phenomenon has become especially salient in contemporary media communication and requires timely research to address emerging requirements. We first describe the engagement network as proposed by Martin and White (2005) and then discuss the issue of multisubjectivity. We then propose and describe a two-step procedure towards better annotation before discussing the benefits of the engagement network in the assessment of speaker-writer stance. Finally, we discuss issues of annotation consistency and reliability.
metAMoRphosED, a graphical editor for Abstract Meaning Representation
Johannes Heinecke | Maria Boritchev
This paper presents a graphical editor for directed graphs serialised in the PENMAN format, as used for annotations in Abstract Meaning Representation (AMR). The tool supports creating and modifying AMR graphs and other directed graphs; adding and deleting instances, edges and literals; renaming concepts, relations and literals; setting a “top node”; and validating the edited graph.
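To make the PENMAN serialisation mentioned in this abstract concrete, here is a minimal sketch that writes a small directed graph as a one-line PENMAN string. The nested `(variable, concept, edges)` tuple layout and the function name `to_penman` are hypothetical choices for illustration; they are not the editor's internal data model or API, and the example graph is the widely used "the boy wants to go" AMR.

```python
def to_penman(node):
    """Serialise a (variable, concept, edges) triple to a one-line PENMAN string.

    A plain string stands for a re-entrant variable or a literal.
    """
    if isinstance(node, str):
        return node
    var, concept, edges = node
    parts = [f"{var} / {concept}"]
    for role, target in edges:
        # Each edge is written as ":role target", recursing into sub-nodes.
        parts.append(f"{role} {to_penman(target)}")
    return "(" + " ".join(parts) + ")"


# The outermost node plays the role of the "top node".
graph = ("w", "want-01", [
    (":ARG0", ("b", "boy", [])),
    (":ARG1", ("g", "go-02", [(":ARG0", "b")])),
])

print(to_penman(graph))
# prints: (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
```

Reusing the variable `b` under `go-02` is how PENMAN expresses a re-entrancy, which is what makes the structure a graph rather than a tree.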
Personal noun detection for German
Carla Sökefeld | Melanie Andresen | Johanna Binnewitt | Heike Zinsmeister
Personal nouns, i.e. common nouns denoting human beings, play an important role in manifesting gender and gender stereotypes in texts, especially for languages with grammatical gender like German. Automatically detecting and extracting personal nouns can thus be of interest for many tasks, such as minimizing gender bias in language models and researching gender stereotypes or gender-fair language. The task is complicated by the morphological heterogeneity and homonymy of personal and non-personal nouns, which restrict lexicon-based approaches. In this paper, we introduce a classifier, created by fine-tuning a transformer model, that detects personal nouns in German. Although some phenomena like homonymy and metalinguistic uses are still problematic, the model is able to classify personal nouns with robust accuracy (F1 score: 0.94).
ISO 24617-2 on a cusp of languages
Krzysztof Hwaszcz | Marcin Oleksy | Aleksandra Domogała | Jan Wieczorek
The article discusses the challenges of cross-linguistic dialogue act annotation, which involves using methods developed for one language to annotate conversations in another language. It focuses specifically on research on dialogue act annotation in Polish, based on the ISO standard developed for English. Using selected examples from the DiaBiz.Kom corpus, it examines differences between Polish and English in dialogue act annotation, such as the use of honorifics in Polish, the use of inflection to convey meaning, the tendency to use complex sentence structures, and cultural differences that may play a role in the annotation of dialogue acts. The article also discusses the creation of DiaBiz.Kom, a Polish dialogue corpus based on the ISO 24617-2 standard applied to 1100 transcripts.
Towards Referential Transparent Annotations of Quantified Noun Phrases
Andy Luecking
Drawing on recent developments in count noun quantification, namely Referential Transparency Theory (RTT), this paper presents the basic structure for annotating quantification in the nominal domain according to RTT. It discusses core ideas of RTT, derives the abstract annotation syntax, and exemplifies annotations of quantified noun phrases, partly in comparison to QuantML.
The compositional semantics of QuantML annotations
Harry Bunt
This paper discusses some issues in the semantic annotation of quantification phenomena in general, and in particular in the markup language QuantML, which has been proposed to form part of an ISO standard annotation scheme for quantification in natural language data. QuantML annotations have been claimed to have a compositional semantic interpretation, but the formal specification of QuantML in the official ISO documentation does not provide sufficient detail to judge this. This paper aims to fill this gap.
An Abstract Specification of VoxML as an Annotation Language
Kiyong Lee | Nikhil Krishnaswamy | James Pustejovsky
VoxML is a modeling language used to map natural language expressions into real-time visualizations using real-world semantic knowledge of objects and events. Its utility has been demonstrated in embodied simulation environments and in agent-object interactions in situated human-agent communication. It is enriched to work with notions of affordances, both Gibsonian and Telic, and of habitats for various interactions between the rational agent (human) and an object. This paper aims to specify VoxML as an annotation language in general abstract terms. It then shows how it works in annotating linguistic data that express visually perceptible human-object interactions. The annotation structures thus generated will be interpreted against the enriched minimal model created by VoxML as a modeling language, while supporting the modeling purposes of VoxML linguistically.
How Good is Automatic Segmentation as a Multimodal Discourse Annotation Aid?
Corbyn Terpstra | Ibrahim Khebour | Mariah Bradford | Brett Wisniewski | Nikhil Krishnaswamy | Nathaniel Blanchard
In this work, we assess the quality of different utterance segmentation techniques as an aid in annotating collaborative problem solving in teams and the creation of shared meaning between participants in a situated, collaborative task. We manually transcribe utterances in a dataset of triads collaboratively solving a problem involving dialogue and physical object manipulation, annotate collaborative moves according to these gold-standard transcripts, and then apply these annotations to utterances that have been automatically segmented using toolkits from Google and OpenAI’s Whisper. We show that the oracle utterances have minimal correspondence to automatically segmented speech, and that automatically segmented speech using different segmentation methods is also inconsistent. We also show that annotating automatically segmented speech has distinct implications compared with annotating oracle utterances: since most annotation schemes are designed for oracle cases, annotators of automatically segmented utterances must make arbitrary judgements which other annotators may not replicate. We conclude with a discussion of how future annotation specs can account for these needs.