Harry Bunt

Also published as: H. C. Bunt

2025

pdf bib abs
Representing ISO-Annotated Dynamic Information in UMR
Kiyong Lee | Harry Bunt | James Pustejovsky | Alex C. Fang | Chongwon Park
Proceedings of the Sixth International Workshop on Designing Meaning Representations

The ISO working group on semantic annotation aims to adopt the UMR formalism to represent dynamic information involving motions and their embedding grounds. The paper details how ISO’s XML-based temporal and spatial annotations, involving motions and spatio-temporally conditioned event-paths, will be converted to AMR or UMR forms. It also attempts to enrich the representation of dynamic information with the integrated spatio-temporal annotation scheme that accommodates first-order dynamic logic, as briefly noted. The main motivation of such an effort is to make spatio-temporal annotations and related dynamic information easy for artificial agents such as robots to understand and act on. Our approach bridges ISO’s richly specified standards with the task-oriented expressiveness of UMR and dynamic logic. This integration paves the way for seamless downstream use of spatio-temporal annotations in dialogue systems, simulation environments, and embodied agents.

This paper describes some of the ongoing work within the ISO preliminary work item PWI 254617-17, ‘Interlinking of annotations’. This PWI investigates the possibilities and problems of combining annotations made with different annotation schemes, using the ‘interlinking’ approach (Bunt, 2024) applied to different parts of the multi-part standard ISO 24617, ‘Semantic annotation framework’. This paper focuses on the combination of ISO-TimeML and QuantML at the level of abstract syntax. A new version is defined for the ISO-TimeML abstract syntax specification and how it relates to the concrete (XML-based) syntax as a basis for this combination. As a side-effect, some issues in the use of ISO-TimeML come to light that could be relevant for a possible future second edition of this standard.

pdf bib abs
The representation of QuantML annotations in UMR - an exploration
Harry Bunt | Kiyong Lee
Proceedings of the 21st Joint ACL - ISO Workshop on Interoperable Semantic Annotation (ISA-21)

This paper explores the possibilities and the problems in using Unified Meaning Representations (UMRs) for representing annotations of quantification phenomena, according to the ISO standard scheme QuantML (ISO 24617-12:2025). We show that the semantic information in QuantML annotations can be expressed in UMR, provided that some powerful semantic concepts are introduced and a slightly more general approach is adopted for the representation of multiple scope relations. Conversion functions are defined that transform the XML-based representations of QuantML into UMR structures and vice versa. The consequences are discussed that can be drawn from this regarding the possible role of UMR and the semantics of UMR representations of quantification.

As precursor work in preparation for an international standard ISO/PWI 24617-16 Language resource management – Semantic annotation – Part 16: Evaluative language, we aim to test and enhance the reliability of the annotation of subjective evaluation based on Appraisal Theory. We describe a comprehensive three-phase workflow tested on COVID-19 media reports to achieve reliable agreement through progressive training and quality control. Our methodology addresses some of the key challenges through targeted guideline refinements and the development of interactive clarification tools, alongside a custom platform that enables the pre-classification of six evaluative categories, systematic annotation review, and organized documentation. We report empirical results that demonstrate substantial improvements from the initial moderate agreement to a strong final consensus. Our research offers both theoretical refinements addressing persistent classification challenges in evaluation and practical solutions for the implementation of the annotation workflow, proposing a replicable methodology for the achievement of reliable annotation consistency in the annotation of evaluative language.

2024

pdf bib
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024
Harry Bunt | Nancy Ide | Kiyong Lee | Volha Petukhova | James Pustejovsky | Laurent Romary
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024

pdf bib abs
Combining semantic annotation schemes through interlinking
Harry Bunt
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024

This paper explores the possibilities of using combinations of different semantic annotation schemes. This is particularly interesting for annotation schemes developed under the umbrella of the ISO Semantic Annotation Framework (ISO 24617), since these schemes were intended to be complementary, providing ways of indicating different semantic information about the same entities. However, there are certain overlaps between the schemes of SemAF parts, due to overlaps of their semantic domains, which are a potential source of inconsistencies. The paper shows how issues relating to inconsistencies can be addressed at the levels of concrete representation, abstract syntax, and semantic interpretation.

pdf bib abs
Fusing ISO 24617-2 Dialogue Acts and Application-Specific Semantic Content Annotations
Andrei Malchanau | Volha Petukhova | Harry Bunt
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024

Accurately annotated data determines whether a modern high-performing AI/ML model will present a suitable solution to a complex task/application challenge, or whether time and resources are wasted. The more adequately the structure of the incoming data is specified, the more efficiently the data is translated for use by the application. This paper presents an approach to an application-specific dialogue semantics design which integrates the dialogue act annotation standard ISO 24617-2 and various domain-specific semantic annotations. The proposed multi-scheme design offers a plausible and rather powerful strategy to integrate, validate, extend and reuse existing annotations, and automatically generate code for dialogue system modules. Advantages and possible trade-offs are discussed.

pdf bib abs
ISO 24617-12: A New Standard for Semantic Annotation
Harry Bunt
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents ISO 24617-12, an annotation scheme for quantification phenomena in natural language, as part of the ISO Semantic Annotation Framework (ISO 24617). This scheme combines ideas from the theory of generalised quantifiers, from neo-Davidsonian event semantics, and from Discourse Representation Theory. The scheme consists of (1) an abstract syntax which defines ‘annotation structures’ as triples and other set-theoretic constructs of quantification-related concepts; (2) a reference representation of annotation structures (‘concrete syntax’); and (3) a compositional semantics of annotation structures. Together, these components define the markup language QuantML. This paper focuses on the identification and structuring of the semantic information useful for the characterisation of quantification in natural language and the interoperable representation of these information structures in QuantML.

2023

pdf bib
Proceedings of the 19th Joint ACL-ISO Workshop on Interoperable Semantics (ISA-19)
Harry Bunt
Proceedings of the 19th Joint ACL-ISO Workshop on Interoperable Semantics (ISA-19)

pdf bib abs
The compositional semantics of QuantML annotations
Harry Bunt
Proceedings of the 19th Joint ACL-ISO Workshop on Interoperable Semantics (ISA-19)

This paper discusses some issues in the semantic annotation of quantification phenomena in general, and in particular in the markup language QuantML, which has been proposed to form part of an ISO standard annotation scheme for quantification in natural language data. QuantML annotations have been claimed to have a compositional semantic interpretation, but the formal specification of QuantML in the official ISO documentation does not provide sufficient detail to judge this. This paper aims to fill this gap.

2022

pdf bib
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022
Harry Bunt
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

pdf bib abs
Intuitive and Formal Transparency in Annotation Schemes
Harry Bunt
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

This paper explores the application of the notion of ‘transparency’ to annotation schemes, understood as the properties that make it easy for potential users to see the scope of the scheme, the main concepts used in annotations, and the ways these concepts are interrelated. Based on an analysis of annotation schemes in the ISO Semantic Annotation Framework, it is argued that the way these schemes make use of ‘metamodels’ is not optimal, since these models are often not entirely clear and not directly related to the formal specification of the scheme. It is shown that by formalizing the relation between metamodels and annotations, both can benefit and can be made simpler, and the annotation scheme becomes intuitively more transparent.

This paper describes the continuation of a project that aims at establishing an interoperable annotation schema for quantification phenomena as part of the ISO suite of standards for semantic annotation, known as the Semantic Annotation Framework. After a break, caused by the Covid-19 pandemic, the project was relaunched in early 2022 with a second working draft of an annotation scheme, which is discussed in this paper. Keywords: semantic annotation, quantification, interoperability, annotation schema, ISO standard

2021

pdf bib
Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation
Harry Bunt
Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation

pdf bib abs
The ISA-17 Quantification Challenge: Background and introduction
Harry Bunt
Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation

This paper, intended for the ISA-17 Quantification Annotation track, provides background information for the shared quantification annotation task at the ISA-17 workshop, a.k.a. the Quantification Challenge. In particular, the roles of the abstract and concrete syntax of the QuantML markup language are explained, as is the semantic interpretation of QuantML annotations in relation to the ISO principles of semantic annotation. Additionally, the choice of the test suite of the Quantification Challenge is motivated, along with the suggested markables for the sentences of the suite.

2020

pdf bib
Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation
Harry Bunt
Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

pdf bib abs
Annotation of Quantification: The Current State of ISO 24617-12
Harry Bunt
Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

This paper discusses the current state of developing an ISO standard annotation scheme for quantification phenomena in natural language, as part of the ISO Semantic Annotation Framework (ISO 24617). A proposed approach that combines ideas from the theory of generalised quantifiers and from neo-Davidsonian event semantics was adopted by the ISO organisation in 2019 as a starting point for developing such an annotation scheme. This scheme consists of (1) a conceptual ‘metamodel’ that visualises the types of entities, functions and relations that go into annotations of quantification; (2) an abstract syntax which defines ‘annotation structures’ as triples and other set-theoretic constructs; (3) an XML-based representation of annotation structures (‘concrete syntax’); and (4) a compositional semantics of annotation structures. The latter three components together define the interpreted markup language QuantML. The focus in this paper is on the structuring of the semantic information needed to characterise quantification in natural language and the representation of these structures in QuantML.

pdf bib abs
Adapting the ISO 24617-2 Dialogue Act Annotation Scheme for Modelling Medical Consultations
Volha Petukhova | Harry Bunt
Proceedings of the 16th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

Effective, professional and socially competent dialogue of health care providers with their patients is essential to best practice in medicine. To identify, categorize and quantify salient features of patient-provider communication, to model interactive processes in medical encounters and to design digital interactive medical services, two important instruments have been developed: (1) medical interaction analysis systems with the Roter Interaction Analysis System (RIAS) as the most widely used by medical practitioners and (2) dialogue act annotation schemes with ISO 24617-2 as a multidimensional taxonomy of interoperable semantic concepts widely used for corpus annotation and dialogue systems design. Neither instrument fits all purposes. In this paper, we perform a systematic comparative analysis of the categories defined in the RIAS and ISO taxonomies. Overcoming the deficiencies and gaps that were found, we propose a number of extensions to the ISO annotation scheme, making it a powerful analytical and modelling instrument for the analysis, modelling and assessment of medical communication.

ISO standard 24617-2 for dialogue act annotation, established in 2012, has in the past few years been used both in corpus annotation and in the design of components for spoken and multimodal dialogue systems. This has brought some inaccuracies and undesirable limitations of the standard to light, which are addressed in a proposed second edition. This second edition allows a more accurate annotation of dependence relations and rhetorical relations in dialogue. Following the ISO 24617-4 principles of semantic annotation, and borrowing ideas from EmotionML, a triple-layered plug-in mechanism is introduced which allows dialogue act descriptions to be enriched with information about their semantic content, about accompanying emotions, and other information, and allows the annotation scheme to be customised by adding application-specific dialogue act types.

2019

pdf bib abs
A Semantic Annotation Scheme for Quantification
Harry Bunt
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

This paper describes in brief the proposal called ‘QuantML’ which was accepted by the International Organization for Standardization (ISO) last February as a starting point for developing a standard for the interoperable annotation of quantification phenomena in natural language, as part of the ISO 24617 Semantic Annotation Framework. The proposal, firmly rooted in the theory of generalised quantifiers, neo-Davidsonian semantics, and DRT, covers a wide range of quantification phenomena. The QuantML scheme consists of (1) an abstract syntax which defines ‘annotation structures’ as triples and other set-theoretic constructs; (2) a compositional semantics of annotation structures; (3) an XML representation of annotation structures.

This paper presents the DialogBank, a new language resource consisting of dialogues with gold standard annotations according to the ISO 24617-2 standard. Some of these dialogues have been taken from existing corpora and have been re-annotated according to the ISO standard; others have been annotated directly according to the standard. The ISO 24617-2 annotations have been designed according to the ISO principles for semantic annotation, as formulated in ISO 24617-6. The DialogBank makes use of three alternative representation formats, which are shown to be interoperable.

2015

pdf bib
On the Principles of Semantic Annotation
Harry Bunt
Proceedings of the 11th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-11)

pdf bib
Semantic Relations in Discourse: The Current State of ISO 24617-8
Rashmi Prasad | Harry Bunt
Proceedings of the 11th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-11)

2014

pdf bib abs
Interoperability of Dialogue Corpora through ISO 24617-2-based Querying
Volha Petukhova | Andrei Malchanau | Harry Bunt
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper explores a way of achieving interoperability: developing a query format for accessing existing annotated corpora whose expressions make use of the annotation language defined by the standard. The interpretation of expressions in the query implements a mapping from ISO 24617-2 concepts to those of the annotation scheme used in the corpus. We discuss two possible ways to query existing annotated corpora using DiAML. One way is to transform corpora into a DiAML-compliant format, and subsequently query these data using XQuery or XPath. The second approach is to define a DiAML query that can be directly used to retrieve requested information from the annotated data. Both approaches are valid. The first one presents a standard way of querying XML data. The second approach is a DiAML-oriented querying of dialogue act annotated data, for which we designed an interface. The proposed approach is tested on two important types of existing dialogue corpora: spoken two-person dialogue corpora collected and annotated within the HCRC Map Task paradigm, and multiparty face-to-face dialogues of the AMI corpus. We present the results and evaluate them with respect to accuracy and completeness through statistical comparisons between retrieved and manually constructed reference annotations.
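The first approach mentioned in this abstract (convert to a DiAML-compliant format, then query with XPath) can be illustrated with a minimal sketch. The XML fragment, its element and attribute names, and the helper `query_acts` are assumptions made for illustration only; they are loosely modelled on DiAML-style markup and are not taken from the paper or the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical DiAML-like fragment; element and attribute names are
# illustrative assumptions, not the normative DiAML format.
SAMPLE = """
<diaml>
  <dialogueAct id="da1" sender="p1" addressee="p2"
               communicativeFunction="propositionalQuestion" dimension="task"/>
  <dialogueAct id="da2" sender="p2" addressee="p1"
               communicativeFunction="answer" dimension="task"/>
  <dialogueAct id="da3" sender="p2" addressee="p1"
               communicativeFunction="turnRelease" dimension="turnManagement"/>
</diaml>
"""


def query_acts(xml_text, **criteria):
    """Return the ids of dialogue acts whose attributes match all criteria."""
    root = ET.fromstring(xml_text)
    # Build one XPath predicate per attribute, e.g. [@dimension='task'].
    predicate = "".join(f"[@{name}='{value}']" for name, value in criteria.items())
    return [el.get("id") for el in root.findall(f".//dialogueAct{predicate}")]


task_acts = query_acts(SAMPLE, dimension="task")
answers_by_p2 = query_acts(SAMPLE, sender="p2", communicativeFunction="answer")
print(task_acts)
print(answers_by_p2)
```

ElementTree supports only a subset of XPath; a full XQuery or XPath engine would be needed for the richer queries the abstract alludes to.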

2013

pdf bib
The semantic annotation of quantification
Harry Bunt
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Short Papers

pdf bib
Proceedings of the 9th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation
Harry Bunt
Proceedings of the 9th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation

pdf bib
Issues in the addition of ISO standard annotations to the Switchboard corpus
Harry Bunt | Alex C. Fang | Xiaoyue Liu | Jing Cao | Volha Petukhova
Proceedings of the 9th Joint ISO - ACL SIGSEM Workshop on Interoperable Semantic Annotation

pdf bib
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)
Harry Bunt | Khalil Sima'an | Liang Huang
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)

2012

pdf bib abs
The coding and annotation of multimodal dialogue acts
Volha Petukhova | Harry Bunt
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Recent years have witnessed a growing interest in annotating linguistic data at the semantic level, including the annotation of dialogue corpus data. The annotation scheme developed as international standard for dialogue act annotation ISO 24617-2 is based on the DIT++ scheme (Bunt, 2006; 2009) which combines the multidimensional DIT scheme (Bunt, 1994) with concepts from DAMSL (Allen and Core, 1997) and various other schemes. This scheme is designed in such a way that it can be applied not only to spoken dialogue, as is the case for most of the previously defined dialogue annotation schemes, but also to multimodal dialogue. This paper describes how the ISO 24617-2 annotation scheme can be used, together with the DIT++ method of 'multidimensional segmentation', to annotate nonverbal and multimodal dialogue behaviour. We analyse the fundamental distinction between (a) the coding of surface features; (b) form-related semantic classification; and (c) semantic annotation in terms of dialogue acts, supported by experimental studies of (a) and (b). We discuss examples of specification languages for representing the results of each of these activities, and show how dialogue act annotations can be attached to XML representations of functional segments of multimodal data.

This paper summarizes the latest, final version of ISO standard 24617-2 "Semantic annotation framework, Part 2: Dialogue acts". Compared to the preliminary version ISO DIS 24617-2:2010, described in Bunt et al. (2010), the final version additionally includes concepts for annotating rhetorical relations between dialogue units, defines a full-blown compositional semantics for the Dialogue Act Markup Language DiAML (resulting, as a side-effect, in a different treatment of functional dependence relations among dialogue acts and feedback dependence relations); and specifies an optimally transparent XML-based reference format for the representation of DiAML annotations, based on the systematic application of the notion of 'ideal concrete syntax'. We describe these differences and briefly discuss the design and implementation of an incremental method for dialogue act recognition, which proves the usability of the ISO standard for automatic dialogue annotation.

pdf bib abs
Using DiAML and ANVIL for multimodal dialogue annotations
Harry Bunt | Michael Kipp | Volha Petukhova
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper shows how interoperable dialogue act annotations, using the multidimensional annotation scheme and the markup language DiAML of ISO standard 24617-2, can conveniently be obtained using the newly implemented facility in the ANVIL annotation tool to produce XML-based output directly in the DiAML format. ANVIL offers the use of multiple user-defined 'tiers' for annotating various kinds of information. This is shown to be convenient not only for multimodal information but also for dialogue act annotation according to ISO standard 24617-2 because of the latter's multidimensionality: functional dialogue segments are viewed as expressing one or more dialogue acts, and every dialogue act belongs to one of a number of dimensions of communication, defined in the standard, for each of which a different ANVIL tier can conveniently be used. Annotations made in the multi-tier interface can be exported in the ISO 24617-2 format, thus supporting the creation of interoperable annotated corpora of multimodal dialogue.

pdf bib
Collaborative Annotation of Dialogue Acts: Application of a New ISO Standard to the Switchboard Corpus
Alex C. Fang | Harry Bunt | Jing Cao | Xiaoyue Liu
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data

2011

pdf bib
The Semantics of Dialogue Acts
Harry Bunt
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

pdf bib
Incremental dialogue act understanding
Volha Petukhova | Harry Bunt
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

pdf bib
Proceedings of the 12th International Conference on Parsing Technologies
Harry Bunt | Joakim Nivre | Özlem Çetinoglu
Proceedings of the 12th International Conference on Parsing Technologies

2010

pdf bib abs
ISO-TimeML: An International Standard for Semantic Annotation
James Pustejovsky | Kiyong Lee | Harry Bunt | Laurent Romary
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present ISO-TimeML, a revised and interoperable version of the temporal markup language, TimeML. We describe the changes and enrichments made, while framing the effort in a more general methodology of semantic annotation. In particular, we assume a principled distinction between the annotation of an expression and the representation which that annotation denotes. This involves not only the specification of an annotation language for a particular phenomenon, but also the development of a meta-model that allows one to interpret the syntactic expressions of the specification semantically.

pdf bib abs
Towards an Integrated Scheme for Semantic Annotation of Multimodal Dialogue Data
Volha Petukhova | Harry Bunt
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Recent years have witnessed a growing interest in the use of multimodal data for modelling of communicative behaviour in dialogue. Dybkjaer and Bernsen (2002) point out that coding schemes for multimodal data are used solely by their creators. Standardisation has been achieved to some extent for coding behavioural features for certain nonverbal expressions, e.g. for facial expression; however, for the semantic annotation of such expressions combined with other modalities such as speech there is still a long way to go. The majority of existing dialogue act annotation schemes that are designed to code semantic and pragmatic dialogue information are limited to analysis of the spoken modality. This paper investigates the applicability of existing dialogue act annotation schemes to the semantic annotation of multimodal data, and the way a dialogue act annotation scheme can be extended to cover dialogue phenomena from multiple modalities. The general conclusion of our explorative study is that a multidimensional dialogue act taxonomy is usable for this purpose when some adjustments are made. We propose a solution for adding these aspects to a dialogue act annotation scheme without changing its set of communicative functions, in the form of qualifiers that can be attached to communicative function tags.

This paper describes an ISO project which aims at developing a standard for annotating spoken and multimodal dialogue with semantic information concerning the communicative functions of utterances, the kind of semantic content they address, and their relations with what was said and done earlier in the dialogue. The project, ISO 24617-2 "Semantic annotation framework, Part 2: Dialogue acts", is currently at DIS stage. The proposed annotation schema distinguishes 9 orthogonal dimensions, allowing each functional segment in dialogue to have a function in each of these dimensions, thus accounting for the multifunctionality that utterances in dialogue often have. A number of core communicative functions is defined in the form of ISO data categories, available at http://semantic-annotation.uvt.nl/dialogue-acts/iso-datcats.pdf; they are divided into "dimension-specific" functions, which can be used only in a particular dimension, such as Turn Accept in the Turn Management dimension, and "general-purpose" functions, which can be used in any dimension, such as Inform and Request. An XML-based annotation language, "DiAML", is defined, with an abstract syntax, a semantics, and a concrete syntax.
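The distinction this abstract draws between dimension-specific and general-purpose functions can be made concrete with a small sketch. Only Turn Management, Turn Accept, Inform and Request come from the abstract; the "Task" dimension label, the deliberately partial inventories, and the helper names `is_allowed` and `invalid_assignments` are assumptions for illustration, not the standard's data categories.

```python
# Partial, illustrative inventories (assumption: the real scheme has 9
# dimensions and many more functions than are listed here).
DIMENSIONS = {"Task", "Turn Management"}
GENERAL_PURPOSE = {"Inform", "Request"}           # usable in any dimension
DIMENSION_SPECIFIC = {"Turn Management": {"Turn Accept"}}


def is_allowed(dimension, function):
    """True if the function may be used in the given dimension."""
    if dimension not in DIMENSIONS:
        return False
    return (function in GENERAL_PURPOSE
            or function in DIMENSION_SPECIFIC.get(dimension, set()))


def invalid_assignments(segment):
    """Return the (dimension, function) pairs of a segment that are not allowed.

    A segment is modelled as a dict mapping a dimension to the function the
    segment has in that dimension, which captures multifunctionality.
    """
    return [(d, f) for d, f in segment.items() if not is_allowed(d, f)]


# One functional segment with a function in two dimensions at once.
segment = {"Task": "Inform", "Turn Management": "Turn Accept"}
print(invalid_assignments(segment))                  # no violations expected
print(invalid_assignments({"Task": "Turn Accept"}))  # dimension-specific misuse
```

Modelling a segment as a mapping from dimension to function is one simple design choice; it does not cover relations to earlier dialogue, which the abstract also mentions.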

pdf bib
Anatomy of Annotation Schemes: Mapping to GrAF
Nancy Ide | Harry Bunt
Proceedings of the Fourth Linguistic Annotation Workshop

2009

pdf bib
The independence of dimensions in multidimensional dialogue act annotation
Volha Petukhova | Harry Bunt
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Proceedings of the Eight International Conference on Computational Semantics
Harry Bunt
Proceedings of the Eight International Conference on Computational Semantics

pdf bib
Semantic annotations as complimentary to underspecified semantic representations
Harry Bunt
Proceedings of the Eight International Conference on Computational Semantics

pdf bib
Towards a Multidimensional Semantics of Discourse Markers in Spoken Dialogue
Volha Petukhova | Harry Bunt
Proceedings of the Eight International Conference on Computational Semantics

pdf bib
A methodological note on the definition of semantic annotation languages (short paper)
Harry Bunt | Chwhynny Overbeeke
Proceedings of the Eight International Conference on Computational Semantics

pdf bib
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)
Harry Bunt | Éric Villemonte de la Clergerie
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2008

pdf bib abs
Evaluating Dialogue Act Tagging with Naive and Expert Annotators
Jeroen Geertzen | Volha Petukhova | Harry Bunt
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper the dialogue act annotations of naive and expert annotators, both annotating the same data, are compared in order to characterise the insights annotations made by different kinds of annotators may provide for evaluating dialogue act tagsets. It is argued that the agreement among naive annotators provides insight into the clarity of the tagset, whereas agreement among expert annotators provides an indication of how reliably the tagset can be applied when errors are ruled out that are due to deficiencies in understanding the concepts of the tagset, to a lack of experience in using the annotation tool, or to little experience in annotation more generally. An indication of the differences between the two groups in terms of inter-annotator agreement and tagging accuracy on task-oriented dialogue in different domains, annotated with the DIT++ dialogue act tagset, is presented, and the annotations of both groups are assessed against a gold standard. Additionally, the effect of the reduction of the tagset's granularity on the performances of both groups is looked into. In general, it is concluded that the annotations of both groups provide complementary insights into reliability, clarity, and more fundamental conceptual issues.
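The two quantities this abstract compares, inter-annotator agreement and tagging accuracy against a gold standard, can be sketched as follows. Cohen's kappa is used here as one common chance-corrected agreement measure; the abstract does not say which measure the paper uses, so that choice, the toy tag sequences, and the function names are assumptions.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag sequences of equal length")
    n = len(tags_a)
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    count_a, count_b = Counter(tags_a), Counter(tags_b)
    # Expected agreement if both annotators tagged independently at random
    # with their own observed tag distributions.
    expected = sum(count_a[t] * count_b[t] for t in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical tag throughout
    return (observed - expected) / (1.0 - expected)


def accuracy(tags, gold):
    """Proportion of items tagged identically to the gold standard."""
    if len(tags) != len(gold) or not gold:
        raise ValueError("need two non-empty tag sequences of equal length")
    return sum(t == g for t, g in zip(tags, gold)) / len(gold)


# Toy data (hypothetical tags, not from the paper's corpus).
annotator_1 = ["inform", "inform", "question", "answer"]
annotator_2 = ["inform", "question", "question", "answer"]
gold = ["inform", "question", "question", "answer"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3), accuracy(annotator_1, gold), accuracy(annotator_2, gold))
```

Agreement and accuracy can diverge: two annotators may agree strongly with each other while both deviating from the gold standard, which is why the abstract reports both.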

pdf bib abs
Towards Formal Interpretation of Semantic Annotation
Harry Bunt | Chwhynny Overbeeke
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present a novel approach to the incremental incorporation of semantic information in natural language processing which does not fall victim to the notorious problems of ambiguity and lack of robustness, namely through the formal interpretation of semantic annotation. We present a formal semantics for a language for the integrated annotation of several types of semantic information, such as (co-)reference relations, temporal information, and semantic roles. This semantics has the form of a compositional translation into second-order logic. We show that a truly semantic approach to the annotation of different types of semantic information raises interesting issues relating to the borders between these areas of semantics, and to the consistency of semantic annotations in multiple areas or in multiple annotation layers. The approach is compositional, in the sense that every well-formed subexpression of the annotation language can be translated to formal logic (and hence interpreted) independent of the rest of the annotation structure. The approach is also incremental in the sense that it is designed to be extendable to the semantic annotation of many other types of semantic information, such as spatial information, noun-noun relations, or quantification and modification structures.

pdf bib abs
LIRICS Semantic Role Annotation: Design and Evaluation of a Set of Data Categories
Volha Petukhova | Harry Bunt
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Semantic roles have often proved to be useful labels for stating linguistic generalisations of various sorts. There is, however, a lack of agreement on their defining criteria, which causes serious problems for semantic roles to be a useful classificatory device for predicate-argument relations. These criteria should (a) support the design of a semantic role set which is complete but does not contain redundant relations; (b) be based on semantic rather than morphological, lexical or syntactic properties; and (c) enable formal interpretation. In this paper we report on the analyses of alternative approaches to annotation and representation of semantic role information (such as FrameNet, PropBank and VerbNet) with respect to their models of description, granularity of semantic role sets, definitions of semantic role concepts, consistency and reliability of annotations. We present methodological principles for characterising well-defined concepts which were developed within the LIRICS (Linguistic InfRastructure for Interoperable ResourCes and Systems; see http://lirics.loria.fr) project, as well as the designed set of semantic roles and their definitions in ISO 12620 format. We discuss evaluation results of the defined concepts for semantic role annotation concerning the redundancy and completeness of the tagset and the reliability of annotations in terms of inter-annotator agreement.

pdf bib
A New Life for Semantic Annotations?
Harry Bunt
Semantics in Text Processing. STEP 2008 Conference Proceedings

2007

pdf bib
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue
Harry Bunt | Simon Keizer | Tim Paek
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

pdf bib
A Multidimensional Approach to Utterance Segmentation and Dialogue Act Classification
Jeroen Geertzen | Volha Petukhova | Harry Bunt
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

pdf bib
Evaluating Combinations of Dialogue Acts for Generation
Simon Keizer | Harry Bunt
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

pdf bib
An Empirically Based Computational Model of Grounding in Dialogue
Harry Bunt | Roser Morante | Simon Keizer
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

pdf bib
Proceedings of the Tenth International Conference on Parsing Technologies
Harry Bunt | Paola Merlo
Proceedings of the Tenth International Conference on Parsing Technologies

pdf bib
The Semantics of Semantic Annotation
Harry Bunt
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

2006

pdf bib abs
Dimensions in Dialogue Act Annotation
Harry Bunt
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper is concerned with the fundamentals of multidimensional dialogue act annotation, i.e. with what it means to annotate dialogues with information about the communicative acts that are performed with the utterances, taking various 'dimensions' into account. Two ideas seem to be prevalent in the literature concerning the notion of dimension: (1) dimensions correspond to different types of information; and (2) a dimension is formed by a set of mutually exclusive tags. In DAMSL, for instance, the terms dimension and layer are used sometimes in the sense of (1) and sometimes in that of (2). We argue that being mutually exclusive is not a good criterion for a set of dialogue act types to constitute a dimension, even though the description of an object in a multidimensional space should never assign more than one value per dimension. We define a dimension of dialogue act annotation as an aspect of participating in a dialogue that can be addressed independently by means of dialogue acts. We show that DAMSL dimensions such as Info-request, Statement, and Answer do not qualify as proper dimensions, and that the communicative functions in these categories do not fall in any specific dimension, but should be considered as general-purpose in the sense that they can be used in any dimension. We argue that, using the notion of dimension that we propose, a multidimensional taxonomy of dialogue acts emerges that optimally supports multidimensional dialogue act annotation.

pdf bib abs
Methodological Aspects of Semantic Annotation
Harry Bunt | Amanda Schiffrin
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper constitutes a preliminary report on the work carried out on semantic content annotation in the LIRICS project, in close collaboration with the activities of ISO TC 37/SC 4/TDG 31. This work consists primarily of: (1) identifying commonalities in alternative approaches to the annotation and representation of various types of semantic information; and (2) developing methodological principles and concepts for identifying and characterising representational concepts for semantic content. The LIRICS project does not aim to develop a standard format for the annotation and representation of semantic content, but to provide well-defined descriptive concepts. In particular, the aim is to build an on-line registry of definitions of such concepts, called data categories, in accordance with ISO standard 12620. These semantic data categories are abstract concepts, whose use is not restricted to any particular format or representation language. We advocate the use of the metamodel as a tool to extract the most important of these abstract overarching concepts, with examples from dialogue act, temporal, reference and semantic role annotation.

pdf bib
Multidimensional Dialogue Management
Simon Keizer | Harry Bunt
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

pdf bib
Measuring annotator agreement in a complex hierarchical dialogue act annotation scheme
Jeroen Geertzen | Harry Bunt
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

2005

pdf bib
Proceedings of the Ninth International Workshop on Parsing Technology
Harry Bunt | Robert Malouf
Proceedings of the Ninth International Workshop on Parsing Technology

2004

pdf bib
Standardization in Multimodal Content Representation: Some Methodological Issues
Harry Bunt | Laurent Romary
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

1993

pdf bib
Proceedings of the Third International Workshop on Parsing Technologies (IWPT ’93)
Harry Bunt
Proceedings of the Third International Workshop on Parsing Technologies

pdf bib abs
Parsing as Dynamic Interpretation
Harry Bunt | Ko van der Sloat
Proceedings of the Third International Workshop on Parsing Technologies

In this paper we consider the merging of the language of feature structures with a formal logical language, and how the semantic definition of the resulting language can be used in parsing. For the logical language we use the language EL, defined and implemented earlier for computational semantic purposes. To this language we add the basic constructions and operations of feature structures. The extended language we refer to as ‘Generalized EL’, or ‘GEL’. The semantics of EL, and that of its extension GEL, is defined model-theoretically: for each construction of the language, a recursive rule describes how its value can be computed from the values of its constituents. Since GEL talks not only about semantic objects and their relations but also about syntactic concepts, GEL models are nonstandard in containing both kinds of entities. Whereas phrase-structure rules are traditionally viewed procedurally, as recipes for building phrases, and a rule in the parsing-as-deduction approach is viewed declaratively, as a proposition which is true when the conditions for building the phrase are satisfied, a rule in GEL is best viewed as a proposition in Dynamic Semantics: it can be evaluated recursively, and evaluates not to true or false, but to the minimal change in the model needed to make the proposition true. The viability of this idea has been demonstrated by a proof-of-concept implementation for DPSG chart parsing and an emulation of HPSG parsing in the STUF environment.