Anaïs Halftermeyer


2020

ODIL_Syntax: a Free Spontaneous Spoken French Treebank Annotated with Constituent Trees
Ilaine Wang | Aurore Pelletier | Jean-Yves Antoine | Anaïs Halftermeyer
Proceedings of the 12th Language Resources and Evaluation Conference

This paper describes ODIL Syntax, a French treebank built on spontaneous speech transcripts. The syntactic structure of every speech turn is represented by constituent trees, produced by a procedure that combines automatic annotation by a parser (here, the Stanford Parser) with manual revision. ODIL Syntax follows the annotation scheme designed for the French TreeBank (FTB), with additional annotation guidelines that aim to represent specific features of spoken language such as speech disfluencies. The corpus will be freely distributed by January 2020 under a Creative Commons licence. It will serve as the basis for a further semantic enrichment dedicated to the representation of temporal entities and temporal relations, as a second phase of the ODIL@Temporal project. The paper details the annotation scheme we followed, with an emphasis on the representation of speech disfluencies. We then present the annotation procedure, which was carried out on the Contemplata annotation platform. In the last section, we provide some distributional characteristics of the annotated corpus (POS distribution, multiword expressions).

Contemplata, a Free Platform for Constituency Treebank Annotation
Jakub Waszczuk | Ilaine Wang | Jean-Yves Antoine | Anaïs Halftermeyer
Proceedings of the 12th Language Resources and Evaluation Conference

This paper describes Contemplata, an annotation platform that offers a generic solution for treebank building as well as treebank enrichment with relations between syntactic nodes. Contemplata is dedicated to the annotation of constituency trees. The framework includes support for syntactic parsers, which provide automatic annotations to be manually revised. This balanced strategy of automatic parsing followed by manual revision reduces the annotator workload, which favours data reliability. The paper presents the software architecture of Contemplata, describes its practical use, and finally gives two examples of annotation projects that were conducted on the platform.