Seshat: a Tool for Managing and Verifying Annotation Campaigns of Audio Data

Hadrien Titeux, Rachid Riad, Xuan-Nga Cao, Nicolas Hamilakis, Kris Madden, Alejandrina Cristia, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux


Abstract
We introduce Seshat, a new, simple, open-source tool for efficiently managing annotations of speech corpora. Seshat lets users customise and manage annotation campaigns on large audio corpora while ensuring that the annotated output files comply with formatting and naming conventions. It also includes procedures for checking the content of annotations against specific rules, which can be implemented in personalised parsers. Finally, we propose a double-annotation mode, for which Seshat automatically computes inter-annotator agreement using the gamma measure, which takes into account both categorisation and segmentation discrepancies.
Anthology ID:
2020.lrec-1.861
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6976–6982
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.861
Cite (ACL):
Hadrien Titeux, Rachid Riad, Xuan-Nga Cao, Nicolas Hamilakis, Kris Madden, Alejandrina Cristia, Anne-Catherine Bachoud-Lévi, and Emmanuel Dupoux. 2020. Seshat: a Tool for Managing and Verifying Annotation Campaigns of Audio Data. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6976–6982, Marseille, France. European Language Resources Association.
Cite (Informal):
Seshat: a Tool for Managing and Verifying Annotation Campaigns of Audio Data (Titeux et al., LREC 2020)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2020.lrec-1.861.pdf
Code:
bootphon/seshat