A generic formalism to represent linguistic corpora in RDF and OWL/DL

Christian Chiarcos


Abstract
This paper describes POWLA, a generic formalism for representing linguistic corpora in RDF and OWL/DL. Unlike earlier approaches in this direction, POWLA is not tied to a specific selection of annotation layers; it is designed to support any kind of text-oriented annotation. POWLA inherits its generic character from the underlying data model, PAULA (Dipper, 2005; Chiarcos et al., 2009), which is based on early sketches of the ISO TC37/SC4 Linguistic Annotation Framework (Ide and Romary, 2004). In contrast to existing standoff XML linearizations of such generic data models, it uses RDF as its representation formalism and OWL/DL for validation. The paper discusses the advantages of this approach, in particular with respect to interoperability and queriability, which are illustrated for the MASC corpus, an open multi-layer corpus of American English (Ide et al., 2008).
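To make the idea of representing several annotation layers as RDF-style triples more concrete, the following is a minimal sketch in plain Python. It is not the POWLA vocabulary or the author's implementation: every `ex:` name, the sample tokens, and the helper functions `match` and `tokens_in_entities` are hypothetical and chosen only for illustration. It shows how a part-of-speech layer and a named-entity layer can share one graph, so that a single query can cross both layers.

```python
from typing import Optional

# A triple is (subject, predicate, object), as in RDF.
Triple = tuple[str, str, str]

# Hypothetical mini-corpus. The "ex:" names are illustrative placeholders,
# NOT the vocabulary defined by POWLA.
TRIPLES: set[Triple] = {
    # Tokens of the primary text
    ("ex:tok1", "rdf:type", "ex:Token"),
    ("ex:tok1", "ex:text", "New"),
    ("ex:tok2", "rdf:type", "ex:Token"),
    ("ex:tok2", "ex:text", "York"),
    # Layer 1: part-of-speech annotation attached to tokens
    ("ex:tok1", "ex:pos", "NNP"),
    ("ex:tok2", "ex:pos", "NNP"),
    # Layer 2: a named-entity span covering both tokens
    ("ex:span1", "rdf:type", "ex:Span"),
    ("ex:span1", "ex:covers", "ex:tok1"),
    ("ex:span1", "ex:covers", "ex:tok2"),
    ("ex:span1", "ex:entityType", "LOCATION"),
}


def match(
    triples: set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def tokens_in_entities(triples: set[Triple], entity_type: str) -> list[tuple[str, str]]:
    """Cross-layer query: (text, POS) of tokens covered by spans of a given entity type."""
    results = []
    for span, _, _ in match(triples, p="ex:entityType", o=entity_type):
        for _, _, tok in match(triples, s=span, p="ex:covers"):
            text = match(triples, s=tok, p="ex:text")[0][2]
            pos = match(triples, s=tok, p="ex:pos")[0][2]
            results.append((text, pos))
    return results


print(tokens_in_entities(TRIPLES, "LOCATION"))
```

In a real setting the triples would live in an RDF store and the pattern matching would be expressed in a query language such as SPARQL; the hand-written `match` function only stands in for that here.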
Anthology ID:
L12-1548
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3205–3212
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/915_Paper.pdf
Cite (ACL):
Christian Chiarcos. 2012. A generic formalism to represent linguistic corpora in RDF and OWL/DL. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3205–3212, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
A generic formalism to represent linguistic corpora in RDF and OWL/DL (Chiarcos, LREC 2012)