Feature-based Encoding and Querying Language Resources with Character Semantics

Baden Hughes, Dafydd Gibbon, Thorsten Trippel


Abstract
In this paper we discuss the explicit representation of character features pertaining to written language resources, which we argue are critically necessary in the long term of archiving language data. Much focus on the creation of language resources and their associated preservation is at the level of the corpus itself; however it is generally accepted that long term interpretation of these language resources requires more than a best practice data format. In particular, where language resources are created in linguistic fieldwork, and especially for minority languages, the need for preservation not only of the resource itself, but of additional metadata which allows for the resource to be accurately interpreted in the future is becoming a topic of research in itself. In this paper we extend earlier work on semantically based character decomposition to include representation of character properties in a variety of models, and a mechanism for exploiting these properties through queries.
Anthology ID:
L06-1296
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/493_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Baden Hughes, Dafydd Gibbon, and Thorsten Trippel. 2006. Feature-based Encoding and Querying Language Resources with Character Semantics. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Feature-based Encoding and Querying Language Resources with Character Semantics (Hughes et al., LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/493_pdf.pdf