2012
A high speed transcription interface for annotating primary linguistic data
Mark Dingemanse | Jeremy Hammond | Herman Stehouwer | Aarthy Somasundaram | Sebastian Drude
Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Semantic metadata mapping in practice: the Virtual Language Observatory
Dieter Van Uytvanck | Herman Stehouwer | Lari Lampen
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In this paper we present the Virtual Language Observatory (VLO), a metadata-based portal for language resources. It is based entirely on the Component Metadata (CMDI) and ISOcat standards. This approach allows the use of heterogeneous metadata schemas while maintaining semantic compatibility. We describe the metadata harvesting process, based on OAI-PMH, and the conversion from several formats (OLAC, IMDI and the CLARIN LRT inventory) to their CMDI counterpart profiles. We then focus on some post-processing steps that polish the harvested records, and describe the ingestion of the CMDI files into the VLO facet browser. We also include an overview of the changes since the first version of the VLO, based on user feedback from the CLARIN community. Finally, we give an overview of additional ideas and improvements for future versions of the VLO.
Federated Search: Towards a Common Search Infrastructure
Herman Stehouwer | Matej Durco | Eric Auer | Daan Broeder
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Scientific institutes hold many language resources. These resources are often quite specialized and relatively unknown. Current infrastructure initiatives try to tackle this issue by collecting metadata about the resources and establishing centers with stable repositories to ensure the availability of the resources. It would be beneficial if researchers could, by means of a simple query, determine which resources and which centers contain information useful to their research, or even work on a set of distributed resources as a virtual corpus. In this article we propose an architecture for a distributed search environment that allows researchers to search across a set of distributed language resources.
2011
Unlocking Language Archives Using Search
Herman Stehouwer | Eric Auer
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage
2009
Language Models for Contextual Error Detection and Correction
Herman Stehouwer | Menno van Zaanen
Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference