GermaNet is regarded as a valuable resource for many German NLP applications, corpus research, and teaching. This demo presents three GUI-based tools meant to facilitate the exploration of and navigation through GermaNet. The GermaNet Explorer offers various retrieval, sorting, filtering, and visualization functions for words/synsets and also provides insight into the modeling of GermaNet's semantic relations as well as its representation as a graph. The GermaNet-Measure-API and GermaNet Pathfinder offer methods for calculating semantic relatedness based on GermaNet as a resource and for visualizing (semantic) paths between words/synsets. The GermaNet-Measure-API furthermore features a flexible interface, which facilitates the integration of all relatedness measures provided into user-defined applications. We have already used the three tools in our research on thematic chaining and thematic indexing, for the manual annotation of lexical chains, and as a resource in our courses on corpus linguistics and semantics.
Building an Evaluation Corpus for German Question Answering by Harvesting Wikipedia
Irene Cramer | Jochen L. Leidner | Dietrich Klakow
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The growing interest in open-domain question answering is limited by the lack of evaluation and training resources. To overcome this resource bottleneck for German, we propose a novel methodology to acquire new question-answer pairs for system evaluation that relies on volunteer collaboration over the Internet. Utilizing Wikipedia, a popular free online encyclopedia available in several languages, we show that the data acquisition problem can be cast as a Web experiment. We present a Web-based annotation tool and carry out a distributed data collection experiment. The data gathered from the mostly anonymous contributors is compared to a similar dataset produced in-house by domain experts on the one hand, and the German questions from the CLEF QA 2004 effort on the other hand. Our analysis of the datasets suggests that, using our novel method, a medium-scale evaluation resource can be built at very small cost in a short period of time. The technique and software developed here are readily applicable to other languages where free online encyclopedias are available, and our resulting corpus is likewise freely available.