Knowledge-Rich Context Extraction and Ranking with KnowPipe

Anne-Kathrin Schumann


Abstract
This paper presents ongoing PhD thesis work on the extraction of knowledge-rich contexts (KRCs) from text corpora for terminographic purposes. Although notable progress has been made in the field over recent years, there is as yet no methodology or integrated workflow that can deal with multiple, typologically different languages and different domains, and that can be handled by non-expert users. Moreover, while a lot of work has been carried out on the KRC extraction step, the selection and further analysis of results still involve considerable manual work. In view of this, the aim of the paper is twofold. Firstly, it presents a ranking algorithm geared towards supporting the selection of high-quality contexts once extraction has finished, and describes ranking experiments with Russian context candidates. Secondly, it presents the KnowPipe framework for context extraction: KnowPipe aims to provide a processing environment that allows users to extract knowledge-rich contexts from text corpora in different languages using shallow and deep processing techniques. In its current state of development, KnowPipe provides facilities for preprocessing Russian and German text corpora and for pattern-based knowledge-rich context extraction from these corpora using shallow analysis, as well as tools for ranking Russian context candidates.
Anthology ID:
L12-1394
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3626–3630
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/678_Paper.pdf
Cite (ACL):
Anne-Kathrin Schumann. 2012. Knowledge-Rich Context Extraction and Ranking with KnowPipe. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3626–3630, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Knowledge-Rich Context Extraction and Ranking with KnowPipe (Schumann, LREC 2012)
PDF:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/678_Paper.pdf