T2K^2: a System for Automatically Extracting and Organizing Knowledge from Texts
Felice Dell’Orletta, Giulia Venturi, Andrea Cimino, Simonetta Montemagni
Abstract
In this paper, we present T2K^2, a suite of tools for automatically extracting domain-specific knowledge from collections of Italian and English texts. T2K^2 (Text-To-Knowledge v2) relies on a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine learning which are dynamically integrated to provide an accurate and incremental representation of the content of vast repositories of unstructured documents. Extracted knowledge ranges from domain-specific entities and named entities to the relations connecting them and can be used for indexing document collections with respect to different information types. T2K^2 also includes linguistic profiling functionalities aimed at supporting the user in constructing the acquisition corpus, e.g. in selecting texts belonging to the same genre or characterized by the same degree of specialization or in monitoring the added value of newly inserted documents. T2K^2 is a web application which can be accessed from any browser through a personal account which has been tested in a wide range of domains.
- Anthology ID: L14-1477
- Volume: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month: May
- Year: 2014
- Address: Reykjavik, Iceland
- Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue: LREC
- Publisher: European Language Resources Association (ELRA)
- Pages: 2062–2070
- URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/590_Paper.pdf
- Cite (ACL): Felice Dell’Orletta, Giulia Venturi, Andrea Cimino, and Simonetta Montemagni. 2014. T2K^2: a System for Automatically Extracting and Organizing Knowledge from Texts. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2062–2070, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal): T2K^2: a System for Automatically Extracting and Organizing Knowledge from Texts (Dell’Orletta et al., LREC 2014)
- PDF: http://www.lrec-conf.org/proceedings/lrec2014/pdf/590_Paper.pdf