Applying Random Indexing to Structured Data to Find Contextually Similar Words

Danica Damljanović, Udo Kruschwitz, M-Dyaa Albakour, Johann Petrak, Mihai Lupu


Abstract
Language resources extracted from structured data (e.g. Linked Open Data) have already been used in various scenarios to improve conventional Natural Language Processing techniques. The meanings of words and the relations between them are more explicit in RDF graphs than in human-readable text, and these graphs therefore have great potential to improve legacy applications. In this paper, we describe an approach that can be used to extend or clarify the semantic meaning of a word by constructing a list of contextually related terms. Our approach exploits the structure inherent in an RDF graph and then applies methods from statistical semantics, in particular Random Indexing, to discover contextually related terms. We evaluate our approach in the life science domain using a dataset generated with the help of domain experts from a large pharmaceutical company (AstraZeneca). The experts were involved in two phases: first, generating a set of keywords of interest to them, and second, judging the set of contextually similar words generated for each keyword. We compare our proposed approach, which exploits the semantic graph, with the same method applied to the human-readable text extracted from the graph.
Anthology ID:
L12-1361
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2023–2030
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/628_Paper.pdf
Cite (ACL):
Danica Damljanović, Udo Kruschwitz, M-Dyaa Albakour, Johann Petrak, and Mihai Lupu. 2012. Applying Random Indexing to Structured Data to Find Contextually Similar Words. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2023–2030, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Applying Random Indexing to Structured Data to Find Contextually Similar Words (Damljanović et al., LREC 2012)