Abstract
There is an incredible amount of information available in the form of textual documents due to the growth of information sources. In order to get the information into an actionable way, it is common to use information extraction and more specifically the event extraction, it became crucial in various domains even in public health. In this paper, we address the problem of the epidemic event extraction in potentially any language, so that we tested different corpuses on an existed multilingual system for tele-epidemiology: the Data Analysis for Information Extraction in any Language(DANIEL) system. We focused on the influence of the number of documents on the performance of the system, on average results show that it is able to achieve a precision and recall around 82%, but when we resorted to the evaluation by event by checking whether it has been really detected or not, the results are not satisfactory according to this paper’s evaluation. Our idea is to propose a system that uses an ontology which includes information in different languages and covers specific epidemiological concepts, it is also based on the multilingual open information extraction for the relation extraction step to reduce the expert intervention and to restrict the content for each text. We describe a methodology of five main stages: Pre-processing, relation extraction, named entity recognition (NER), event recognition and the matching between the information extracted and the ontology.