Haïfa Zargayouna

Also published as: Haifa Zargayouna


Apport des dépendances syntaxiques et des patrons séquentiels à l’extraction de relations (Contribution of Syntactic Dependencies and Sequential Patterns to Relation Extraction)
Kata Gábor | Nadège Lechevrel | Isabelle Tellier | Davide Buscaldi | Haifa Zargayouna | Thierry Charnois
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

SemEval-2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers
Kata Gábor | Davide Buscaldi | Anne-Kathrin Schumann | Behrang QasemiZadeh | Haïfa Zargayouna | Thierry Charnois
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes the first task on semantic relation extraction and classification in scientific paper abstracts at SemEval 2018. The challenge focuses on domain-specific semantic relations and includes three different subtasks. The subtasks were designed to compare and quantify the effect of different pre-processing steps on the relation classification results. We expect the task to be relevant for a broad range of researchers working on extracting specialized knowledge from domain corpora, including but not limited to scientific or biomedical information extraction. The task attracted a total of 32 participants, with 158 submissions across different scenarios.


Exploring Vector Spaces for Semantic Relations
Kata Gábor | Haïfa Zargayouna | Isabelle Tellier | Davide Buscaldi | Thierry Charnois
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Word embeddings are used with success for a variety of tasks involving lexical semantic similarities between individual words. Encouraging results on analogical similarities have been obtained using unsupervised methods and just cosine similarity. In this paper, we explore the potential of pre-trained word embeddings to identify generic types of semantic relations in an unsupervised experiment. We propose a new relational similarity measure, based on the combination of word2vec’s CBOW input and output vectors, which outperforms competing vector representations when used for unsupervised clustering on SemEval 2010 Relation Classification data.
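
The abstract above does not spell out the proposed measure, so the sketch below is not the authors' method. It only illustrates the generic baseline the abstract alludes to: cosine similarity, applied here to the offset vectors of two word pairs as a rough proxy for relational similarity. The function names and the three-dimensional toy vectors are assumptions made up for illustration; real experiments would use pre-trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def offset(u, v):
    """Difference vector v - u, a common stand-in for the relation between two words."""
    return [b - a for a, b in zip(u, v)]


def pair_similarity(pair_a, pair_b):
    """Cosine similarity between the offsets of two word pairs (hypothetical helper)."""
    return cosine(offset(*pair_a), offset(*pair_b))


# Toy vectors, invented for this example only.
toy = {
    "paris": [1.0, 0.0, 1.0],
    "france": [1.0, 1.0, 1.0],
    "rome": [0.0, 0.0, 2.0],
    "italy": [0.0, 1.0, 2.0],
}

capital_score = pair_similarity(
    (toy["paris"], toy["france"]),
    (toy["rome"], toy["italy"]),
)
```

With these toy vectors both pairs share the same offset, so `capital_score` is at its maximum. Pairwise scores of this kind could then feed a clustering step, which is the setting the abstract evaluates.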


Détection et classification non supervisées de relations sémantiques dans des articles scientifiques (Unsupervised Classification of Semantic Relations in Scientific Papers)
Kata Gábor | Isabelle Tellier | Thierry Charnois | Haïfa Zargayouna | Davide Buscaldi
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Articles longs)

In this paper, we address a task that remains little explored: automatically extracting the state of the art of a scientific field by analyzing papers from that field. We reduce it to two elementary subtasks: identifying concepts and recognizing relations between these concepts. Terminology extraction identifies candidate concepts, which are then aligned with external resources. In a second step, we seek to recognize and classify the semantic relations between concepts automatically and in an unsupervised manner, relying on various clustering and biclustering techniques. We apply these two steps to a corpus extracted from the ACL Anthology archive. A manual analysis allowed us to propose a typology of semantic relations and to classify a sample of relation instances. Initial evaluations suggest that biclustering is useful for detecting new types of relations in the corpus.

Semantic Annotation of the ACL Anthology Corpus for the Automatic Analysis of Scientific Literature
Kata Gábor | Haïfa Zargayouna | Davide Buscaldi | Isabelle Tellier | Thierry Charnois
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the process of creating a corpus annotated for concepts and semantic relations in the scientific domain. A part of the ACL Anthology Corpus was selected for annotation, but the annotation process itself is not specific to the computational linguistics domain and could be applied to any scientific corpus. Concepts were identified and annotated fully automatically, based on a combination of terminology extraction and available ontological resources. A typology of semantic relations between concepts is also proposed. This typology, consisting of 18 domain-specific and 3 generic relations, is the result of a corpus-based investigation of the text sequences occurring between concepts in sentences. A sample of 500 abstracts from the corpus is currently being manually annotated with these semantic relations. Only explicit relations are taken into account, so that the data can serve to train or evaluate pattern-based semantic relation classification systems.


Help enrich a terminological repository: proposals and experiments (Aide à l’enrichissement d’un référentiel terminologique : propositions et expérimentations) [in French]
Thibault Mondary | Adeline Nazarenko | Haïfa Zargayouna | Sabine Barreaux
Proceedings of TALN 2013 (Volume 2: Short Papers)


The Quaero Evaluation Initiative on Term Extraction
Thibault Mondary | Adeline Nazarenko | Haïfa Zargayouna | Sabine Barreaux
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The Quaero program organized a set of evaluations for terminology extraction systems in 2010 and 2011. Three objectives were targeted in this initiative: the first was to evaluate the behavior and scalability of term extractors with respect to corpus size; the second was to assess progress between different versions of the same systems; the last was to measure the influence of corpus type. The protocol used during this initiative was a comparative analysis of 32 runs against a gold standard. Scores were computed using metrics that take into account gradual relevance. Systems produced by Quaero partners and publicly available systems were evaluated on pharmacology corpora composed of European Patents or abstracts of scientific articles, all in English. The gold standard was an unstructured version of the pharmacology thesaurus used by INIST-CNRS for indexing purposes. Most systems scaled to large corpora, differences between versions of the same systems were mixed, and results were better on scientific articles than on patents. During the ongoing adjudication phase, domain experts are enriching the thesaurus with terms found by several systems.


Evaluation of Textual Knowledge Acquisition Tools: a Challenging Task
Haïfa Zargayouna | Adeline Nazarenko
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

A large effort has been devoted to the development of textual knowledge acquisition (KA) tools, but it is still difficult to assess the progress that has been made. The results produced by these tools are difficult to compare, due to the heterogeneity of the proposed methods and of their goals. Various experiments have been made to evaluate terminological and ontological tools. They show that in terminology as well as in ontology acquisition, it remains difficult to compare existing tools and to analyse their advantages and drawbacks. From our own experiments in evaluating terminology and ontology acquisition tools, it appeared that the difficulties and solutions are similar for both tasks. We propose a unified approach for the evaluation of textual KA tools that can be instantiated in different ways for various tasks. The main originality of this approach lies in the way it takes into account the subjectivity of evaluation and the relativity of gold standards. In this paper, we highlight the major difficulties of KA evaluation, then present a unified proposal for the evaluation of terminology and ontology acquisition tools, together with the associated experiments. The proposed protocols take into consideration the specificity of this type of evaluation.


Evaluating Term Extraction
Adeline Nazarenko | Haïfa Zargayouna
Proceedings of the International Conference RANLP-2009