André Bittar

2018

For psychiatric disorders such as schizophrenia, longer durations of untreated psychosis are associated with worse intervention outcomes. Data in electronic health records (EHRs) can be useful for retrospective clinical studies, but much of these data are stored as unstructured text, which cannot be used directly in computation. Natural Language Processing (NLP) methods can extract this information, identifying symptoms and treatments in mental health records and temporally anchoring their first emergence. We are developing an EHR corpus annotated with time expressions, clinical entities and their relations, to be used for NLP development. In this study, we focus on the first step: identifying time expressions in the EHRs of patients with schizophrenia. We developed a gold standard corpus, compared it with other related corpora in terms of content and time expression prevalence, and adapted two NLP systems for extracting time expressions. To the best of our knowledge, this is the first resource annotated for temporal entities in the mental health domain.

2016

CENTAL at SemEval-2016 Task 12: a linguistically fed CRF model for medical and temporal information extraction
Charlotte Hansart | Damien De Meyere | Patrick Watrin | André Bittar | Cédrick Fairon
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Emotion Analysis on Twitter: The Hidden Challenge
Luca Dini | André Bittar
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present an experiment to detect emotions in tweets. Unlike much previous research, we draw an important distinction between emotion detection under a closed-world assumption (i.e. every tweet is emotional) and the more complicated task of distinguishing emotional from non-emotional tweets. Given an apparent lack of appropriately annotated data, we created two corpora for these tasks. We describe two systems, one symbolic and one based on machine learning, and evaluate them on our datasets. Our evaluation shows that a machine learning classifier performs best on emotion detection, while a symbolic approach is better at identifying relevant (i.e. emotional) tweets.

2015

Un système expert fondé sur une analyse sémantique pour l’identification de menaces d’ordre biologique (An Expert System Based on Semantic Analysis for the Identification of Biological Threats)
Cédric Lopez | Aleksandra Ponomareva | Cécile Robin | André Bittar | Xabier Larrucea | Frédérique Segond | Marie-Hélène Metzger
Proceedings of the 22nd Conference on Traitement Automatique des Langues Naturelles (TALN). Demonstrations

The European project TIER (Integrated strategy for CBRN – Chemical, Biological, Radiological and Nuclear – Threat Identification and Emergency Response) aims to establish a complete, integrated strategy for emergency response to biological, chemical, radiological, nuclear or explosive hazards, based on threat identification and risk assessment. In this article, we focus on biological risks. We present our expert system based on semantic analysis, which extracts structured data from unstructured data so that the system can reason over it.

2014

The Dangerous Myth of the Star System
André Bittar | Luca Dini | Sigrid Maurel | Mathieu Ruhlmann
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In recent years we have observed two parallel trends in computational linguistics research and e-commerce development. On the research side, there has been increasing interest in algorithms and approaches that can capture the polarity of opinions expressed by users about products, institutions and services. On the commercial side, almost all large e-commerce and aggregator sites now let users write comments and express their appreciation with a numeric score (usually represented as a number of stars). This creates the impression that work carried out in the research community has been made partially useless (at least for economic exploitation) by an evolution in web practices. In this paper we describe an experiment on a large corpus which shows that the scores users give often conflict with the text of their opinions, to such an extent that a rule-based opinion mining system can be shown to rank those opinions better than the users themselves.

2012

Finding Salient Dates for Building Thematic Timelines
Rémy Kessler | Xavier Tannier | Caroline Hagège | Véronique Moriceau | André Bittar
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Un annotateur automatique d’expressions temporelles du français et son évaluation sur le TimeBank du français (An Automatic Temporal Expression Annotator and its Evaluation on the French TimeBank) [in French]
André Bittar | Caroline Hagège
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Temporal Annotation: A Proposal for Guidelines and an Experiment with Inter-annotator Agreement
André Bittar | Caroline Hagège | Véronique Moriceau | Xavier Tannier | Charles Teissèdre
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This article presents work carried out within the framework of the ongoing ANR (French National Research Agency) project Chronolines, which focuses on the temporal processing of large news-wire corpora in English and French. The aim of the project is to create new and innovative interfaces for visualizing textual content according to temporal criteria. Extracting and normalizing the temporal information in texts through linguistic annotation is an essential step towards attaining this objective. With this goal in mind, we developed a set of guidelines for the annotation of temporal and event expressions that is intended to be compatible with the TimeML markup language, while addressing some of its pitfalls. We provide results of an initial application of these guidelines to real news-wire texts in French over several iterations of the annotation process. These results include inter-annotator agreement figures and an error analysis. Our final inter-annotator agreement figures compare favorably with those reported for the TimeBank 1.2 annotation project.

2011

French TimeBank : un corpus de référence sur la temporalité en français (French TimeBank: a reference corpus on temporality in French)
André Bittar | Pascal Amsili | Pascal Denis
Proceedings of the 18th Conference on Traitement Automatique des Langues Naturelles (TALN). Long Papers

This article has a twofold aim. First, it presents to the community a recently released corpus, the French TimeBank (FTiB), a collection of journalistic texts annotated for times and events according to the ISO-TimeML standard. Second, it reports the results and methodological lessons we drew from building this reference corpus, in the hope that our experience will prove useful beyond the community interested in temporal processing.

French TimeBank: An ISO-TimeML Annotated Reference Corpus
André Bittar | Pascal Amsili | Pascal Denis | Laurence Danlos
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

Annotation of Events and Temporal Expressions in French Texts
André Bittar
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

Intégration des constructions à verbe support dans TimeML (Integrating Support Verb Constructions into TimeML)
André Bittar | Laurence Danlos
Proceedings of the 16th Conference on Traitement Automatique des Langues Naturelles (TALN). Short Papers

The TimeML language was designed for annotating temporal information in texts, in particular events, time expressions and the relations between them. General annotation guidelines have been developed to guide annotators in this task, but some linguistic phenomena still need to be treated in detail. A common problem in NLP tasks, whether in translation, generation or understanding, is the encoding of support verb constructions. Relatively little attention has so far been paid to this problem within the TimeML framework. In this article, we propose annotation guidelines for support verb constructions.

2008

Annotation des informations temporelles dans des textes en français (Annotation of Temporal Information in French Texts)
André Bittar
Proceedings of the 15th Conference on Traitement Automatique des Langues Naturelles (TALN). REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL)

Processing temporal information is crucial for understanding natural language texts. The TimeML specification language was designed to enable the identification and normalization of time expressions and events in texts written in English. The aim of the various TimeML projects has been to formulate an annotation scheme that can be applied to free text, such as that found on the Web. Efforts have been made to apply TimeML to languages other than English, notably Chinese, Korean, Italian, Spanish and German. For French, there have been efforts in this direction, but they remain somewhat scattered. In this article, we describe our current work, which aims to develop complete resources for annotating French texts according to TimeML: an annotation guide, a reference corpus (Gold Standard) and automatic annotation modules.
