Farah Benamara

Also published as: Farah Beanamara, Farah Benamara Zitoune


2024

Premier système IRIT-MyFamillyUp pour la compétition sur la reconnaissance des émotions Odyssey 2024 (First IRIT-MyFamillyUp System for the Odyssey 2024 Emotion Recognition Competition)
Adrien Lafore | Clément Pagès | Leila Moudjari | Sebastiao Quintas | Isabelle Ferrané | Hervé Bredin | Thomas Pellegrini | Farah Benamara | Jérôme Bertrand | Marie-Françoise Bertrand | Véronique Moriceau | Jérôme Farinas
Actes des 35èmes Journées d'Études sur la Parole

In this paper, we present our contribution to the speech emotion classification task as part of our participation in the Odyssey 2024 evaluation campaign. We propose a hybrid system that leverages both information from the audio signal and semantic information from automatic transcriptions. The results show that adding semantic information makes it possible to outperform audio-only systems.

Humans Need Context, What about Machines? Investigating Conversational Context in Abusive Language Detection
Tom Bourgeade | Zongmin Li | Farah Benamara | Véronique Moriceau | Jian Su | Aixin Sun
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

A crucial aspect in abusive language on social media platforms (toxicity, hate speech, harmful stereotypes, etc.) is its inherent contextual nature. In this paper, we focus on the role of conversational context in abusive language detection, one of the most “direct” forms of context in this domain, as given by the conversation threads (e.g., directly preceding message, original post). The incorporation of surrounding messages has proven vital for the accurate human annotation of harmful content. However, many prior works have either ignored this aspect, collecting and processing messages in isolation, or have obtained inconsistent results when attempting to embed such contextual information into traditional classification methods. The reasons behind these findings have not yet been properly addressed. To this end, we propose an analysis of the impact of conversational context in abusive language detection, through: (1) an analysis of prior works and the limitations of the most common concatenation-based approach, which we attempt to address with two alternative architectures; (2) an evaluation of these methods on existing datasets in English, and a new dataset of French tweets annotated for hate speech and stereotypes; and (3) a qualitative analysis showcasing the necessity for context-awareness in ALD, but also its difficulties.

2023

A Multilingual Dataset of Racial Stereotypes in Social Media Conversational Threads
Tom Bourgeade | Alessandra Teresa Cignarella | Simona Frenda | Mario Laurent | Wolfgang Schmeisser-Nieto | Farah Benamara | Cristina Bosco | Véronique Moriceau | Viviana Patti | Mariona Taulé
Findings of the Association for Computational Linguistics: EACL 2023

In this paper, we focus on the topics of misinformation and racial hoaxes from a perspective derived from both social psychology and computational linguistics. In particular, we consider the specific case of anti-immigrant feeling as a first case study for addressing racial stereotypes. We describe the first corpus-based study for multilingual racial stereotype identification in social media conversational threads. Our contributions are: (i) a multilingual corpus of racial hoaxes, (ii) a set of common guidelines for the annotation of racial stereotypes in social media texts, and a multi-layered, fine-grained scheme, psychologically grounded on the work by Fiske, including not only stereotype presence, but also contextuality, implicitness, and forms of discredit, (iii) a multilingual dataset in Italian, Spanish, and French annotated following the aforementioned guidelines, and cross-lingual comparative analyses taking into account racial hoaxes and stereotypes in online discussions. The analysis and results show the usefulness of our methodology and resources, shedding light on how racial hoaxes are spread, and enable the identification of negative stereotypes that reinforce them.

What Did You Learn To Hate? A Topic-Oriented Analysis of Generalization in Hate Speech Detection
Tom Bourgeade | Patricia Chiril | Farah Benamara | Véronique Moriceau
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Hate speech has unfortunately become a significant phenomenon on social media platforms, and it can cover various topics (misogyny, sexism, racism, xenophobia, etc.) and targets (e.g., black people, women). Various hate speech detection datasets have been proposed, some annotated for specific topics, and others for hateful speech in general. In either case, they often employ different annotation guidelines, which can lead to inconsistencies, even in datasets focusing on the same topics. This can cause issues in models trying to generalize across more data and more topics in order to improve detection accuracy. In this paper, we propose, for the first time, a topic-oriented approach to study generalization across popular hate speech datasets. We first perform a comparative analysis of the performances of Transformer-based models in capturing topic-generic and topic-specific knowledge when trained on different datasets. We then propose a novel, simple yet effective approach to study more precisely which topics are best captured in implicit manifestations of hate, showing that selecting combinations of datasets with better out-of-domain topical coverage improves the reliability of automatic hate speech detection.

Classification de tweets en situation d’urgence pour la gestion de crises (Classification of Tweets in Emergency Situations for Crisis Management)
Romain Meunier | Leila Moudjari | Farah Benamara | Véronique Moriceau | Alda Mari | Patricia Stolf
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux -- articles longs

Real-time processing of social media data has become an attractive tool in emergency situations, but information overload remains a challenge. In this paper, we present a new manually annotated French dataset for crisis management. We also test several machine learning models to classify tweets according to their relevance, their urgency, and the intention they convey, in order to best assist emergency services during crises, using evaluation methods specific to crisis management. We also evaluate our models when they are confronted with new crises or even new types of crises, with encouraging results.

2022

How Can a Teacher Make Learning From Sparse Data Softer? Application to Business Relation Extraction
Hadjer Khaldi | Farah Benamara | Camille Pradel | Nathalie Aussenac-Gilles
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Business Relation Extraction between market entities is a challenging information extraction task that suffers from data imbalance due to the over-representation of negative relations (also known as No-relation or Others) compared to positive relations that correspond to the taxonomy of relations of interest. This paper proposes a novel solution to tackle this problem, relying on binary soft label supervision generated by an approach based on knowledge distillation. When evaluated on a business relation extraction dataset, the results suggest that the proposed approach improves the overall performance, beating state-of-the-art solutions for data imbalance. In particular, it improves the extraction of under-represented relations as well as the detection of false negatives.

Speech acts and Communicative Intentions for Urgency Detection
Enzo Laurenti | Nils Bourgon | Farah Benamara | Alda Mari | Véronique Moriceau | Camille Courgeon
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

Recognizing speech acts (SA) is crucial for capturing meaning beyond what is said, making communicative intentions particularly relevant to identify urgent messages. This paper attempts to measure for the first time the impact of SA on urgency detection during crises, in tweets. We propose a new dataset annotated for both urgency and SA, and develop several deep learning architectures to inject SA into urgency detection while ensuring model generalisability. Our results show that taking speech acts into account in tweet analysis improves information type detection in an out-of-type configuration, where models are evaluated on event types unseen during training. These results are encouraging and constitute a first step towards SA-aware disaster management in social media.

Automatic Detection of Stigmatizing Uses of Psychiatric Terms on Twitter
Véronique Moriceau | Farah Benamara | Abdelmoumene Boumadane
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Psychiatry and people suffering from mental disorders have often been given a pejorative label that induces social rejection. Many studies have addressed discourse content about psychiatry on social media, suggesting that they convey stigmatizing representations of mental health disorders. In this paper, we focus for the first time on the use of psychiatric terms in tweets in French. We first describe the annotated dataset that we use. Then we propose several deep learning models to automatically detect (1) the different types of use of psychiatric terms (medical use, misuse or irrelevant use), and (2) the polarity of the tweet. We show that polarity detection can be improved when done in a multitask framework in combination with type of use detection. This confirms the observations made manually on several datasets, namely that the polarity of a tweet is correlated to the type of term use (misuses are mostly negative whereas medical uses are neutral). The results are interesting for both tasks and open up the possibility of using high-performing automatic approaches to conduct real-time surveys on social media that are larger and less expensive than existing manual ones.

How’s Business Going Worldwide ? A Multilingual Annotated Corpus for Business Relation Extraction
Hadjer Khaldi | Farah Benamara | Camille Pradel | Grégoire Sigel | Nathalie Aussenac-Gilles
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The business world has changed due to the 21st century economy, where borders have melted and trade has become free. Nowadays, competition is no longer only at the local market level but also at the global level. In this context, the World Wide Web has become a major source of information for companies and professionals to keep track of their complex, rapidly changing, and competitive business environment. A lot of effort is nonetheless needed to collect and analyze this information due to the information overload problem and the huge number of web pages to process and analyze. In this paper, we propose the BizRel resource, the first multilingual (French, English, Spanish, and Chinese) dataset for automatic extraction of binary business relations involving organizations from the web. This dataset is used to train several monolingual and cross-lingual deep learning models to detect these relations in texts. Our results are encouraging, demonstrating the effectiveness of such a resource for both research and business communities. In particular, we believe multilingual business relation extraction systems are crucial tools for decision makers to identify links between specific market stakeholders and build business networks that make it possible to anticipate changes and discover new threats or opportunities. Our work is therefore an important step toward such tools.

Give me your Intentions, I’ll Predict our Actions: A Two-level Classification of Speech Acts for Crisis Management in Social Media
Enzo Laurenti | Nils Bourgon | Farah Benamara | Alda Mari | Véronique Moriceau | Camille Courgeon
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Discovered by (Austin, 1962) and extensively promoted by (Searle, 1975), speech acts (SA) have been the object of extensive discussion in the philosophical and the linguistic literature, as well as in computational linguistics, where the detection of SA has been shown to be an important step in many downstream NLP applications. In this paper, we attempt to measure for the first time the role of SA on urgency detection in tweets, focusing on natural disasters. Indeed, SA are particularly relevant to identify intentions, desires, plans and preferences towards action, therefore providing actionable information that will help to set priorities for the human teams and decide appropriate rescue actions. To this end, we make four main contributions: (1) A two-layer annotation scheme of SA both at the tweet and subtweet levels, (2) A new French dataset of 6,669 tweets annotated for both urgency and SA, (3) An in-depth analysis of the annotation campaign, highlighting the correlation between SA and urgency categories, and (4) A set of deep learning experiments to detect SA in a crisis corpus. Our results show that SA are correlated with urgency, which is a first important step towards SA-aware NLP-based crisis management on social media.

2021

“Be nice to your wife! The restaurants are closed”: Can Gender Stereotype Detection Improve Sexism Classification?
Patricia Chiril | Farah Benamara | Véronique Moriceau
Findings of the Association for Computational Linguistics: EMNLP 2021

In this paper, we focus on the detection of sexist hate speech against women in tweets, studying for the first time the impact of gender stereotype detection on sexism classification. We propose: (1) the first dataset annotated for gender stereotype detection, (2) a new method for data augmentation based on sentence similarity with multilingual external datasets, and (3) a set of deep learning experiments, first to detect gender stereotypes and then to use this auxiliary task for sexism detection. Although the presence of stereotypes does not necessarily entail hateful content, our results show that sexism classification can definitely benefit from gender stereotype detection.

2020

Multilingual Irony Detection with Dependency Syntax and Neural Models
Alessandra Teresa Cignarella | Valerio Basile | Manuela Sanguinetti | Cristina Bosco | Paolo Rosso | Farah Benamara
Proceedings of the 28th International Conference on Computational Linguistics

This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task in a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental settings are provided. In the first, a variety of syntactic dependency-based features combined with classical machine learning classifiers are explored. In the second scenario, two well-known types of word embeddings are trained on parsed data and tested against gold standard datasets. In the third setting, dependency-based syntactic features are combined into the Multilingual BERT architecture. The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.

An Algerian Corpus and an Annotation Platform for Opinion and Emotion Analysis
Leila Moudjari | Karima Akli-Astouati | Farah Benamara
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we address the lack of resources for opinion and emotion analysis related to North African dialects, targeting Algerian dialect. We present TWIFIL (TWItter proFILing), a collaborative annotation platform for crowdsourcing annotation of tweets at different levels of granularity. The platform allowed the creation of the largest Algerian dialect dataset annotated for sentiment (9,000 tweets), emotion (about 5,000 tweets) and extra-linguistic information including author profiling (age and gender). The annotation also resulted in the creation of the largest Algerian dialect subjectivity lexicon, of about 9,000 entries, which can constitute a valuable resource for the development of future NLP applications for Algerian dialect. To test the validity of the dataset, a set of deep learning experiments was conducted to classify a given tweet as positive, negative or neutral. We discuss our results and provide an error analysis to better identify classification errors.

An Annotated Corpus for Sexism Detection in French Tweets
Patricia Chiril | Véronique Moriceau | Farah Benamara | Alda Mari | Gloria Origgi | Marlène Coulomb-Gully
Proceedings of the Twelfth Language Resources and Evaluation Conference

Social media networks have become a space where users are free to relate their opinions and sentiments which may lead to a large spreading of hatred or abusive messages which have to be moderated. This paper presents the first French corpus annotated for sexism detection composed of about 12,000 tweets. In a context of offensive content mediation on social media now regulated by European laws, we think that it is important to be able to detect automatically not only sexist content but also to identify if a message with a sexist content is really sexist (i.e. addressed to a woman or describing a woman or women in general) or is a story of sexism experienced by a woman. This point is the novelty of our annotation scheme. We also propose some preliminary results for sexism detection obtained with a deep learning approach. Our experiments show encouraging results.

He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist
Patricia Chiril | Véronique Moriceau | Farah Benamara | Alda Mari | Gloria Origgi | Marlène Coulomb-Gully
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In a context of offensive content mediation on social media now regulated by European laws, it is important not only to be able to automatically detect sexist content but also to identify if a message with a sexist content is really sexist or is a story of sexism experienced by a woman. We propose: (1) a new characterization of sexist content inspired by speech acts theory and discourse analysis studies, (2) the first French dataset annotated for sexism detection, and (3) a set of deep learning experiments trained on top of a combination of several vector representations of tweets (word embeddings, linguistic features, and various generalization strategies). Our results are encouraging and constitute a first step towards offensive content moderation.

Classification de relations pour l’intelligence économique et concurrentielle (Relation Classification for Competitive and Economic Intelligence)
Hadjer Khaldi | Amine Abdaoui | Farah Benamara | Grégoire Sigel | Nathalie Aussenac-Gilles
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

Extracting relations that link entities through semantic links from text has been the subject of many studies aimed at extracting generic relations such as hypernymy, or specific ones such as relations between genes and proteins. In this paper, we focus on economic relations between two named entities of type organization, extracted from web texts. This type of relation, still little studied in the literature, aims to identify the links between the actors of a business sector in order to analyze their economic ecosystems. We present BizRel, the first French corpus annotated with economic relations, as well as a supervised approach based on different neural architectures for classifying these relations. The evaluation of these models shows very encouraging results, which is a first step towards competitive and economic intelligence from texts for French.

2019

The binary trio at SemEval-2019 Task 5: Multitarget Hate Speech Detection in Tweets
Patricia Chiril | Farah Benamara Zitoune | Véronique Moriceau | Abhishek Kumar
Proceedings of the 13th International Workshop on Semantic Evaluation

The massive growth of user-generated web content through blogs, online forums and most notably, social media networks, led to a large spreading of hatred or abusive messages which have to be moderated. This paper proposes a supervised approach to hate speech detection towards immigrants and women in English tweets. Several models have been developed ranging from feature-engineering approaches to neural ones.

Multilingual and Multitarget Hate Speech Detection in Tweets
Patricia Chiril | Farah Benamara Zitoune | Véronique Moriceau | Marlène Coulomb-Gully | Abhishek Kumar
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Social media networks have become a space where users are free to relate their opinions and sentiments which may lead to a large spreading of hatred or abusive messages which have to be moderated. This paper proposes a supervised approach to hate speech detection from a multilingual perspective. We focus in particular on hateful messages towards two different targets (immigrants and women) in English tweets, as well as sexist messages in both English and French. Several models have been developed ranging from feature-engineering approaches to neural ones. Our experiments show very encouraging results on both languages.

2018

Introduction to the Special Issue on Language in Social Media: Exploiting Discourse and Other Contextual Information
Farah Benamara | Diana Inkpen | Maite Taboada
Computational Linguistics, Volume 44, Issue 4 - December 2018

Social media content is changing the way people interact with each other and share information, personal messages, and opinions about situations, objects, and past experiences. Most social media texts are short online conversational posts or comments that do not contain enough information for natural language processing (NLP) tools, as they are often accompanied by non-linguistic contextual information, including meta-data (e.g., the user’s profile, the social network of the user, and their interactions with other users). Exploiting such different types of context and their interactions makes the automatic processing of social media texts a challenging research task. Indeed, simply applying traditional text mining tools is clearly sub-optimal, as, typically, these tools take into account neither the interactive dimension nor the particular nature of this data, which shares properties with both spoken and written language. This special issue contributes to a deeper understanding of the role of these interactions to process social media data from a new perspective in discourse interpretation. This introduction first provides the necessary background to understand what context is from both the linguistic and computational linguistic perspectives, then presents the most recent context-based approaches to NLP for social media. We conclude with an overview of the papers accepted in this special issue, highlighting what we believe are the future directions in processing social media texts.

2017

Evaluative Language Beyond Bags of Words: Linguistic Insights and Computational Applications
Farah Benamara | Maite Taboada | Yannick Mathieu
Computational Linguistics, Volume 43, Issue 1 - April 2017

The study of evaluation, affect, and subjectivity is a multidisciplinary enterprise, including sociology, psychology, economics, linguistics, and computer science. A number of excellent computational linguistics and linguistic surveys of the field exist. Most surveys, however, do not bring the two disciplines together to show how methods from linguistics can benefit computational sentiment analysis systems. In this survey, we show how incorporating linguistic insights, discourse information, and other contextual phenomena, in combination with the statistical exploitation of data, can result in an improvement over approaches that take advantage of only one of these perspectives. We first provide a comprehensive introduction to evaluative language from both a linguistic and computational perspective. We then argue that the standard computational definition of the concept of evaluative language neglects the dynamic nature of evaluation, in which the interpretation of a given evaluation depends on linguistic and extra-linguistic contextual factors. We thus propose a dynamic definition that incorporates update functions. The update functions allow for different contextual aspects to be incorporated into the calculation of sentiment for evaluative words or expressions, and can be applied at all levels of discourse. We explore each level and highlight which linguistic aspects contribute to accurate extraction of sentiment. We end the review by outlining what we believe the future directions of sentiment analysis are, and the role that discourse and contextual information need to play.

Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study
Jihen Karoui | Farah Benamara | Véronique Moriceau | Viviana Patti | Cristina Bosco | Nathalie Aussenac-Gilles
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This paper provides a linguistic and pragmatic analysis of the phenomenon of irony in order to represent how Twitter users exploit irony devices within their communication strategies for generating textual content. We aim to measure the impact of a wide range of pragmatic phenomena in the interpretation of irony, and to investigate how these phenomena interact with contexts local to the tweet. Informed by linguistic theories, we propose for the first time a multi-layered annotation schema for irony and its application to a corpus of French, English and Italian tweets. We detail each layer, explore their interactions, and discuss our results from both a qualitative and a quantitative perspective.

2015

Mapping Different Rhetorical Relation Annotations: A Proposal
Farah Benamara | Maite Taboada
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

Towards a Contextual Pragmatic Model to Detect Irony in Tweets
Jihen Karoui | Farah Benamara Zitoune | Véronique Moriceau | Nathalie Aussenac-Gilles | Lamia Hadrich Belguith
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Developing a French FrameNet: Methodology and First results
Marie Candito | Pascal Amsili | Lucie Barque | Farah Benamara | Gaël de Chalendar | Marianne Djemaa | Pauline Haas | Richard Huyghe | Yvette Yannick Mathieu | Philippe Muller | Benoît Sagot | Laure Vieu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Asfalda project aims to develop a French corpus with frame-based semantic annotations and automatic tools for shallow semantic analysis. We present the first part of the project: focusing on a set of notional domains, we delimited a subset of English frames, adapted them to French data when necessary, and developed the corresponding French lexicon. We believe that working domain by domain helped us to enforce the coherence of the resulting resource, and also has the advantage that, though the number of frames is limited (around a hundred), we obtain full coverage within a given domain.

Fine-grained semantic categorization of opinion expressions for consensus detection (Catégorisation sémantique fine des expressions d’opinion pour la détection de consensus) [in French]
Farah Benamara | Véronique Moriceau | Yvette Yannick Mathieu
TALN-RECITAL 2014 Workshop DEFT 2014 : DÉfi Fouille de Textes (DEFT 2014 Workshop: Text Mining Challenge)

2013

Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game
Anaïs Cadilhac | Nicholas Asher | Farah Benamara | Alex Lascarides
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Sentiment Composition Using a Parabolic Model
Baptiste Chardon | Farah Benamara | Yannick Mathieu | Vladimir Popescu | Nicholas Asher
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

Segmenting Arabic Texts into Elementary Discourse Units (Segmentation de textes arabes en unités discursives minimales) [in French]
Iskandar Keskes | Farah Beanamara | Lamia Hadrich Belguith
Proceedings of TALN 2013 (Volume 1: Long Papers)

2012

Extraction de préférences à partir de dialogues de négociation (Towards Preference Extraction From Negotiation Dialogues) [in French]
Anaïs Cadilhac | Farah Benamara | Vladimir Popescu | Nicholas Asher | Mohamadou Seck
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Annotating Preferences in Chats for Strategic Games
Anaïs Cadilhac | Nicholas Asher | Farah Benamara
Proceedings of the Sixth Linguistic Annotation Workshop

How do Negation and Modality Impact on Opinions?
Farah Benamara | Baptiste Chardon | Yannick Mathieu | Vladimir Popescu | Nicholas Asher
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

An empirical resource for discovering cognitive principles of discourse organisation: the ANNODIS corpus
Stergos Afantenos | Nicholas Asher | Farah Benamara | Myriam Bras | Cécile Fabre | Mai Ho-dac | Anne Le Draoulec | Philippe Muller | Marie-Paule Péry-Woodley | Laurent Prévot | Josette Rebeyrolles | Ludovic Tanguy | Marianne Vergez-Couret | Laure Vieu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the ANNODIS resource, a discourse-level annotated corpus for French. The corpus combines two perspectives on discourse: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed of texts that are diversified with respect to genre, length and type of discursive organisation. The methodology followed here involves an iterative design of annotation guidelines in order to reach satisfactory inter-annotator agreement levels. This allows us to raise a few issues relevant for the comparison of such complex objects as discourse structures. The corpus also serves as a source of empirical evidence for discourse theories. We present here two initial analyses taking advantage of this new annotated corpus: one that tested hypotheses on constraints governing discourse structure, and another that studied the variations in composition and signalling of multi-level discourse structures.

Clause-based Discourse Segmentation of Arabic Texts
Iskandar Keskes | Farah Benamara | Lamia Hadrich Belguith
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes a rule-based approach to segment Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three-step segmentation algorithm: first by using only punctuation marks, then by relying only on lexical cues and finally by using both typology and lexical cues. The results were compared with manual segmentations produced by experts.

pdf
Annotating Preferences in Negotiation Dialogues
Anaïs Cadilhac | Nicholas Asher | Farah Benamara
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

pdf
Commitments to Preferences in Dialogue
Anais Cadilhac | Nicholas Asher | Farah Benamara | Alex Lascarides
Proceedings of the SIGDIAL 2011 Conference

pdf
Towards Context-Based Subjectivity Analysis
Farah Benamara | Baptiste Chardon | Yannick Mathieu | Vladimir Popescu
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf
Ontolexical resources for feature-based opinion mining: a case-study
Anaïs Cadilhac | Farah Benamara | Nathalie Aussenac-Gilles
Proceedings of the 6th Workshop on Ontologies and Lexical Resources

2009

pdf
ANNODIS: une approche outillée de l’annotation de structures discursives
Marie-Paule Péry-Woodley | Nicholas Asher | Patrice Enjalbert | Farah Benamara | Myriam Bras | Cécile Fabre | Stéphane Ferrari | Lydia-Mai Ho-Dac | Anne Le Draoulec | Yann Mathet | Philippe Muller | Laurent Prévot | Josette Rebeyrolle | Ludovic Tanguy | Marianne Vergez-Couret | Laure Vieu | Antoine Widlöcher
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

The ANNODIS project aims to build a corpus of texts annotated at the discourse level and to develop tools for annotating and exploiting corpora. The annotations adopt two complementary points of view: a bottom-up perspective starts from minimal discourse units and builds complex structures through a set of discourse relations; a top-down perspective approaches the text as a whole and relies on pre-identified cues to detect high-level discourse structures. The construction of the corpus is coupled with the creation of two interfaces: the first assists the manual annotation of discourse relations and structures by providing a visualisation of the markup produced by preprocessing; the second will be dedicated to exploiting the annotations. We present the annotation models and protocols developed to carry out the annotation campaign through the dedicated interface.

2008

pdf bib
Distilling Opinion in Discourse: A Preliminary Study
Nicholas Asher | Farah Benamara | Yvette Yannick Mathieu
Coling 2008: Companion volume: Posters

2007

pdf bib
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues
Farah Benamara | Sylwia Ozdowska
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues

pdf bib
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues (Posters)
Farah Benamara | Sylwia Ozdowska
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues (Posters)

2006

pdf bib
Language and Reasoning for Question Answering: State of the Art and Future Directions
Farah Benamara
Proceedings of the Workshop KRAQ’06: Knowledge and Reasoning for Language Processing

2004

pdf bib
COOPML: Towards Annotating Cooperative Discourse
Farah Benamara | Veronique Moriceau | Patrick Saint-Dizier
Proceedings of the Workshop on Discourse Annotation

pdf
Cooperative Question Answering in Restricted Domains: the WEBCOOP Experiment
Farah Benamara
Proceedings of the Conference on Question Answering in Restricted Domains

pdf
Lexicalisation strategies in cooperative question-answering systems
Farah Benamara | Patrick Saint-Dizier
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf
WEBCOOP: A Cooperative Question Answering System on the Web
Farah Benamara | Patrick Saint Dizier
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Dynamic Generation of Cooperative Natural Language Responses in WEBCOOP
Farah Benamara | Patrick Saint Dizier
Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003
