Farah Benamara

Also published as: Farah Beanamara, Farah Benamara Zitoune

2021

pdf bib abs
“Be nice to your wife! The restaurants are closed”: Can Gender Stereotype Detection Improve Sexism Classification?
Patricia Chiril | Farah Benamara | Véronique Moriceau
Findings of the Association for Computational Linguistics: EMNLP 2021

In this paper, we focus on the detection of sexist hate speech against women in tweets studying for the first time the impact of gender stereotype detection on sexism classification. We propose: (1) the first dataset annotated for gender stereotype detection, (2) a new method for data augmentation based on sentence similarity with multilingual external datasets, and (3) a set of deep learning experiments first to detect gender stereotypes and then, to use this auxiliary task for sexism detection. Although the presence of stereotypes does not necessarily entail hateful content, our results show that sexism classification can definitively benefit from gender stereotype detection.

2020

pdf bib abs
Classification de relations pour l’intelligence économique et concurrentielle (Relation Classification for Competitive and Economic Intelligence )
Hadjer Khaldi | Amine Abdaoui | Farah Benamara | Grégoire Sigel | Nathalie Aussenac-Gilles
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

L’extraction de relations reliant des entités par des liens sémantiques à partir de texte a fait l’objet de nombreux travaux visant à extraire des relations génériques comme l’hyperonymie ou spécifiques comme des relations entre gènes et protéines. Dans cet article, nous nous intéressons aux relations économiques entre deux entités nommées de type organisation à partir de textes issus du web. Ce type de relation, encore peu étudié dans la littérature, a pour but l’identification des liens entre les acteurs d’un secteur d’activité afin d’analyser leurs écosystèmes économiques. Nous présentons B IZ R EL, le premier corpus français annoté en relations économiques, ainsi qu’une approche supervisée à base de différentes architectures neuronales pour la classification de ces relations. L’évaluation de ces modèles montre des résultats très encourageants, ce qui est un premier pas vers l’intelligence économique et concurrentielle à partir de textes pour le français.

pdf bib abs
Multilingual Irony Detection with Dependency Syntax and Neural Models
Alessandra Teresa Cignarella | Valerio Basile | Manuela Sanguinetti | Cristina Bosco | Paolo Rosso | Farah Benamara
Proceedings of the 28th International Conference on Computational Linguistics

This paper presents an in-depth investigation of the effectiveness of dependency-based syntactic features on the irony detection task in a multilingual perspective (English, Spanish, French and Italian). It focuses on the contribution from syntactic knowledge, exploiting linguistic resources where syntax is annotated according to the Universal Dependencies scheme. Three distinct experimental settings are provided. In the first, a variety of syntactic dependency-based features combined with classical machine learning classifiers are explored. In the second scenario, two well-known types of word embeddings are trained on parsed data and tested against gold standard datasets. In the third setting, dependency-based syntactic features are combined into the Multilingual BERT architecture. The results suggest that fine-grained dependency-based syntactic information is informative for the detection of irony.

pdf bib abs
An Algerian Corpus and an Annotation Platform for Opinion and Emotion Analysis
Leila Moudjari | Karima Akli-Astouati | Farah Benamara
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we address the lack of resources for opinion and emotion analysis related to North African dialects, targeting Algerian dialect. We present TWIFIL (TWItter proFILing) a collaborative annotation platform for crowdsourcing annotation of tweets at different levels of granularity. The plateform allowed the creation of the largest Algerian dialect dataset annotated for both sentiment (9,000 tweets), emotion (about 5,000 tweets) and extra-linguistic information including author profiling (age and gender). The annotation resulted also in the creation of the largest Algerien dialect subjectivity lexicon of about 9,000 entries which can constitute a valuable resources for the development of future NLP applications for Algerian dialect. To test the validity of the dataset, a set of deep learning experiments were conducted to classify a given tweet as positive, negative or neutral. We discuss our results and provide an error analysis to better identify classification errors.

Social media networks have become a space where users are free to relate their opinions and sentiments which may lead to a large spreading of hatred or abusive messages which have to be moderated. This paper presents the first French corpus annotated for sexism detection composed of about 12,000 tweets. In a context of offensive content mediation on social media now regulated by European laws, we think that it is important to be able to detect automatically not only sexist content but also to identify if a message with a sexist content is really sexist (i.e. addressed to a woman or describing a woman or women in general) or is a story of sexism experienced by a woman. This point is the novelty of our annotation scheme. We also propose some preliminary results for sexism detection obtained with a deep learning approach. Our experiments show encouraging results.

pdf bib abs
He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist
Patricia Chiril | Véronique Moriceau | Farah Benamara | Alda Mari | Gloria Origgi | Marlène Coulomb-Gully
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In a context of offensive content mediation on social media now regulated by European laws, it is important not only to be able to automatically detect sexist content but also to identify if a message with a sexist content is really sexist or is a story of sexism experienced by a woman. We propose: (1) a new characterization of sexist content inspired by speech acts theory and discourse analysis studies, (2) the first French dataset annotated for sexism detection, and (3) a set of deep learning experiments trained on top of a combination of several tweet’s vectorial representations (word embeddings, linguistic features, and various generalization strategies). Our results are encouraging and constitute a first step towards offensive content moderation.

2019

pdf bib abs
Multilingual and Multitarget Hate Speech Detection in Tweets
Patricia Chiril | Farah Benamara Zitoune | Véronique Moriceau | Marlène Coulomb-Gully | Abhishek Kumar
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Social media networks have become a space where users are free to relate their opinions and sentiments which may lead to a large spreading of hatred or abusive messages which have to be moderated. This paper proposes a supervised approach to hate speech detection from a multilingual perspective. We focus in particular on hateful messages towards two different targets (immigrants and women) in English tweets, as well as sexist messages in both English and French. Several models have been developed ranging from feature-engineering approaches to neural ones. Our experiments show very encouraging results on both languages.

pdf bib abs
The binary trio at SemEval-2019 Task 5: Multitarget Hate Speech Detection in Tweets
Patricia Chiril | Farah Benamara Zitoune | Véronique Moriceau | Abhishek Kumar
Proceedings of the 13th International Workshop on Semantic Evaluation

The massive growth of user-generated web content through blogs, online forums and most notably, social media networks, led to a large spreading of hatred or abusive messages which have to be moderated. This paper proposes a supervised approach to hate speech detection towards immigrants and women in English tweets. Several models have been developed ranging from feature-engineering approaches to neural ones.

2018

pdf bib abs
Introduction to the Special Issue on Language in Social Media: Exploiting Discourse and Other Contextual Information
Farah Benamara | Diana Inkpen | Maite Taboada
Computational Linguistics, Volume 44, Issue 4 - December 2018

Social media content is changing the way people interact with each other and share information, personal messages, and opinions about situations, objects, and past experiences. Most social media texts are short online conversational posts or comments that do not contain enough information for natural language processing (NLP) tools, as they are often accompanied by non-linguistic contextual information, including meta-data (e.g., the user’s profile, the social network of the user, and their interactions with other users). Exploiting such different types of context and their interactions makes the automatic processing of social media texts a challenging research task. Indeed, simply applying traditional text mining tools is clearly sub-optimal, as, typically, these tools take into account neither the interactive dimension nor the particular nature of this data, which shares properties with both spoken and written language. This special issue contributes to a deeper understanding of the role of these interactions to process social media data from a new perspective in discourse interpretation. This introduction first provides the necessary background to understand what context is from both the linguistic and computational linguistic perspectives, then presents the most recent context-based approaches to NLP for social media. We conclude with an overview of the papers accepted in this special issue, highlighting what we believe are the future directions in processing social media texts.

2017

pdf bib abs
Evaluative Language Beyond Bags of Words: Linguistic Insights and Computational Applications
Farah Benamara | Maite Taboada | Yannick Mathieu
Computational Linguistics, Volume 43, Issue 1 - April 2017

The study of evaluation, affect, and subjectivity is a multidisciplinary enterprise, including sociology, psychology, economics, linguistics, and computer science. A number of excellent computational linguistics and linguistic surveys of the field exist. Most surveys, however, do not bring the two disciplines together to show how methods from linguistics can benefit computational sentiment analysis systems. In this survey, we show how incorporating linguistic insights, discourse information, and other contextual phenomena, in combination with the statistical exploitation of data, can result in an improvement over approaches that take advantage of only one of these perspectives. We first provide a comprehensive introduction to evaluative language from both a linguistic and computational perspective. We then argue that the standard computational definition of the concept of evaluative language neglects the dynamic nature of evaluation, in which the interpretation of a given evaluation depends on linguistic and extra-linguistic contextual factors. We thus propose a dynamic definition that incorporates update functions. The update functions allow for different contextual aspects to be incorporated into the calculation of sentiment for evaluative words or expressions, and can be applied at all levels of discourse. We explore each level and highlight which linguistic aspects contribute to accurate extraction of sentiment. We end the review by outlining what we believe the future directions of sentiment analysis are, and the role that discourse and contextual information need to play.

pdf bib abs
Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study
Jihen Karoui | Farah Benamara | Véronique Moriceau | Viviana Patti | Cristina Bosco | Nathalie Aussenac-Gilles
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This paper provides a linguistic and pragmatic analysis of the phenomenon of irony in order to represent how Twitter’s users exploit irony devices within their communication strategies for generating textual contents. We aim to measure the impact of a wide-range of pragmatic phenomena in the interpretation of irony, and to investigate how these phenomena interact with contexts local to the tweet. Informed by linguistic theories, we propose for the first time a multi-layered annotation schema for irony and its application to a corpus of French, English and Italian tweets. We detail each layer, explore their interactions, and discuss our results according to a qualitative and quantitative perspective.

2015

pdf bib
Towards a Contextual Pragmatic Model to Detect Irony in Tweets
Jihen Karoui | Farah Benamara Zitoune | Véronique Moriceau | Nathalie Aussenac-Gilles | Lamia Hadrich Belguith
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Mapping Different Rhetorical Relation Annotations: A Proposal
Farah Benamara | Maite Taboada
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

2014

The Asfalda project aims to develop a French corpus with frame-based semantic annotations and automatic tools for shallow semantic analysis. We present the first part of the project: focusing on a set of notional domains, we delimited a subset of English frames, adapted them to French data when necessary, and developed the corresponding French lexicon. We believe that working domain by domain helped us to enforce the coherence of the resulting resource, and also has the advantage that, though the number of frames is limited (around a hundred), we obtain full coverage within a given domain.

pdf bib
Fine-grained semantic categorization of opinion expressions for consensus detection (Catégorisation sémantique fine des expressions d’opinion pour la détection de consensus) [in French]
Farah Benamara | Véronique Moriceau | Yvette Yannick Mathieu
TALN-RECITAL 2014 Workshop DEFT 2014 : DÉfi Fouille de Textes (DEFT 2014 Workshop: Text Mining Challenge)

2013

pdf bib
Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game
Anaïs Cadilhac | Nicholas Asher | Farah Benamara | Alex Lascarides
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Sentiment Composition Using a Parabolic Model
Baptiste Chardon | Farah Benamara | Yannick Mathieu | Vladimir Popescu | Nicholas Asher
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

pdf bib
Segmenting Arabic Texts into Elementary Discourse Units (Segmentation de textes arabes en unités discursives minimales) [in French]
Iskandar Keskes | Farah Beanamara | Lamia Hadrich Belguith
Proceedings of TALN 2013 (Volume 1: Long Papers)

2012

pdf bib
Extraction de préférences à partir de dialogues de négociation (Towards Preference Extraction From Negotiation Dialogues) [in French]
Anaïs Cadilhac | Farah Benamara | Vladimir Popescu | Nicholas Asher | Mohamadou Seck
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf bib
Annotating Preferences in Chats for Strategic Games
Anaïs Cadilhac | Nicholas Asher | Farah Benamara
Proceedings of the Sixth Linguistic Annotation Workshop

pdf bib
How do Negation and Modality Impact on Opinions?
Farah Benamara | Baptiste Chardon | Yannick Mathieu | Vladimir Popescu | Nicholas Asher
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

pdf bib
Annotating Preferences in Negotiation Dialogues
Anaïs Cadilhac | Nicholas Asher | Farah Benamara
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

This paper describes the ANNODIS resource, a discourse-level annotated corpus for French. The corpus combines two perspectives on discourse: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed of texts that are diversified with respect to genre, length and type of discursive organisation. The methodology followed here involves an iterative design of annotation guidelines in order to reach satisfactory inter-annotator agreement levels. This allows us to raise a few issues relevant for the comparison of such complex objects as discourse structures. The corpus also serves as a source of empirical evidence for discourse theories. We present here two first analyses taking advantage of this new annotated corpus --one that tested hypotheses on constraints governing discourse structure, and another that studied the variations in composition and signalling of multi-level discourse structures.

pdf bib abs
Clause-based Discourse Segmentation of Arabic Texts
Iskandar Keskes | Farah Benamara | Lamia Hadrich Belguith
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes a rule-based approach to segment Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three steps segmentation algorithm: first by using only punctuation marks, then by relying only on lexical cues and finally by using both typology and lexical cues. The results were compared with manual segmentations elaborated by experts.

2011

pdf bib
Towards Context-Based Subjectivity Analysis
Farah Benamara | Baptiste Chardon | Yannick Mathieu | Vladimir Popescu
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Commitments to Preferences in Dialogue
Anais Cadilhac | Nicholas Asher | Farah Benamara | Alex Lascarides
Proceedings of the SIGDIAL 2011 Conference

2010

pdf bib
Ontolexical resources for feature-based opinion mining: a case-study
Anaïs Cadilhac | Farah Benamara | Nathalie Aussenac-Gilles
Proceedings of the 6th Workshop on Ontologies and Lexical Resources

2009

Le projet ANNODIS vise la construction d’un corpus de textes annotés au niveau discursif ainsi que le développement d’outils pour l’annotation et l’exploitation de corpus. Les annotations adoptent deux points de vue complémentaires : une perspective ascendante part d’unités de discours minimales pour construire des structures complexes via un jeu de relations de discours ; une perspective descendante aborde le texte dans son entier et se base sur des indices pré-identifiés pour détecter des structures discursives de haut niveau. La construction du corpus est associée à la création de deux interfaces : la première assiste l’annotation manuelle des relations et structures discursives en permettant une visualisation du marquage issu des prétraitements ; une seconde sera destinée à l’exploitation des annotations. Nous présentons les modèles et protocoles d’annotation élaborés pour mettre en oeuvre, au travers de l’interface dédiée, la campagne d’annotation.

Farah Benamara

2021

2020

2019

2018

2017

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2004

2003

Co-authors

Venues