Leila Kosseim

2024

pdf abs
CLaC at SemEval-2024 Task 4: Decoding Persuasion in Memes – An Ensemble of Language Models with Paraphrase Augmentation
Kota Shamanth Ramanath Nayak | Leila Kosseim
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes our approach to SemEval-2024 Task 4 subtask 1, focusing on hierarchical multi-label detection of persuasion techniques in meme texts. Our approach was based on fine-tuning individual language models (BERT, XLM-RoBERTa, and mBERT) and leveraging a mean-based ensemble model. Additional strategies included dataset augmentation through the TC dataset and paraphrase generation as well as the fine-tuning of individual classification thresholds for each class. During testing, our system outperformed the baseline in all languages except for Arabic, where no significant improvement was reached. Analysis of the results seem to indicate that our dataset augmentation strategy and per-class threshold fine-tuning may have introduced noise and exacerbated the dataset imbalance.

pdf abs
CLaC at SemEval-2024 Task 2: Faithful Clinical Trial Inference
Jennifer Marks | Mohammadreza Davari | Leila Kosseim
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper presents the methodology used for our participation in SemEval 2024 Task 2 (Jullien et al., 2024) – Safe Biomedical Natural Language Inference for Clinical Trials. The task involved Natural Language Inference (NLI) on clinical trial data, where statements were provided regarding information within Clinical Trial Reports (CTRs). These statements could pertain to a single CTR or compare two CTRs, requiring the identification of the inference relation (entailment vs contradiction) between CTR-statement pairs. Evaluation was based on F1, Faithfulness, and Consistency metrics, with priority given to the latter two by the organizers. Our approach aims to maximize Faithfulness and Consistency, guided by intuitive definitions provided by the organizers, without detailed metric calculations. Experimentally, our approach yielded models achieving maximal Faithfulness (top rank) and average Consistency (mid rank) at the expense of F1 (low rank). Future work will focus on refining our approach to achieve a balance among all three metrics.

pdf abs
Exploring Soft-Label Training for Implicit Discourse Relation Recognition
Nelson Filipe Costa | Leila Kosseim
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)

This paper proposes a classification model for single label implicit discourse relation recognition trained on soft-label distributions. It follows the PDTB 3.0 framework and it was trained and tested on the DiscoGeM corpus, where it achieves an F1-score of 51.38 on third-level sense classification of implicit discourse relations. We argue that training on soft-label distributions allows the model to better discern between more ambiguous discourse relations.

2023

pdf abs
CLaC at SemEval-2023 Task 3: Language Potluck RoBERTa Detects Online Persuasion Techniques in a Multilingual Setup
Nelson Filipe Costa | Bryce Hamilton | Leila Kosseim
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our approach to the SemEval-2023 Task 3 to detect online persuasion techniques in a multilingual setup. Our classification system is based on the RoBERTa-base model trained predominantly on English to label the persuasion techniques across 9 different languages. Our system was able to significantly surpass the baseline performance in 3 of the 9 languages: English, Georgian and Greek. However, our wrong assumption that a single classification system trained predominantly on English could generalize well to other languages, negatively impacted our scores on the other 6 languages. In this paper, we provide a description of the reasoning behind the development of our final model and what conclusions may be drawn from its performance for future work.

pdf abs
Mapping Explicit and Implicit Discourse Relations between the RST-DT and the PDTB 3.0
Nelson Filipe Costa | Nadia Sheikh | Leila Kosseim
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

In this paper we propose a first empirical mapping between the RST-DT and the PDTB 3.0. We provide an original algorithm which allowed the mapping of 6,510 (80.0%) explicit and implicit discourse relations between the overlapping articles of the RST-DT and PDTB 3.0 discourse annotated corpora. Results of the mapping show that while it is easier to align segments of implicit discourse relations, the mapping obtained between the aligned explicit discourse relations is more unambiguous.

pdf abs
Discourse Analysis of Argumentative Essays of English Learners Based on CEFR Level
Blaise Hanel | Leila Kosseim
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

In this paper, we investigate the relationship between the use of discourse relations and the CEFR-level of argumentative English learner essays. Using both the Rhetorical Structure Theory (RST) and the Penn Discourse TreeBank (PDTB) frameworks, we analyze essays from The International Corpus Network of Asian Learners (ICNALE), and the Corpus and Repository of Writing (CROW). Results show that the use of the RST relations of Explanation and Background, as well as the first-level PDTB sense of Contingency, are influenced by the English proficiency level of the writer.

2022

pdf
Pre-training Language Models for Surface Realization
Farhood Farahnak | Leila Kosseim
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

pdf abs
How (Un)Faithful is Attention?
Hessam Amini | Leila Kosseim
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Although attention weights have been commonly used as a means to provide explanations for deep learning models, the approach has been widely criticized due to its lack of faithfulness. In this work, we present a simple approach to compute the newly proposed metric AtteFa, which can quantitatively represent the degree of faithfulness of the attention weights. Using this metric, we further validate the effect of the frequency of informative input elements and the use of contextual vs. non-contextual encoders on the faithfulness of the attention mechanism. Finally, we apply the approach on several real-life binary classification datasets to measure the faithfulness of attention weights in real-life settings.

2020

pdf abs
TIMBERT: Toponym Identifier For The Medical Domain Based on BERT
MohammadReza Davari | Leila Kosseim | Tien Bui
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we propose an approach to automate the process of place name detection in the medical domain to enable epidemiologists to better study and model the spread of viruses. We created a family of Toponym Identification Models based on BERT (TIMBERT), in order to learn in an end-to-end fashion the mapping from an input sentence to the associated sentence labeled with toponyms. When evaluated with the SemEval 2019 task 12 test set (Weissenbacher et al., 2019), our best TIMBERT model achieves an F1 score of 90.85%, a significant improvement compared to the state-of-the-art of 89.13% (Wang et al., 2019).

pdf abs
On the Creation of a Corpus for Coherence Evaluation of Discursive Units
Elham Mohammadi | Timothe Beiko | Leila Kosseim
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we report on our experiments towards the creation of a corpus for coherence evaluation. Most corpora for textual coherence evaluation are composed of randomly shuffled sentences that focus on sentence ordering, regardless of whether the sentences were originally related by a discourse relation. To the best of our knowledge, no publicly available corpus has been designed specifically for the evaluation of coherence of known discursive units. In this paper, we focus on coherence modeling at the intra-discursive level and describe our approach to build a corpus of incoherent pairs of sentences. We experimented with a variety of corruption strategies to create synthetic incoherent pairs of discourse arguments from coherent ones. Using discourse argument pairs from the Penn Discourse Tree Bank, we generate incoherent discourse argument pairs, by swapping either their discourse connective or a discourse argument. To evaluate how incoherent the generated corpora are, we use a convolutional neural network to try to distinguish the original pairs from the corrupted ones. Results of the classifier as well as a manual inspection of the corpora show that generating such corpora is still a challenge as the generated instances are clearly not “incoherent enough”, indicating that more effort should be spent on developing more robust ways of generating incoherent corpora.

In this paper, we propose a neural-based model to address the first task of the DEFT 2013 shared task, with the main challenge of a highly imbalanced dataset, using state-of-the-art embedding approaches and deep architectures. We report on our experiments on the use of linguistic features, extracted by Charton et. al. (2014), in different neural models utilizing pretrained embeddings. Our results show that all of the models that use linguistic features outperform their counterpart models that only use pretrained embeddings. The best performing model uses pretrained CamemBERT embeddings as input and CNN as the hidden layer, and uses additional linguistic features. Adding the linguistic features to this model improves its performance by 4.5% and 11.4% in terms of micro and macro F1 scores, respectively, leading to state-of-the-art results and an improved classification of the rare classes.

pdf abs
Surface Realization Using Pretrained Language Models
Farhood Farahnak | Laya Rafiee | Leila Kosseim | Thomas Fevens
Proceedings of the Third Workshop on Multilingual Surface Realisation

In the context of Natural Language Generation, surface realization is the task of generating the linear form of a text following a given grammar. Surface realization models usually consist of a cascade of complex sub-modules, either rule-based or neural network-based, each responsible for a specific sub-task. In this work, we show that a single encoder-decoder language model can be used in an end-to-end fashion for all sub-tasks of surface realization. The model is designed based on the BART language model that receives a linear representation of unordered and non-inflected tokens in a sentence along with their corresponding Universal Dependency information and produces the linear sequence of inflected tokens along with the missing words. The model was evaluated on the shallow and deep tracks of the 2020 Surface Realization Shared Task (SR’20) using both human and automatic evaluation. The results indicate that despite its simplicity, our model achieves competitive results among all participants in the shared task.

pdf abs
Du bon usage d’ingrédients linguistiques spéciaux pour classer des recettes exceptionnelles (Using Special Linguistic Ingredients to Classify Exceptional Recipes )
Elham Mohammadi | Louis Marceau | Eric Charton | Leila Kosseim | Luka Nerima | Marie-Jean Meurs
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

Nous présentons un modèle d’apprentissage automatique qui combine modèles neuronaux et linguistiques pour traiter les tâches de classification dans lesquelles la distribution des étiquettes des instances est déséquilibrée. Les performances de ce modèle sont mesurées à l’aide d’expériences menées sur les tâches de classification de recettes de cuisine de la campagne DEFT 2013 (Grouin et al., 2013). Nous montrons que les plongements lexicaux (word embeddings) associés à des méthodes d’apprentissage profond obtiennent de meilleures performances que tous les algorithmes déployés lors de la campagne DEFT. Nous montrons aussi que ces mêmes classifieurs avec plongements lexicaux peuvent gagner en performance lorsqu’un modèle linguistique est ajouté au modèle neuronal. Nous observons que l’ajout d’un modèle linguistique au modèle neuronal améliore les performances de classification sur les classes rares.

2019

pdf abs
CLaC Lab at SemEval-2019 Task 3: Contextual Emotion Detection Using a Combination of Neural Networks and SVM
Elham Mohammadi | Hessam Amini | Leila Kosseim
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system at SemEval 2019, Task 3 (EmoContext), which focused on the contextual detection of emotions in a dataset of 3-round dialogues. For our final system, we used a neural network with pretrained ELMo word embeddings and POS tags as input, GRUs as hidden units, an attention mechanism to capture representations of the dialogues, and an SVM classifier which used the learned network representations to perform the task of multi-class classification. This system yielded a micro-averaged F1 score of 0.7072 for the three emotion classes, improving the baseline by approximately 12%.

pdf abs
Neural Feature Extraction for Contextual Emotion Detection
Elham Mohammadi | Hessam Amini | Leila Kosseim
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper describes a new approach for the task of contextual emotion detection. The approach is based on a neural feature extractor, composed of a recurrent neural network with an attention mechanism, followed by a classifier, that can be neural or SVM-based. We evaluated the model with the dataset of the task 3 of SemEval 2019 (EmoContext), which includes short 3-turn conversations, tagged with 4 emotion classes. The best performing setup was achieved using ELMo word embeddings and POS tags as input, bidirectional GRU as hidden units, and an SVM as the final classifier. This configuration reached 69.93% in terms of micro-average F1 score on the main 3 emotion classes, a score that outperformed the baseline system by 11.25%.

pdf abs
CLaC at CLPsych 2019: Fusion of Neural Features and Predicted Class Probabilities for Suicide Risk Assessment Based on Online Posts
Elham Mohammadi | Hessam Amini | Leila Kosseim
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

This paper summarizes our participation to the CLPsych 2019 shared task, under the name CLaC. The goal of the shared task was to detect and assess suicide risk based on a collection of online posts. For our participation, we used an ensemble method which utilizes 8 neural sub-models to extract neural features and predict class probabilities, which are then used by an SVM classifier. Our team ranked first in 2 out of the 3 tasks (tasks A and C).

pdf abs
The Concordia NLG Surface Realizer at SRST 2019
Farhood Farahnak | Laya Rafiee | Leila Kosseim | Thomas Fevens
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

This paper presents the model we developed for the shallow track of the 2019 NLG Surface Realization Shared Task. The model reconstructs sentences whose word order and word inflections were removed. We divided the problem into two sub-problems: reordering and inflecting. For the purpose of reordering, we used a pointer network integrated with a transformer model as its encoder-decoder modules. In order to generate the inflected forms of tokens, a Feed Forward Neural Network was employed.

2018

pdf
Attention for Implicit Discourse Relation Recognition
Andre Cianflone | Leila Kosseim
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf abs
CLaC @ DEFT 2018: Sentiment analysis of tweets on transport from Île-de-France
Simon Jacques | Farhood Farahnak | Leila Kosseim
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

CLaC @ DEFT 2018: Analysis of tweets on transport on the Île-de-France This paper describes the system deployed by the CLaC lab at Concordia University in Montreal for the DEFT 2018 shared task. The competition consisted in four different tasks; however, due to lack of time, we only participated in the first two. We participated with a system based on conventional supervised learning methods: a support vector machine classifier and an artificial neural network. For task 1, our best approach achieved an F-measure of 87.61%; while at task 2, we achieve 51.03%, situating our system below the average of the other participants.

2017

pdf abs
Automatic Identification of AltLexes using Monolingual Parallel Corpora
Elnaz Davoodi | Leila Kosseim
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as since or but, are the most informative cues to identify explicit relations; however discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signalled by markers outside these inventories (i.e. AltLexes) are not detected as effectively. In this paper, we propose a novel method to leverage parallel corpora in text simplification and lexical resources to automatically identify alternative lexicalizations that signal discourse relation. When applied to the Simple Wikipedia and Newsela corpora along with WordNet and the PPDB, the method allowed the automatic discovery of 91 AltLexes.

pdf abs
Argument Labeling of Explicit Discourse Relations using LSTM Neural Networks
Sohail Hooda | Leila Kosseim
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Argument labeling of explicit discourse relations is a challenging task. The state of the art systems achieve slightly above 55% F-measure but require hand-crafted features. In this paper, we propose a Long Short Term Memory (LSTM) based model for argument labeling. We experimented with multiple configurations of our model. Using the PDTB dataset, our best model achieved an F1 measure of 23.05% without any feature engineering. This is significantly higher than the 20.52% achieved by the state of the art RNN approach, but significantly lower than the feature based state of the art systems. On the other hand, because our approach learns only from the raw dataset, it is more widely applicable to multiple textual genres and languages.

pdf abs
Improving Discourse Relation Projection to Build Discourse Annotated Corpora
Majid Laali | Leila Kosseim
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

The naive approach to annotation projection is not effective to project discourse annotations from one language to another because implicit relations are often changed to explicit ones and vice-versa in the translation. In this paper, we propose a novel approach based on the intersection between statistical word-alignment models to identify unsupported discourse annotations. This approach identified 65% of the unsupported annotations in the English-French parallel sentences from Europarl. By filtering out these unsupported annotations, we induced the first PDTB-style discourse annotated corpus for French from Europarl. We then used this corpus to train a classifier to identify the discourse-usage of French discourse connectives and show a 15% improvement of F1-score compared to the classifier trained on the non-filtered annotations.

pdf bib abs
Automatic Mapping of French Discourse Connectives to PDTB Discourse Relations
Majid Laali | Leila Kosseim
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

In this paper, we present an approach to exploit phrase tables generated by statistical machine translation in order to map French discourse connectives to discourse relations. Using this approach, we created DisCoRel, a lexicon of French discourse connectives and their PDTB relations. When evaluated against LEXCONN, DisCoRel achieves a recall of 0.81 and an Average Precision of 0.68 for the Concession and Condition relations.

2016

pdf
CLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic Features for Complex Word Identification
Elnaz Davoodi | Leila Kosseim
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf
On the Contribution of Discourse Structure on Text Complexity Assessment
Elnaz Davoodi | Leila Kosseim
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf abs
N-gram and Neural Language Models for Discriminating Similar Languages
Andre Cianflone | Leila Kosseim
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

This paper describes our submission to the 2016 Discriminating Similar Languages (DSL) Shared Task. We participated in the closed Sub-task 1 with two separate machine learning techniques. The first approach is a character based Convolution Neural Network with an LSTM layer (CLSTM), which achieved an accuracy of 78.45% with minimal tuning. The second approach is a character-based n-gram model of size 7. It achieved an accuracy of 88.45% which is close to the accuracy of 89.38% achieved by the best submission.

pdf
The CLaC Discourse Parser at CoNLL-2016
Majid Laali | Andre Cianflone | Leila Kosseim
Proceedings of the CoNLL-16 shared task

The availability of huge amounts of online opinions has created a new need to develop effective query-based opinion summarizers to analyze this information in order to facilitate decision making at every level. To develop an effective opinion summarization approach, we have targeted to resolve specifically Question Irrelevancy and Discourse Incoherency problems which have been found to be the most frequently occurring problems for opinion summarization. To address these problems, we have introduced a hybrid approach by combining text schema and rhetorical relations to exploit intra-sentential rhetorical relations. To evaluate our approach, we have built a system called BlogSum and have compared BlogSum-generated summaries after applying rhetorical structuring to BlogSum-generated candidate sentences without utilizing rhetorical relations using the Text Analysis Conference (TAC) 2008 data for summary contents. Evaluation results show that our approach improves summary contents by reducing question irrelevant sentences.

2009

pdf bib
Summarizing Blog Entries versus News Texts
Shamima Mithun | Leila Kosseim
Proceedings of the Workshop on Events in Emerging Text Types

2008

pdf abs
Answering List Questions using Co-occurrence and Clustering
Majid Razmara | Leila Kosseim
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Although answering list questions is not a new research area, answering them automatically still remains a challenge. The median F-score of systems that participated in TREC 2007 Question Answering track is still very low (0.085) while 74% of the questions had a median F-score of 0. In this paper, we propose a novel approach to answering list questions. This approach is based on the hypothesis that answer instances of a list question co-occur in the documents and sentences related to the topic of the question. We use a clustering method to group the candidate answers that co-occur more often. To pinpoint the right cluster, we use the target and the question keywords as spies to return the cluster that contains these keywords.

2004

pdf
The Problem of Precision in Restricted-Domain Question Answering. Some Proposed Methods of Improvement
Hai Doan-Nguyen | Leila Kosseim
Proceedings of the Conference on Question Answering in Restricted Domains

pdf
Simple features for statistical Word Sense Disambiguation
Abolfazl Lamjiri | Osama El Demerdash | Leila Kosseim
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

2003

pdf bib abs
Generation of natural responses through syntactic patterns
Glenda B. Anaya | Leila Kosseim
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

The goal of Question-Answering (QA) systems is to find short and factual answers to opendomain questions by searching a large collection of documents. The subject of this research is to formulate complete and natural answer-sentences to questions, given the short answer. The answer-sentences are meant to be self-sufficient; that is, they should contain enough context to be understood without needing the original question. Generating such sentences is important in question-answering as they can be used to enhance existing QA systems to provide answers to the user in a more natural way and to provide a pattern to actually extract the answer from the document collection.

2001

pdf abs
Critères de sélection d’une approche pour le suivi automatique du courriel
Leila Kosseim | Guy Lapalme
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Cet article discute de différentes approches pour faire le suivi automatique du courrier-électronique. Nous présentons tout d’abord les méthodes de traitement automatique de la langue (TAL) les plus utilisées pour cette tâche, puis un ensemble de critères influençant le choix d’une approche. Ces critères ont été développés grâce à une étude de cas sur un corpus fourni par Bell Canada Entreprises. Avec notre corpus, il est apparu que si aucune méthode n’est complètement satisfaisante par elle-même, une approche combinée semble beaucoup plus prometteuse.

pdf abs
Extraction de noms propres à partir de textes variés: problématique et enjeux
Leila Kosseim | Thierry Poibeau
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Cet article porte sur l’identification de noms propres à partir de textes écrits. Les stratégies à base de règles développées pour des textes de type journalistique se révèlent généralement insuffisantes pour des corpus composés de textes ne répondant pas à des critères rédactionnels stricts. Après une brève revue des travaux effectués sur des corpus de textes de nature journalistique, nous présentons la problématique de l’analyse de textes variés en nous basant sur deux corpus composés de courriers électroniques et de transcriptions manuelles de conversations téléphoniques. Une fois les sources d’erreurs présentées, nous décrivons l’approche utilisée pour adapter un système d’extraction de noms propres développé pour des textes journalistiques à l’analyse de messages électroniques.