Stergos Afantenos

2020

CLFD: A Novel Vectorization Technique and Its Application in Fake News Detection
Michail Mersinias | Stergos Afantenos | Georgios Chalkiadakis
Proceedings of the 12th Language Resources and Evaluation Conference

In recent years, fake news detection has been an emerging research area. In this paper, we put forward a novel statistical approach for the generation of feature vectors to describe a document. Our so-called class label frequency distance (clfd) is shown experimentally to provide an effective way of boosting the performance of machine learning methods. Specifically, our experiments, carried out in the fake news detection domain, verify that efficient traditional machine learning methods that use our vectorization approach consistently outperform deep learning methods that use word embeddings for small and medium-sized datasets, while the results are comparable for large datasets. In addition, we demonstrate that a novel hybrid method that utilizes both a clfd-boosted logistic regression classifier and a deep learning one clearly outperforms deep learning methods even in large datasets.

2016

Integer Linear Programming for Discourse Parsing
Jérémy Perret | Stergos Afantenos | Nicholas Asher | Mathieu Morey
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Parallel Discourse Annotations on a Corpus of Short Texts
Manfred Stede | Stergos Afantenos | Andreas Peldszus | Nicholas Asher | Jérémy Perret
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present the first corpus of texts annotated with two alternative approaches to discourse structure, Rhetorical Structure Theory (Mann and Thompson, 1988) and Segmented Discourse Representation Theory (Asher and Lascarides, 2003). 112 short argumentative texts have been analyzed according to these two theories. Furthermore, in previous work, the same texts have already been annotated for their argumentation structure, according to the scheme of Peldszus and Stede (2013). This corpus therefore enables studies of correlations between the two accounts of discourse structure, and between discourse and argumentation. We converted the three annotation formats to a common dependency tree format that makes it possible to compare the structures, and we describe some initial findings.

Discourse Structure and Dialogue Acts in Multiparty Dialogue: the STAC Corpus
Nicholas Asher | Julie Hunter | Mathieu Morey | Benamara Farah | Stergos Afantenos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the STAC resource, a corpus of multi-party chats annotated for discourse structure in the style of SDRT (Asher and Lascarides, 2003; Lascarides and Asher, 2009). The main goal of the STAC project is to study the discourse structure of multi-party dialogues in order to understand the linguistic strategies adopted by interlocutors to achieve their conversational goals, especially when these goals are opposed. The STAC corpus is not only a rich source of data on strategic conversation, but also the first corpus that we are aware of that provides full discourse structures for multi-party dialogues. It has other remarkable features that make it an interesting resource for other topics: interleaved threads, creative language, and interactions between linguistic and extra-linguistic contexts.

This paper describes the ANNODIS resource, a discourse-level annotated corpus for French. The corpus combines two perspectives on discourse: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed of texts that are diversified with respect to genre, length and type of discursive organisation. The methodology followed here involves an iterative design of annotation guidelines in order to reach satisfactory inter-annotator agreement levels. This allows us to raise a few issues relevant for the comparison of such complex objects as discourse structures. The corpus also serves as a source of empirical evidence for discourse theories. We present here two initial analyses taking advantage of this new annotated corpus: one that tested hypotheses on constraints governing discourse structure, and another that studied the variations in composition and signalling of multi-level discourse structures.

2010

Learning Recursive Segments for Discourse Parsing
Stergos Afantenos | Pascal Denis | Philippe Muller | Laurence Danlos
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Automatically detecting discourse segments is an important preliminary step towards full discourse parsing. Previous research on discourse segmentation has relied on the assumption that elementary discourse units (EDUs) in a document always form a linear sequence (i.e., they can never be nested). Unfortunately, this assumption turns out to be too strong, for some theories of discourse, like the "Segmented Discourse Representation Theory" or SDRT, allow for nested discourse units. In this paper, we present a simple approach to discourse segmentation that is able to produce nested EDUs. Our approach builds on standard multi-class classification techniques making use of a regularized maximum entropy model, combined with a simple repairing heuristic that enforces global coherence. Our system was developed and evaluated on the first round of annotations provided by the French Annodis project (an ongoing effort to create a discourse bank for French). Cross-validated on only 47 documents (1,445 EDUs), our system achieves encouraging performance results with an F-score of 73% for finding EDUs.

Testing SDRT’s Right Frontier
Stergos Afantenos | Nicholas Asher
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

Nouvelles considérations pour la détection de réutilisation de texte
Fabien Poulard | Stergos Afantenos | Nicolas Hernandez
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

In this article we address the problem of text reuse detection. More specifically, given an original document and a set of candidate documents that are thematically similar to it, we seek to separate those that are derived from the original document from those that are not. We approach the problem in two ways: the first considers discourse similarities between the documents, the second the overlap of hapax n-grams. We present the results of experiments conducted on a French-language press corpus built as part of the ANR PIITHIE project.

Apache UIMA pour le Traitement Automatique des Langues
Nicolas Hernandez | Fabien Poulard | Stergos Afantenos | Matthieu Vernier | Jérôme Rocheteau
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

The aim of the demonstration is, on the one hand, to report on our experience with the Apache UIMA software solution as an infrastructure for developing distributed NLP applications and, on the other hand, to present the developments carried out by the TALN team at LINA to help the community adopt this framework.

What’s in a Message?
Stergos Afantenos | Nicolas Hernandez
Proceedings of the EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition
