Lamia Hadrich Belguith

Also published as: Lamia Belguith, Lamia Belguith Hadrich, Lamia Hadrich Belguith, Lamia Hadrich belguith, Lamia Hadrich-Belguith, Lamia-Hadrich Belguith

2024

pdf abs
Lexicons Gain the Upper Hand in Arabic MWE Identification
Najet Hadj Mohamed | Agata Savary | Cherifa Ben Khelil | Jean-Yves Antoine | Iskandar Keskes | Lamia Hadrich-Belguith
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024

This paper highlights the importance of integrating MWE identification with the development of syntactic MWE lexicons. It suggests that lexicons with minimal morphosyntactic information can amplify current MWE-annotated datasets and refine identification strategies. To our knowledge, this work represents the first attempt to focus on both seen and unseen of VMWEs for Arabic. It also deals with the challenge of differentiating between literal and figurative interpretations of idiomatic expressions. The approach involves a dual-phase procedure: first projecting a VMWE lexicon onto a corpus to identify candidate occurrences, then disambiguating these occurrences to distinguish idiomatic from literal instances. Experiments outlined in the paper aim to assess the efficacy of this technique, utilizing a lexicon known as LEXAR and the “parseme-ar” corpus. The findings suggest that lexicon-driven strategies have the potential to refine MWE identification, particularly for unseen occurrences.

2022

pdf abs
Annotation d’expressions polylexicales verbales en arabe : validation d’une procédure d’annotation multilingue (Annotating Verbal Multiword Expressions in Arabic : Assessing the Validity of a Multilingual)
Najet Hadj Mohamed | Cherifa Ben Khelil | Agata Savary | Iskander Keskes | Jean Yves Antoine | Lamia Hadrich Belguith
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Cet article décrit nos efforts pour étendre le projet PARSEME à l’arabe standard moderne. L’applicabilité du guide d’annotation de PARSEME a été testée en mesurant l’accord inter-annotateurs dès la première phase d’annotation. Un sous-ensemble de 1062 phrases du Prague Arabic Dependency Treebank (PADT) a été sélectionné et annoté indépendamment par deux locutrices natives arabes. Suite à leurs annotations, un nouveau corpus arabe avec plus de 1250 expressions polylexicales verbales (EPV) annotées a été construit.

pdf abs
Benchmarking transfer learning approaches for sentiment analysis of Arabic dialect
Emna Fsih | Sameh Kchaou | Rahma Boujelbane | Lamia Hadrich-Belguith
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

Arabic has a widely varying collection of dialects. With the explosion of the use of social networks, the volume of written texts has remarkably increased. Most users express themselves using their own dialect. Unfortunately, many of these dialects remain under-studied due to the scarcity of resources. Researchers and industry practitioners are increasingly interested in analyzing users’ sentiments. In this context, several approaches have been proposed, namely: traditional machine learning, deep learning transfer learning and more recently few-shot learning approaches. In this work, we compare their efficiency as part of the NADI competition to develop a country-level sentiment analysis model. Three models were beneficial for this sub-task: The first based on Sentence Transformer (ST) and achieve 43.23% on DEV set and 42.33% on TEST set, the second based on CAMeLBERT and achieve 47.85% on DEV set and 41.72% on TEST set and the third based on multi-dialect BERT model and achieve 66.72% on DEV set and 39.69% on TEST set.

pdf
A deep sentiment analysis of Tunisian dialect comments on multi-domain posts in different social media platforms
Emna Fsih | Rahma Boujelbane | Lamia Hadrich Belguith
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

pdf abs
Annotating Verbal Multiword Expressions in Arabic: Assessing the Validity of a Multilingual Annotation Procedure
Najet Hadj Mohamed | Cherifa Ben Khelil | Agata Savary | Iskandar Keskes | Jean-Yves Antoine | Lamia Hadrich-Belguith
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes our efforts to extend the PARSEME framework to Modern Standard Arabic. Theapplicability of the PARSEME guidelines was tested by measuring the inter-annotator agreement in theearly annotation stage. A subset of 1,062 sentences from the Prague Arabic Dependency Treebank PADTwas selected and annotated by two Arabic native speakers independently. Following their annotations, anew Arabic corpus with over 1,250 annotated VMWEs has been built. This corpus already exceeds thesmallest corpora of the PARSEME suite, and enables first observations. We discuss our annotation guide-line schema that shows full MWE annotation is realizable in Arabic where we get good inter-annotator agreement.

pdf abs
Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis : Case of Tunisian Dialect
Saméh Kchaou | Rahma Boujelbane | Emna Fsih | Lamia Hadrich-Belguith
Proceedings of the Thirteenth Language Resources and Evaluation Conference

With the growing access to the internet, the spoken Arabic dialect language becomes informal languages written in social media. Most users post comments using their own dialect. This linguistic situation inhibits mutual understanding between internet users and makes difficult to use computational approaches since most Arabic resources are intended for the formal language: Modern Standard Arabic (MSA). In this paper, we present a pipeline to standardize the written texts in social networks by translating them to the standard language MSA. We fine-tun at first an identification bert-based model to select Tunisian Dialect (TD) from MSA and other dialects. Then, we learned transformer model to translate TD to MSA. The final system includes the translated TD text and the originally text written in MSA. Each of these steps was evaluated on the same test corpus. In order to test the effectiveness of the approach, we compared two opinion analysis models, the first intended for the Sentiment Analysis (SA) of dialect texts and the second for the MSA texts. We concluded that through standardization we obtain the best score.

2020

pdf abs
Parallel resources for Tunisian Arabic Dialect Translation
Saméh Kchaou | Rahma Boujelbane | Lamia Hadrich-Belguith
Proceedings of the Fifth Arabic Natural Language Processing Workshop

The difficulty of processing dialects is clearly observed in the high cost of building representative corpus, in particular for machine translation. Indeed, all machine translation systems require a huge amount and good management of training data, which represents a challenge in a low-resource setting such as the Tunisian Arabic dialect. In this paper, we present a data augmentation technique to create a parallel corpus for Tunisian Arabic dialect written in social media and standard Arabic in order to build a Machine Translation (MT) model. The created corpus was used to build a sentence-based translation model. This model reached a BLEU score of 15.03% on a test set, while it was limited to 13.27% utilizing the corpus without augmentation.

pdf abs
Toward Qualitative Evaluation of Embeddings for Arabic Sentiment Analysis
Amira Barhoumi | Nathalie Camelin | Chafik Aloulou | Yannick Estève | Lamia Hadrich Belguith
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we propose several protocols to evaluate specific embeddings for Arabic sentiment analysis (SA) task. In fact, Arabic language is characterized by its agglutination and morphological richness contributing to great sparsity that could affect embedding quality. This work presents a study that compares embeddings based on words and lemmas in SA frame. We propose first to study the evolution of embedding models trained with different types of corpora (polar and non polar) and explore the variation between embeddings by observing the sentiment stability of neighbors in embedding spaces. Then, we evaluate embeddings with a neural architecture based on convolutional neural network (CNN). We make available our pre-trained embeddings to Arabic NLP research community with free to use. We provide also for free resources used to evaluate our embeddings. Experiments are done on the Large Arabic-Book Reviews (LABR) corpus in binary (positive/negative) classification frame. Our best result reaches 91.9%, that is higher than the best previous published one (91.5%).

2019

pdf abs
LIUM-MIRACL Participation in the MADAR Arabic Dialect Identification Shared Task
Saméh Kchaou | Fethi Bougares | Lamia Hadrich-Belguith
Proceedings of the Fourth Arabic Natural Language Processing Workshop

This paper describes the joint participation of the LIUM and MIRACL Laboratories at the Arabic dialect identification challenge of the MADAR Shared Task (Bouamor et al., 2019) conducted during the Fourth Arabic Natural Language Processing Workshop (WANLP 2019). We participated to the Travel Domain Dialect Identification subtask. We built several systems and explored different techniques including conventional machine learning methods and deep learning algorithms. Deep learning approaches did not perform well on this task. We experimented several classification systems and we were able to identify the dialect of an input sentence with an F1-score of 65.41% on the official test set using only the training data supplied by the shared task organizers.

pdf abs
Plongements lexicaux spécifiques à la langue arabe : application à l’analyse d’opinions (Arabic-specific embedddings : application in Sentiment Analysis)
Amira Barhoumi | Nathalie Camelin | Chafik Aloulou | Yannick Estève | Lamia Hadrich Belguith
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Nous nous intéressons, dans cet article, à la tâche d’analyse d’opinions en arabe. Nous étudions la spécificité de la langue arabe pour la détection de polarité. Nous nous focalisons ici sur les caractéristiques d’agglutination et de richesse morphologique de cette langue. Nous avons particulièrement étudié différentes représentations d’unité lexicale : token, lemme et light stemme. Nous avons construit et testé des espaces continus de ces différentes représentations lexicales. Nous avons mesuré l’apport de tels types de representations vectorielles dans notre cadre spécifique. Les performances du réseau CNN montrent un gain significatif de 2% par rapport à l’état de l’art.

pdf abs
Semantic Language Model for Tunisian Dialect
Abir Masmoudi | Rim Laatar | Mariem Ellouze | Lamia Hadrich Belguith
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper, we describe the process of creating a statistical Language Model (LM) for the Tunisian Dialect. Indeed, this work is part of the realization of Automatic Speech Recognition (ASR) system for the Tunisian Railway Transport Network. Since our eld of work has been limited, there are several words with similar behaviors (semantic for example) but they do not have the same appearance probability; their class groupings will therefore be possible. For these reasons, we propose to build an n-class LM that is based mainly on the integration of purely semantic data. Indeed, each class represents an abstraction of similar labels. In order to improve the sequence labeling task, we proposed to use a discriminative algorithm based on the Conditional Random Field (CRF) model. To better judge our choice of creating an n-class word model, we compared the created model with the 3-gram type model on the same test corpus of evaluation. Additionally, to assess the impact of using the CRF model to perform the semantic labelling task in order to construct semantic classes, we compared the n-class created model with using the CRF in the semantic labelling task and the n- class model without using the CRF in the semantic labelling task. The drawn comparison of the predictive power of the n-class model obtained by applying the CRF model in the semantic labelling is that it is better than the other two models presenting the highest value of its perplexity.

pdf abs
Automatic diacritization of Tunisian dialect text using Recurrent Neural Network
Abir Masmoudi | Mariem Ellouze | Lamia Hadrich belguith
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The absence of diacritical marks in the Arabic texts generally leads to morphological, syntactic and semantic ambiguities. This can be more blatant when one deals with under-resourced languages, such as the Tunisian dialect, which suffers from unavailability of basic tools and linguistic resources, like sufficient amount of corpora, multilingual dictionaries, morphological and syntactic analyzers. Thus, this language processing faces greater challenges due to the lack of these resources. The automatic diacritization of MSA text is one of the various complex problems that can be solved by deep neural networks today. Since the Tunisian dialect is an under-resourced language of MSA and as there are a lot of resemblance between both languages, we suggest to investigate a recurrent neural network (RNN) for this dialect diacritization problem. This model will be compared to our previous models models CRF and SMT (CITATION) based on the same dialect corpus. We can experimentally show that our model can achieve better outcomes (DER of 10.72%), as compared to the two models CRF (DER of 20.25%) and SMT (DER of 33.15%).

2018

pdf
Détection des couples de termes translittérés à partir d’un corpus parallèle anglais-arabe ()
Wafa Neifar | Thierry Hamon | Pierre Zweigenbaum | Mariem Ellouze | Lamia-Hadrich Belguith
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

2017

pdf abs
Machine Learning Approach to Evaluate MultiLingual Summaries
Samira Ellouze | Maher Jaoua | Lamia Hadrich Belguith
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

The present paper introduces a new MultiLing text summary evaluation method. This method relies on machine learning approach which operates by combining multiple features to build models that predict the human score (overall responsiveness) of a new summary. We have tried several single and “ensemble learning” classifiers to build the best model. We have experimented our method in summary level evaluation where we evaluate each text summary separately. The correlation between built models and human score is better than the correlation between baselines and manual score.

pdf abs
Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments
Salima Medhaffar | Fethi Bougares | Yannick Estève | Lamia Hadrich-Belguith
Proceedings of the Third Arabic Natural Language Processing Workshop

Dialectal Arabic (DA) is significantly different from the Arabic language taught in schools and used in written communication and formal speech (broadcast news, religion, politics, etc.). There are many existing researches in the field of Arabic language Sentiment Analysis (SA); however, they are generally restricted to Modern Standard Arabic (MSA) or some dialects of economic or political interest. In this paper we are interested in the SA of the Tunisian Dialect. We utilize Machine Learning techniques to determine the polarity of comments written in Tunisian Dialect. First, we evaluate the SA systems performances with models trained using freely available MSA and Multi-dialectal data sets. We then collect and annotate a Tunisian Dialect corpus of 17.000 comments from Facebook. This corpus allows us a significant accuracy improvement compared to the best model trained on other Arabic dialects or MSA data. We believe that this first freely available corpus will be valuable to researchers working in the field of Tunisian Sentiment Analysis and similar areas.

2016

pdf abs
Impact de l’agglutination dans l’extraction de termes en arabe standard moderne (Adaptation of a term extractor to the Modern Standard Arabic language)
Wafa Neifar | Thierry Hamon | Pierre Zweigenbaum | Mariem Ellouze | Lamia Hadrich Belguith
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Posters)

Nous présentons, dans cet article, une adaptation à l’arabe standard moderne d’un extracteur de termes pour le français et l’anglais. L’adaptation a d’abord consisté à décrire le processus d’extraction des termes de manière similaire à celui défini pour l’anglais et le français en prenant en compte certains particularités morpho-syntaxiques de la langue arabe. Puis, nous avons considéré le phénomène de l’agglutination de la langue arabe. L’évaluation a été réalisée sur un corpus de textes médicaux. Les résultats montrent que parmi 400 termes candidats maximaux analysés, 288 sont jugés corrects par rapport au domaine (72,1%). Les erreurs d’extraction sont dues à l’étiquetage morpho-syntaxique et à la non-voyellation des textes mais aussi à des phénomènes d’agglutination.

2015

pdf abs
Détection automatique de l’ironie dans les tweets en français
Jihen Karoui | Farah Benamara Zitoune | Véronique Moriceau | Nathalie Aussenac-Gilles | Lamia Hadrich Belguith
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Cet article présente une méthode par apprentissage supervisé pour la détection de l’ironie dans les tweets en français. Un classifieur binaire utilise des traits de l’état de l’art dont les performances sont reconnues, ainsi que de nouveaux traits issus de notre étude de corpus. En particulier, nous nous sommes intéressés à la négation et aux oppositions explicites/implicites entre des expressions d’opinion ayant des polarités différentes. Les résultats obtenus sont encourageants.

pdf
Towards a Contextual Pragmatic Model to Detect Irony in Tweets
Jihen Karoui | Farah Benamara Zitoune | Véronique Moriceau | Nathalie Aussenac-Gilles | Lamia Hadrich Belguith
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf
Sentiment Classification of Arabic Documents: Experiments with multi-type features and ensemble algorithms
Amine Bayoudhi | Hatem Ghorbel | Lamia Hadrich Belguith
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

Tunisian Arabic is a dialect of the Arabic language spoken in Tunisia. Tunisian Arabic is an under-resourced language. It has neither a standard orthography nor large collections of written text and dictionaries. Actually, there is no strict separation between Modern Standard Arabic, the official language of the government, media and education, and Tunisian Arabic; the two exist on a continuum dominated by mixed forms. In this paper, we present a conventional orthography for Tunisian Arabic, following a previous effort on developing a conventional orthography for Dialectal Arabic (or CODA) demonstrated for Egyptian Arabic. We explain the design principles of CODA and provide a detailed description of its guidelines as applied to Tunisian Arabic.

pdf abs
A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition
Abir Masmoudi | Mariem Ellouze Khmekhem | Yannick Estève | Lamia Hadrich Belguith | Nizar Habash
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we describe an effort to create a corpus and phonetic dictionary for Tunisian Arabic Automatic Speech Recognition (ASR). The corpus, named TARIC (Tunisian Arabic Railway Interaction Corpus) has a collection of audio recordings and transcriptions from dialogues in the Tunisian Railway Transport Network. The phonetic (or pronunciation) dictionary is an important ASR component that serves as an intermediary between acoustic models and language models in ASR systems. The method proposed in this paper, to automatically generate a phonetic dictionary, is rule based. For that reason, we define a set of pronunciation rules and a lexicon of exceptions. To determine the performance of our phonetic rules, we chose to evaluate our pronunciation dictionary on two types of corpora. The word error rate of word grapheme-to-phoneme mapping is around 9%.

pdf
De l’arabe standard vers l’arabe dialectal : projection de corpus et ressources linguistiques en vue du traitement automatique de l’oral dans les médias tunisiens [From Modern Standard Arabic to Tunisian dialect: corpus projection and linguistic resources towards the automatic processing of speech in the Tunisian media]
Rahma Boujelbane | Mariem Ellouze | Frédéric Béchet | Lamia Belguith
Traitement Automatique des Langues, Volume 55, Numéro 2 : Traitement automatique du langage parlé [Spoken language processing]

2013

pdf
Exploiting multiple resources for Japanese to English patent translation
Rahma Sellami | Fatiha Sadat | Lamia Hadrich Belguith
Proceedings of the 5th Workshop on Patent Translation

pdf
Building bilingual lexicon to create Dialect Tunisian corpora and adapt language model
Rahma Boujelbane | Mariem Ellouze khemekhem | Siwar BenAyed | Lamia Hadrich Belguith
Proceedings of the Second Workshop on Hybrid Approaches to Translation

pdf
Mapping Rules for Building a Tunisian Dialect Lexicon and Generating Corpora
Rahma Boujelbane | Mariem Ellouze Khemekhem | Lamia Hadrich Belguith
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf
Morphological Analysis of Tunisian Dialect
Inès Zribi | Mariem Ellouze Khemakhem | Lamia Hadrich Belguith
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf
Segmenting Arabic Texts into Elementary Discourse Units (Segmentation de textes arabes en unités discursives minimales) [in French]
Iskandar Keskes | Farah Beanamara | Lamia Hadrich Belguith
Proceedings of TALN 2013 (Volume 1: Long Papers)

pdf
An evaluation summary method based on combination of automatic and textual complexity metrics (Une méthode d’évaluation des résumés basée sur la combinaison de métriques automatiques et de complexité textuelle) [in French]
Samira Walha Ellouze | Maher Jaoua | Lamia Hadrich Belguith
Proceedings of TALN 2013 (Volume 2: Short Papers)

pdf
An Evaluation Summary Method Based on a Combination of Content and Linguistic Metrics
Samira Ellouze | Maher Jaoua | Lamia Hadrich Belguith
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

pdf abs
Exploiting Wikipedia as a Knowledge Base for the Extraction of Linguistic Resources: Application on Arabic-French Comparable Corpora and Bilingual Lexicons
Rahma Sellami | Fatiha Sadat | Lamia Hadrich Belguith
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages

We present simple and effective methods for extracting comparable corpora and bilingual lexicons from Wikipedia. We shall exploit the large scale and the structure of Wikipedia articles to extract two resources that will be very useful for natural language applications. We build a comparable corpus from Wikipedia using categories as topic restrictions and we extract bilingual lexicons from inter-language links aligned with statistical method or a combined statistical and linguistic method.

pdf
Extraction de lexiques bilingues à partir de Wikipédia (Bilingual lexicon extraction from Wikipedia) [in French]
Rahma Sellami | Fatiha Sadat | Lamia Hadrich Belguith
JEP-TALN-RECITAL 2012, Workshop TALAf 2012: Traitement Automatique des Langues Africaines (TALAf 2012: African Language Processing)

pdf abs
Clause-based Discourse Segmentation of Arabic Texts
Iskandar Keskes | Farah Benamara | Lamia Hadrich Belguith
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes a rule-based approach to segment Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three steps segmentation algorithm: first by using only punctuation marks, then by relying only on lexical cues and finally by using both typology and lexical cues. The results were compared with manual segmentations elaborated by experts.

pdf
Étude comparative entre trois approches de résumé automatique de documents arabes (Comparative Study of Three Approaches to Automatic Summarization of Arabic Documents) [in French]
Iskandar Keskes | Mohamed Mahdi Boudabous | Mohamed Hédi Maaloul | Lamia Hadrich Belguith
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

pdf
La reconnaissance automatique de la fonction des pronoms démonstratifs en langue arabe (Automatic recognition of demonstrative pronouns function in Arabic) [in French]
Yacine Ben Yahia | Souha Mezghani Hammami | Lamia Hadrich Belguith
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

2010

pdf abs
Traitement des disfluences dans le cadre de la compréhension automatique de l’oral arabe spontané
Younès Bahou | Abir Masmoudi | Lamia Hadrich Belguith
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Les disfluences inhérents de toute parole spontanée sont un vrai défi pour les systèmes de compréhension de la parole. Ainsi, nous proposons dans cet article, une méthode originale pour le traitement des disfluences (plus précisément, les autocorrections, les répétitions, les hésitations et les amorces) dans le cadre de la compréhension automatique de l’oral arabe spontané. Notre méthode est basée sur une analyse à la fois robuste et partielle, des énoncés oraux arabes. L’idée consiste à combiner une technique de reconnaissance de patrons avec une analyse sémantique superficielle par segments conceptuels. Cette méthode a été testée à travers le module de compréhension du système SARF, un serveur vocal interactif offrant des renseignements sur le transport ferroviaire tunisien (Bahou et al., 2008). Les résultats d’évaluation de ce module montrent que la méthode proposée est très prometteuse. En effet, les mesures de rappel, de précision et de F-Measure sont respectivement de 79.23%, 74.09% et 76.57%.

pdf abs
L’apport d’une approche hybride pour la reconnaissance des entités nommées en langue arabe
Inès Zribi | Souha Mezghani Hammami | Lamia Hadrich Belguith
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Dans cet article, nous proposons une méthode hybride pour la reconnaissance des entités nommées pour la langue arabe. Cette méthode profite, d’une part, des avantages de l’utilisation d’une méthode d’apprentissage pour extraire des règles permettant l’identification et la classification des entités nommées. D’autre part, elle repose sur un ensemble de règles extraites manuellement pour corriger et améliorer le résultat de la méthode d’apprentissage. Les résultats de l’évaluation de la méthode proposée sont encourageants. Nous avons obtenu un taux global de F-mesure égal à 79.24%.

2009

pdf bib
Disfluency and Out-of-vocabulary Word Processing in Arabic Speech Understanding
Younès Bahou | Lamia Hadrich Belguith | Abdelmajid Ben Hamadou
Proceedings of the Third Workshop on Computational Approaches to Arabic-Script-based Languages (CAASL3)

pdf abs
Gestion de dialogue oral Homme-machine en arabe
Younès Bahou | Amine Bayoudhi | Lamia Hadrich Belguith
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Dans le présent papier, nous présentons nos travaux sur la gestion du dialogue oral arabe Homme-machine. Ces travaux entrent dans le cadre de la réalisation du serveur vocal interactif SARF (Bahou et al., 2008) offrant des renseignements sur le transport ferroviaire tunisien en langue arabe standard moderne. Le gestionnaire de dialogue que nous proposons est basé sur une approche structurelle et est composé de deux modèles à savoir, le modèle de tâche et le modèle de dialogue. Le premier modèle permet de i) compléter et vérifier l’incohérence des structures sémantiques représentant les sens utiles des énoncés, ii) générer une requête vers l’application et iii) récupérer le résultat et de formuler une réponse à l’utilisateur en langage naturel. Quant au modèle de dialogue, il assure l’avancement du dialogue avec l’utilisateur et l’identification de ses intentions. L’interaction entre ces deux modèles est assurée grâce à un contexte du dialogue permettant le suivi et la mise à jour de l’historique du dialogue.

2008

pdf abs
Intégration d’une étape de pré-filtrage et d’une fonction multiobjectif en vue d’améliorer le système ExtraNews de résumé de documents multiples
Fatma Kallel Jaoua | Lamia Hadrich Belguith | Maher Jaoua | Abdelmajid Ben Hamadou
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans cet article, nous présentons les améliorations que nous avons apportées au système ExtraNews de résumé automatique de documents multiples. Ce système se base sur l’utilisation d’un algorithme génétique qui permet de combiner les phrases des documents sources pour former les extraits, qui seront croisés et mutés pour générer de nouveaux extraits. La multiplicité des critères de sélection d’extraits nous a inspiré une première amélioration qui consiste à utiliser une technique d’optimisation multi-objectif en vue d’évaluer ces extraits. La deuxième amélioration consiste à intégrer une étape de pré-filtrage de phrases qui a pour objectif la réduction du nombre des phrases des textes sources en entrée. Une évaluation des améliorations apportées à notre système est réalisée sur les corpus de DUC’04 et DUC’07.

2006

pdf abs
Analyse et désambiguïsation morphologiques de textes arabes non voyellés
Lamia Hadrich Belguith | Nouha Chaâben
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Dans ce papier nous proposons d’abord une méthode d’analyse et de désambiguïsation morphologiques de textes arabes non voyellés permettant de lever l’ambiguïté morphologique due à l’absence des marques de voyelles et aussi à l’irrégularité des formes dérivées de certains mots arabes (e.g. formes irrégulières du pluriel des noms et des adjectifs). Ensuite, nous présentons le système MORPH2, un analyseur morphologique de textes arabes non voyellés basé sur la méthode proposée. Ce système est évalué sur un livre scolaire et des articles de journaux. Les résultats obtenus son et très encourageants. En effet, les mesures de rappel et de précision globales sont respectivement de 69,77 % et 68,51 %.

2005

pdf abs
Segmentation de textes arabes basée sur l’analyse contextuelle des signes de ponctuations et de certaines particules
Lamia Hadrich Belguith | Leila Baccour | Mourad Ghassan
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Nous proposons dans cet article une approche de segmentation de textes arabes non voyellés basée sur une analyse contextuelle des signes de ponctuations et de certaines particules, tels que les conjonctions de coordination. Nous présentons ensuite notre système STAr, un segmenteur de textes arabes basé sur l’approche proposée. STAr accepte en entrée un texte arabe en format txt et génère en sortie un texte segmenté en paragraphes et en phrases.

2003

pdf bib abs
Implémentation du système MASPAR selon une approche multi-agent
Chafik Aloulou | Lamia Hadrich Belguith | Ahmed Hadj Kacem | Souha Hammami Mezghani
Proceedings of the Eighth International Conference on Parsing Technologies

Le traitement automatique du langage naturel est un axe de recherche qui connaît chaque jour de nouvelles théories et approches. Les systèmes d’analyse automatique qui sont fondés sur une approche séquentielle présentent plusieurs inconvénients. Afin de pallier ces limites, nous nous sommes intéressés à la réalisation d’un système d’analyse syntaxique de textes arabes basé sur l’approche multi-agent : MASPAR « Multi-Agent System for Parsing ARabic ».