2022
National Language Technology Platform for Public Administration
Marko Tadić | Daša Farkaš | Matea Filko | Artūrs Vasiļevskis | Andrejs Vasiļjevs | Jānis Ziediņš | Željka Motika | Mark Fishel | Hrafn Loftsson | Jón Guðnason | Claudia Borg | Keith Cortis | Judie Attard | Donatienne Spiteri
Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference
This article presents work in progress on a collaborative project of several European countries to develop a National Language Technology Platform (NLTP). The project aims to combine the most advanced Language Technology tools and solutions in a new, state-of-the-art, Artificial Intelligence-driven National Language Technology Platform for five EU/EEA official and lower-resourced languages.
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts
Kanishk Verma | Tijana Milosevic | Keith Cortis | Brian Davis
Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference
Cyberbullying is bullying perpetrated via modern communication technologies such as social media networks and gaming platforms. Unfortunately, most existing datasets focusing on cyberbullying detection or classification i) are limited in number, ii) usually target one specific online social networking (OSN) platform, or iii) often contain low-quality annotations. In this study, we fine-tune and benchmark state-of-the-art neural transformers for the binary classification of cyberbullying in social media texts, which is of high value to Natural Language Processing (NLP) researchers and computational social scientists. Furthermore, this work represents the first step toward building neural language models for cross-platform cyberbullying classification that are as OSN-platform-agnostic as possible.
Baseline English and Maltese-English Classification Models for Subjectivity Detection, Sentiment Analysis, Emotion Analysis, Sarcasm Detection, and Irony Detection
Keith Cortis | Brian Davis
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
This paper presents baseline classification models for subjectivity detection, sentiment analysis, emotion analysis, sarcasm detection, and irony detection. All models are trained on user-generated content gathered from newswires and social networking services, in three different languages: English (a high-resourced language), Maltese (a low-resourced language), and Maltese-English (a code-switched language). Traditional supervised algorithms, namely Support Vector Machines, Naïve Bayes, Logistic Regression, Decision Trees, and Random Forest, are used to build a baseline for each classification task: subjectivity, sentiment polarity, emotion, sarcasm, and irony. Baseline models are established at a monolingual (English) level and at a code-switched (Maltese-English) level. Results obtained from all the classification models are presented.
National Language Technology Platform (NLTP): overall view
Artūrs Vasiļevskis | Jānis Ziediņš | Marko Tadić | Željka Motika | Mark Fishel | Hrafn Loftsson | Jón Guðnason | Claudia Borg | Keith Cortis | Judie Attard | Donatienne Spiteri
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
The work in progress on the CEF Action National Language Technology Platform (NLTP) is presented. The Action aims at combining the most advanced Language Technology (LT) tools and solutions in a new state-of-the-art, Artificial Intelligence (AI) driven, National Language Technology Platform (NLTP).
2021
Fine-tuning Neural Language Models for Multidimensional Opinion Mining of English-Maltese Social Data
Keith Cortis | Kanishk Verma | Brian Davis
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
This paper presents multidimensional Social Opinion Mining on user-generated content gathered from newswires and social networking services in three different languages: English (a high-resourced language), Maltese (a low-resourced language), and Maltese-English (a code-switched language). Multiple fine-tuned neural classification language models are presented, which cater for i) the English, Maltese, and Maltese-English languages and ii) five different social opinion dimensions, namely subjectivity, sentiment polarity, emotion, irony, and sarcasm. Results per classification model for each social opinion dimension are discussed.
Malta National Language Technology Platform: A vision for enhancing Malta’s official languages using Machine Translation
Keith Cortis | Judie Attard | Donatienne Spiteri
Proceedings of the First Workshop on Multimodal Machine Translation for Low Resource Languages (MMTLRL 2021)
In this paper, we introduce a vision towards establishing the Malta National Language Technology Platform, an ongoing effort that aims to provide a basis for enhancing Malta’s official languages, namely Maltese and English, using Machine Translation. This will contribute towards the current niche of Language Technology support for Maltese, a low-resource language, across multiple computational linguistics fields, such as speech processing, machine translation, text analysis, and multi-modal resources. The end goals are to remove language barriers, increase accessibility, foster cross-border services, and, most importantly, facilitate the preservation of the Maltese language.
2019
A Social Opinion Gold Standard for the Malta Government Budget 2018
Keith Cortis | Brian Davis
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
We present a gold standard of annotated social opinion for the Malta Government Budget 2018. It consists of over 500 online posts in English and/or Maltese, a less-resourced language, gathered from social media platforms, specifically social networking services and newswires. The posts have been annotated with information about opinions expressed by the general public and other entities, in terms of sentiment polarity, emotion, sarcasm/irony, and negation. This dataset is a resource for opinion mining based on social data, within the context of politics. It is the first opinion-annotated social dataset from Malta, which has very limited language resources available.
2017
SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
Keith Cortis | André Freitas | Tobias Daudert | Manuela Huerlimann | Manel Zarrouk | Siegfried Handschuh | Brian Davis
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
This paper discusses the “Fine-Grained Sentiment Analysis on Financial Microblogs and News” task as part of SemEval-2017, specifically under the “Detecting sentiment, humour, and truth” theme. This task contains two tracks, where the first one concerns Microblog messages and the second one covers News Statements and Headlines. The main goal behind both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment score for each text instance is a floating-point value in the range of -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. This task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.
2014
What or Who is Multilingual Watson?
Keith Cortis | Urvesh Bhowan | Ronan Mac an tSaoir | D.J. McCloskey | Mikhail Sogrin | Ross Cadogan
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations