2018
Thumbs Up and Down: Sentiment Analysis of Medical Online Forums
Victoria Bobicev | Marina Sokolova
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task
In the current study, we apply multi-class and multi-label sentence classification to sentiment analysis of online medical forums. We aim to identify major health issues discussed in online social media and the types of sentiments those issues evoke. We use an ontology of personal health information for Information Extraction and apply Machine Learning methods to the automated recognition of the expressed sentiments.
Using PPM for Health Related Text Detection
Victoria Bobicev | Victoria Lazu | Daniela Istrati
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task
This paper describes the participation of the LILU team in the SMM4H challenge on mining social media for descriptions of health-related events such as drug intake or vaccination.
2017
Inter-Annotator Agreement in Sentiment Analysis: Machine Learning Perspective
Victoria Bobicev | Marina Sokolova
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Manual text annotation is an essential part of Big Text analytics. Although annotators work with limited parts of data sets, their results are extrapolated by automated text classification and affect the final classification results. Reliability of annotations and adequacy of assigned labels are especially important in the case of sentiment annotations. In the current study we examine inter-annotator agreement in multi-class, multi-label sentiment annotation of messages. We used several annotation agreement measures, as well as statistical analysis and Machine Learning to assess the resulting annotations.
Good News vs. Bad News: What are they talking about?
Olga Kanishcheva | Victoria Bobicev
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Today’s massive news streams demand automated analysis, which is provided by various online news explorers. However, most of them do not provide sentiment analysis. The main problem in sentiment analysis of news is the difference between the writers’ and the readers’ attitudes to the news text. News can be good or bad but has to be delivered in neutral words as pure facts. Although there are applications for sentiment analysis of news, the task remains a pressing problem because the latest news affects people’s lives daily. In this paper, we explore the problem of sentiment analysis for Ukrainian and Russian news. We developed a corpus of Ukrainian and Russian news and annotated each text with one of three categories: positive, negative, or neutral. Each text was marked by at least three independent annotators via a web interface; the inter-annotator agreement was analyzed and the final label for each text was computed. These texts were used in the machine learning experiments. Further, we investigated which kinds of named entities, such as Locations, Organizations, and Persons, are perceived as good or bad by readers and which of them caused ambiguity in text annotation.
Syntactic Semantic Correspondence in Dependency Grammar
Cătălina Mărănduc | Cătălin Mititelu | Victoria Bobicev
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories
Tools for Building a Corpus to Study the Historical and Geographical Variation of the Romanian Language
Victoria Bobicev | Cătălina Mărănduc | Cenel Augusto Perez
Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe
Contemporary standard language corpora are ideal for NLP. There are few morphologically and syntactically annotated corpora for Romanian, and those existing or in progress deal only with the Contemporary Romanian standard. However, the need to study the dynamics of natural languages gave rise to balanced corpora containing non-standard texts. In this paper, we describe the creation of tools for processing non-standard Romanian to build a large balanced corpus. We want to preserve in annotated form as many early stages of the language as possible. We have already built a corpus in Old Romanian. We also intend to include the South-Danube dialects, which are remote from the standard language, along with regional forms closer to the standard. We try to preserve data about endangered idioms such as the Aromanian, Meglenoromanian and Istroromanian dialects, and to calculate the distance between different regional variants, including the language spoken in the Republic of Moldova. This distance, as well as the mutual understanding between the speakers, is the correct criterion for classifying idioms as different languages, as dialects, or as regional variants close to the standard.
2016
Automatic Detection of Arabicized Berber and Arabic Varieties
Wafia Adouane | Nasredine Semmar | Richard Johansson | Victoria Bobicev
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Automatic Language Identification (ALI) is the detection of the natural language of an input text by a machine. It is a necessary first step for any language-dependent natural language processing task. Various methods have been successfully applied to a wide range of languages, and the state-of-the-art automatic language identifiers are mainly based on character n-gram models trained on huge corpora. However, there are many languages which are not yet automatically processed, for instance minority and informal languages. Many of these languages are only spoken and do not exist in a written format. Social media platforms and new technologies have facilitated the emergence of a written format for these spoken languages based on pronunciation. The latter are not well represented on the Web and are commonly referred to as under-resourced languages; currently available ALI tools fail to recognize them properly. In this paper, we revisit the problem of ALI with a focus on Arabicized Berber and dialectal Arabic short texts. We introduce new resources and evaluate the existing methods. The results show that machine learning models combined with lexicons are well suited for detecting Arabicized Berber and different Arabic varieties and distinguishing between them, giving a macro-average F-score of 92.94%.
2015
Discriminating between Similar Languages Using PPM
Victoria Bobicev
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects
Learning Relationship between Authors’ Activity and Sentiments: A case study of online medical forums
Marina Sokolova | Victoria Bobicev
Proceedings of the International Conference Recent Advances in Natural Language Processing
2014
Recognition of Sentiment Sequences in Online Discussions
Victoria Bobicev | Marina Sokolova | Michael Oakes
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)
2013
Authorship Attribution in Health Forums
Victoria Bobicev | Marina Sokolova | Khaled El Emam | Stan Matwin
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013
What Sentiments Can Be Found in Medical Forums?
Marina Sokolova | Victoria Bobicev
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013
Native Language Identification with PPM
Victoria Bobicev
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications
2011
Sentiments and Opinions in Health-related Web messages
Marina Sokolova | Victoria Bobicev
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011
Agreement: How to Reach it? Defining Language Features Leading to Agreement in Discourse
Tatiana Zidraşco | Victoria Bobicev | Shun Shiramatsu | Tadachika Ozono | Toramatsu Shintani
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011
2009
Classification of Emotion Words in Russian and Romanian Languages
Marina Sokolova | Victoria Bobicev
Proceedings of the International Conference RANLP-2009
2008
Estimating Word Phonosemantics
Victoria Bobicev | Tatiana Zidraşco
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The paper describes a method for estimating word phonosemantics. We treat phonosemantics as a subconscious emotional perception of a word's sound, independent of the word's meaning. The method is based on data about the emotional perception of sounds, obtained from a number of respondents. A program estimates the emotional characteristics of words using the data about sounds. The program output was compared with human judgments. The results of the experiments showed that in most cases the computer description of a word based on phonosemantic calculations is similar to our own impressions of how the word sounds. On the other hand, word meaning dominates the emotional perception of a word, and the phonosemantic component emerges for words with unknown meaning.