This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
MirnaAdriani
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
The growing demand to develop an automatic emotion recognition system for the Human-Computer Interaction field had pushed some research in speech emotion detection. Although it is growing, there is still little research about automatic speech emotion detection in Bahasa Indonesia. Another issue is the lack of standard corpus for this research area in Bahasa Indonesia. This study proposed several approaches to detect speech-emotion in the dialogs of an Indonesian movie by classifying them into 4 different emotion classes i.e. happiness, sadness, anger, and neutral. There are two different speech data representations used in this study i.e. statistical and temporal/sequence representations. This study used Artificial Neural Network (ANN), Recurrent Neural Network (RNN) with Long Short Term Memory (LSTM) variation, word embedding, and also the hybrid of three to perform the classification task. The best accuracies given by one-vs-rest scenario for each emotion class with speech-transcript pairs using hybrid of non-temporal and embedding approach are 1) happiness: 76.31%; 2) sadness: 86.46%; 3) anger: 82.14%; and 4) neutral: 68.51%. The multiclass classification resulted in 64.66% of precision, 66.79% of recall, and 64.83% of F1-score.
Twitter is an excellent source of data for NLP researches as it offers tremendous amount of textual data. However, processing tweet to extract meaningful information is very challenging, at least for two reasons: (i) using nonstandard words as well as informal writing manner, and (ii) code-mixing issues, which is combining multiple languages in single tweet conversation. Most of the previous works have addressed both issues in isolated different task. In this study, we work on normalization task in code-mixed Twitter data, more specifically in Indonesian-English language. We propose a pipeline that consists of four modules, i.e tokenization, language identification, lexical normalization, and translation. Another contribution is to provide a gold standard of Indonesian-English code-mixed data for each module.
Ambiguity is a problem we frequently face in Natural Language Processing. Word Sense Disambiguation (WSD) is a task to determine the correct sense of an ambiguous word. However, research in WSD for Indonesian is still rare to find. The availability of English-Indonesian parallel corpora and WordNet for both languages can be used as training data for WSD by applying Cross-Lingual WSD method. This training data is used as an input to build a model using supervised machine learning algorithms. Our research also examines the use of Word Embedding features to build the WSD model.