Automatic detection of the Myers-Briggs Type Indicator (MBTI) from short posts attracted noticeable attention in the last few years. Recent studies showed that this is quite a difficult task, especially on commonly used Twitter data. Obtaining MBTI labels is also difficult, as human annotation requires trained psychologists, and automatic way of obtaining them is through long questionnaires of questionable usability for the task. In this paper, we present a method for collecting reliable MBTI labels via only four carefully selected questions that can be applied to any type of textual data.
Automatic detection of the four MBTI personality dimensions from texts has recently attracted noticeable attention from the natural language processing and computational linguistic communities. Despite the large collections of Twitter data for training, the best systems rarely even outperform the majority-class baseline. In this paper, we discuss the theoretical reasons for such low results and present the insights from an annotation study that further shed the light on this issue.
According to the self-determination theory, the levels of satisfaction of three basic needs (competence, autonomy and relatedness) have implications on people’s everyday life and career. We benchmark the novel task of automatically detecting those needs on short posts in English, by modelling it as a ternary classification task, and as three binary classification tasks. A detailed manual analysis shows that the latter has advantages in the real-world scenario, and that our best models achieve similar performances as a trained human annotator.
Personality profiling has long been used in psychology to predict life outcomes. Recently, automatic detection of personality traits from written messages has gained significant attention in computational linguistics and natural language processing communities, due to its applicability in various fields. In this survey, we show the trajectory of research towards automatic personality detection from purely psychology approaches, through psycholinguistics, to the recent purely natural language processing approaches on large datasets automatically extracted from social media. We point out what has been gained and what lost during that trajectory, and show what can be realistic expectations in the field.