Andreas Kerren
The occurrence of stance-taking towards vaccination was measured in documents extracted by topic modelling from two different corpora, one discussion forum corpus and one tweet corpus. For some of the topics extracted, their most closely associated documents contained a proportion of vaccine stance-taking texts that exceeded the corpus average by a large margin. These extracted document sets would, therefore, form a useful resource in a process for computer-assisted analysis of argumentation on the subject of vaccination.
In this paper, we present a study for the identification of authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about the author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. The classification accuracy achieved, when the 31 highest ranked features were used, was up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features is discussed.
A classifier for automatic detection of stance towards vaccination in online forums was trained and evaluated. Debate posts from six discussion threads on the British parental website Mumsnet were manually annotated for stance ‘against’ or ‘for’ vaccination, or as ‘undecided’. A support vector machine, trained to detect the three classes, achieved a macro F-score of 0.44, while a macro F-score of 0.62 was obtained by the same type of classifier on the binary classification task of distinguishing stance ‘against’ vaccination from stance ‘for’ vaccination. These results show that vaccine stance detection in online forums is a difficult task, at least for the type of model investigated and for the relatively small training corpus that was used. Future work will therefore include an expansion of the training data and an evaluation of other types of classifiers and features.
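The macro F-score reported in the abstract above is the unweighted mean of the per-class F-scores, so a class the classifier rarely gets right (such as ‘undecided’) lowers the score as much as any other class would. A minimal sketch of the computation follows; the function name `macro_f1` and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
def macro_f1(gold, predicted, labels):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            # A class that is never predicted correctly contributes 0.
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy example (made-up labels, not data from the paper).
gold = ["against", "for", "undecided", "for"]
predicted = ["against", "for", "for", "for"]
score = macro_f1(gold, predicted, ["against", "for", "undecided"])
```

In this toy case the per-class scores are 1.0 (‘against’), 0.8 (‘for’) and 0.0 (‘undecided’), giving a macro F-score of 0.6.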
Automatic detection of five language components, which are all relevant for expressing opinions and for stance taking, was studied: positive sentiment, negative sentiment, speculation, contrast and condition. A resource-aware approach was taken, which included manual annotation of 500 training samples and the use of limited lexical resources. Active learning was compared to random selection of training data, as well as to a lexicon-based method. Active learning was successful for the categories speculation, contrast and condition, but not for the two sentiment categories, for which results achieved when using active learning were similar to those achieved when applying a random selection of training data. This difference is likely due to a larger variation in how sentiment is expressed than in how speakers express the other three categories. This larger variation was also shown by the lower recall results achieved by the lexicon-based approach for sentiment than for the categories speculation, contrast and condition.
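The abstract above does not say which active learning strategy was used. As an illustration only, the sketch below contrasts uncertainty sampling, one common strategy, with random selection of the next items to annotate. The function names and the probability values are assumptions; the probabilities stand in for the output of some hypothetical binary classifier on an unlabelled pool.

```python
import random


def select_uncertain(pool_probs, k):
    """Pick the k pool items whose positive-class probability is
    closest to 0.5, i.e. where the classifier is least certain."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: abs(pool_probs[i] - 0.5))
    return ranked[:k]


def select_random(pool_size, k, seed=0):
    """Baseline: pick k distinct pool items uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), k)


# Made-up classifier probabilities for five unlabelled sentences.
pool_probs = [0.95, 0.52, 0.10, 0.45, 0.80]
uncertain = select_uncertain(pool_probs, 2)
baseline = select_random(len(pool_probs), 2)
```

Here `uncertain` contains the indices 1 and 3, the two sentences the hypothetical classifier is least sure about, whereas the random baseline ignores the classifier entirely.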