Impact assessment is an evolving area of research that aims to measure and predict the potential effects of projects or programs. Measuring the impact of scientific research is a vibrant subdomain, closely intertwined with impact assessment. A recurring obstacle is the absence of an efficient framework for analyzing and labeling lengthy reports. To address this issue, we propose a framework for automatically assessing the impact of scientific research projects by identifying pertinent sections in project reports that indicate potential impacts. We leverage a mixed-method approach, combining manual annotations with supervised machine learning, to extract these passages from project reports. We experiment with different machine learning algorithms, including traditional statistical models as well as pre-trained transformer language models. Our experiments show that our proposed method achieves accuracy scores of up to 0.81 and that it generalizes to scientific research from different domains and different languages.
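A minimal sketch of the kind of supervised passage classifier described above, using a traditional statistical baseline (TF-IDF features with logistic regression in scikit-learn). The toy passages, labels, and split are illustrative assumptions, not the paper's data or implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in for manually annotated report passages (1 = indicates impact, 0 = does not).
passages = [
    "The project results were adopted by two regional hospitals.",
    "Section 3 lists the equipment purchased during the first year.",
    "Findings informed a change in national screening guidelines.",
    "The kickoff meeting took place in March.",
]
labels = [1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    passages, labels, test_size=0.5, stratify=labels, random_state=0
)

# Traditional statistical baseline: TF-IDF features + logistic regression.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The same setup could be swapped for a pre-trained transformer classifier; the baseline above is only meant to make the passage-labeling formulation concrete.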
Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect such content in podcast descriptions and audio transcripts. We demonstrate that our models are effective by evaluating them on the downstream task of podcast summarization and show that we can substantively improve ROUGE scores and reduce the extraneous content generated in the summaries.
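A rough sketch of the evaluation idea above: drop sentences flagged as extraneous before scoring against a reference summary with ROUGE. A keyword heuristic stands in for the trained classifier, the texts are invented, and the `rouge-score` package is assumed to be installed.

```python
from rouge_score import rouge_scorer

def is_extraneous(sentence: str) -> bool:
    # Stand-in for a trained extraneous-content classifier.
    ad_cues = ("promo code", "sponsored by", "use code", "visit our sponsor")
    return any(cue in sentence.lower() for cue in ad_cues)

description = [
    "In this episode we interview a marine biologist about deep-sea ecosystems.",
    "This show is sponsored by AcmeVPN, use code PODCAST for 20% off.",
    "We discuss how new submersibles changed what we know about the ocean floor.",
]
reference = "An interview with a marine biologist on deep-sea ecosystems and submersibles."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
raw = scorer.score(reference, " ".join(description))
clean = scorer.score(reference, " ".join(s for s in description if not is_extraneous(s)))
print("ROUGE-L F1 raw:    ", round(raw["rougeL"].fmeasure, 3))
print("ROUGE-L F1 cleaned:", round(clean["rougeL"].fmeasure, 3))
```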
This paper proposes, implements, and evaluates a novel, corpus-based approach for identifying categories indicative of the impact of research via a deductive (top-down, from theory to data) and an inductive (bottom-up, from data to theory) approach. The resulting categorization schemes differ in substance. Research outcomes are typically assessed by using bibliometric methods, such as citation counts and patterns, or alternative metrics, such as references to research in the media. Shortcomings of these methods are their inability to identify the impact of research beyond academia (bibliometrics) and to consider text-based impact indicators beyond those that capture attention (altmetrics). We address these limitations by leveraging a mixed-methods approach for eliciting impact categories from experts, project personnel (deductive), and texts (inductive). Using these categories, we label a corpus of project reports per category schema and apply supervised machine learning to infer these categories from project reports. The classification results show that we can predict deductively and inductively derived impact categories with 76.39% and 78.81% accuracy (F1-score), respectively. Our approach can complement solutions from bibliometrics and scientometrics for assessing the impact of research and studying the scope and types of advancements transferred from academia to society.
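To make the reported metric concrete, here is a brief, hypothetical illustration of per-category F1 averaged over impact categories, computed with scikit-learn; the gold and predicted labels are toy values, not data from the study.

```python
from sklearn.metrics import classification_report, f1_score

# Hypothetical gold vs. predicted impact categories for ten report passages.
gold = ["health", "policy", "education", "health", "policy",
        "education", "health", "policy", "health", "education"]
pred = ["health", "policy", "health", "health", "education",
        "education", "health", "policy", "policy", "education"]

print(classification_report(gold, pred, digits=4))
print("weighted F1:", round(f1_score(gold, pred, average="weighted"), 4))
```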
Podcasts are a large and growing repository of spoken audio. As an audio format, podcasts are more varied in style and production type than broadcast news, contain more genres than typically studied in video data, and are more varied in style and format than previous corpora of conversations. When transcribed with automatic speech recognition, they represent a noisy but fascinating collection of documents which can be studied through the lens of natural language processing, information retrieval, and linguistics. Paired with the audio files, they are also a resource for speech processing and the study of paralinguistic, sociolinguistic, and acoustic aspects of the domain. We introduce the Spotify Podcast Dataset, a new corpus of 100,000 podcasts, orders of magnitude larger than previous speech corpora used for search and summarization. We demonstrate the complexity of the domain with a case study of two tasks: (1) passage search and (2) summarization. Our results show that the size and variability of this corpus open up new avenues for research.
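A minimal passage-search baseline in the spirit of the first case-study task, using the `rank_bm25` package over transcript-like passages. The segmentation into passages and the example texts are assumptions, not the dataset or the paper's retrieval setup.

```python
from rank_bm25 import BM25Okapi

# Transcript segments standing in for fixed-length passages from episodes.
passages = [
    "today we talk about training for a first marathon and avoiding injuries",
    "our guest explains how sourdough starters work and why hydration matters",
    "we answer listener questions about interval running and recovery days",
]
tokenized = [p.split() for p in passages]
bm25 = BM25Okapi(tokenized)

query = "marathon training tips".split()
scores = bm25.get_scores(query)
best = max(range(len(passages)), key=lambda i: scores[i])
print(f"best passage ({scores[best]:.2f}): {passages[best]}")
```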
In times of crisis, identifying essential needs is crucial to providing appropriate resources and services to affected entities. Social media platforms such as Twitter contain a vast amount of information about the general public’s needs. However, the sparsity of information and the amount of noisy content make it challenging for practitioners to effectively identify relevant information on these platforms. This study proposes two novel methods for two needs detection tasks: 1) extracting a list of needed resources, such as masks and ventilators, and 2) detecting sentences that specify who-needs-what resources (e.g., we need testing). We evaluate our methods on a set of tweets about the COVID-19 crisis. For extracting a list of needs, we compare our results against two official lists of resources, achieving 0.64 precision. For detecting who-needs-what sentences, we compare our results against a set of 1,000 annotated tweets, achieving a 0.68 F1-score.
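A sketch of the who-needs-what idea above using dependency patterns: find the subject and direct object of the verb "need" with spaCy. This is an illustrative heuristic, not the method evaluated in the paper, and it assumes the en_core_web_sm model is installed.

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

tweets = [
    "Our hospital urgently needs ventilators and N95 masks.",
    "We need more testing in rural counties.",
    "Stay home and stay safe everyone.",
]

for tweet in tweets:
    doc = nlp(tweet)
    for token in doc:
        if token.lemma_ == "need" and token.pos_ == "VERB":
            who = [c.text for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            what = [c.text for c in token.children if c.dep_ in ("dobj", "obj")]
            if who and what:
                print(f"{tweet!r} -> who: {who}, what: {what}")
```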
We investigate the relationship between basic principles of human morality and the expression of opinions in user-generated text data. We assume that people’s backgrounds, culture, and values are associated with their perceptions and expressions of everyday topics, and that people’s language use reflects these perceptions. While personal values and social effects are abstract and complex concepts, they have practical implications and are relevant for a wide range of NLP applications. To extract human values (in this paper, morality) and measure social effects (morality and stance), we empirically evaluate the usage of a morality lexicon that we expanded via a quality-controlled, human-in-the-loop process. As a result, we enhance the Moral Foundations Dictionary in size (from 324 to 4,636 syntactically disambiguated entries) and scope. We use both lexica for feature-based and deep learning classification (SVM, RF, and LSTM) to test their usefulness for measuring social effects. We find that the enhancement of the original lexicon leads to measurable improvements in prediction accuracy for the selected NLP tasks.
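A minimal sketch of the lexicon-based feature step described above: count hits per moral foundation in each text and train an SVM on the resulting vectors. The tiny lexicon, texts, and labels are placeholders, not the enhanced Moral Foundations Dictionary or the study's data.

```python
from sklearn.svm import LinearSVC

# Placeholder lexicon mapping foundations to trigger words.
LEXICON = {
    "care": {"harm", "suffer", "protect", "care"},
    "fairness": {"fair", "unfair", "justice", "equal"},
    "authority": {"obey", "law", "order", "tradition"},
}
FOUNDATIONS = sorted(LEXICON)

def foundation_counts(text: str) -> list[int]:
    # One count per foundation: how many tokens match that foundation's entries.
    tokens = text.lower().split()
    return [sum(tok in LEXICON[f] for tok in tokens) for f in FOUNDATIONS]

texts = [
    "we must protect the vulnerable from harm",
    "the ruling was unfair and ignored equal treatment",
    "citizens should obey the law and respect tradition",
    "this policy will make people suffer",
]
labels = ["care", "fairness", "authority", "care"]

X = [foundation_counts(t) for t in texts]
clf = LinearSVC().fit(X, labels)
print(clf.predict([foundation_counts("courts should deliver fair and equal justice")]))
```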
In this paper, we evaluate the predictability of tweets associated with controversial versus non-controversial topics. As a first step, we crowd-sourced the scoring of a predefined set of topics on a Likert scale from non-controversial to controversial. Our feature set includes and goes beyond sentiment features, e.g., by leveraging empathic language and other features that have been used previously but are new for this particular study. We find that focusing on the structural characteristics of tweets is beneficial for this task. Using a combination of empathic, language-specific, and Twitter-specific features for supervised learning resulted in 87% accuracy (F1) for cross-validation of the training set and 63.4% accuracy when using the test set. Our analysis shows that features specific to Twitter or social media in general are more prevalent in tweets on controversial topics than in non-controversial ones. To test the premise of the paper, we conducted two additional sets of experiments, which led to mixed results. This finding will inform our future investigations into the relationship between language use on social media and the perceived controversiality of topics.
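A rough sketch of the feature-combination setup described above: TF-IDF text features stacked with simple Twitter-specific surface counts, scored with cross-validated F1. The tweets, labels, and feature choices are illustrative assumptions, not the study's actual feature set.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

tweets = [
    "Vaccines should be mandatory, no excuses #health",
    "Just baked the best banana bread ever!",
    "Gun laws need to change NOW @senator #debate",
    "Lovely sunset walk with the dog today",
    "Taxes on the rich are theft, period #politics",
    "New episode of my favourite show drops tonight",
]
controversial = np.array([1, 0, 1, 0, 1, 0])

def twitter_counts(text: str) -> list[int]:
    # Simple Twitter-specific surface features: hashtags, mentions, URLs, exclamations.
    return [text.count("#"), text.count("@"), text.count("http"), text.count("!")]

# For brevity, TF-IDF is fit on all tweets; a real setup would fit it inside each fold.
tfidf = TfidfVectorizer().fit_transform(tweets)
extra = np.array([twitter_counts(t) for t in tweets])
X = hstack([tfidf, extra]).tocsr()

scores = cross_val_score(LogisticRegression(max_iter=1000), X, controversial, cv=3, scoring="f1")
print("cross-validated F1:", round(scores.mean(), 3))
```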