Cecilia Ovesdotter Alm
Also published as:
Cecilia O. Alm,
Cecilia Ovesdotter Alm
This study introduces a novel multimodal corpus of expressive task-based spoken language and dialogue, focused on language use under frustration and surprise, elicited through three tasks motivated by prior research and collected in an IRB-approved experiment. The resource is unique both because these affect states are understudied in emotion modeling for language and because it provides both individual and dyadic multimodally grounded language. The study includes a detailed analysis of annotations and performance results for multimodal emotion inference in language use.
This paper addresses an existing resource gap for studying complex emotional states when a speaker collaborates with a partner to solve a task. We present a novel dialogue resource, the MULTICOLLAB corpus, in which two interlocutors, an instructor and a builder, communicated through a Zoom call while sensors recorded eye gaze, facial action units, and galvanic skin response, with transcribed speech signals, resulting in a unique, heavily multimodal corpus. The builder received instructions from the instructor. Half of the builders were privately told to disobey the instructor’s directions. After the task, participants watched the Zoom recording and annotated their instances of frustration. In this study, we introduce this new corpus and perform computational experiments with time series transformers, using early fusion through time for sensor data and late fusion for speech transcripts. We then average the predictions from both methods to recognize instructor frustration. Using sensor and speech data in a 4.5-second time window, we find that fusing the two models yields a 21% improvement in classification accuracy (with a precision of 79% and an F1 of 63%) over a comparison baseline, demonstrating that complex emotions can be recognized when rich multimodal data from transcribed spoken dialogue and biophysical sensor data are fused.
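A minimal sketch of the late averaging step described above, assuming two already-trained unimodal models whose class probabilities are available per time window. The array shapes, the label coding, and the function name are hypothetical; the corpus itself defines the real features and windows.

```python
# Hypothetical fusion step: average per-window class probabilities from a
# sensor-sequence transformer and a transcript model. Shapes and label
# coding are illustrative, not the paper's actual configuration.
import numpy as np

def fuse_predictions(p_sensor: np.ndarray, p_text: np.ndarray) -> np.ndarray:
    """Average class probabilities from the two unimodal models."""
    assert p_sensor.shape == p_text.shape  # (n_windows, n_classes)
    return (p_sensor + p_text) / 2.0

# Probabilities for three hypothetical 4.5-second windows
# (columns: frustrated vs. not frustrated).
p_sensor = np.array([[0.7, 0.3], [0.4, 0.6], [0.55, 0.45]])
p_text = np.array([[0.6, 0.4], [0.2, 0.8], [0.50, 0.50]])
fused = fuse_predictions(p_sensor, p_text)
print(fused.argmax(axis=1))  # predicted class per window
```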
Event identification in technical logbooks poses challenges given the limited logbook data available in specific technical domains, the large set of possible classes, and logbook entries typically being short and written in non-standard technical language. Technical logbook data typically has both a domain, the field it comes from (e.g., automotive), and an application, what it is used for (e.g., maintenance). To better handle the problem of data scarcity, this paper uses a variety of technical logbook datasets to investigate the benefits of transfer learning from sources within the same domain (but different applications), within the same application (but different domains), and from all available data. Results show that transfer learning within a domain provides statistically significant improvements and, in all cases but one, the best performance. Interestingly, transfer learning from within the application or across the global dataset degrades results in all cases but one, which benefited from adding as much data as possible. A further analysis of dataset similarities shows that datasets with higher similarity scores performed better in transfer learning tasks, suggesting that similarity can be used to predict the effectiveness of adding a dataset in a transfer learning task for technical logbooks.
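One simple way to realize the dataset-similarity idea mentioned above is vocabulary overlap between a candidate source dataset and the target dataset; a sketch follows. The Jaccard measure and the toy logbook entries are illustrative assumptions, not necessarily the similarity score used in the paper.

```python
# Score similarity between two logbook datasets by vocabulary overlap
# (Jaccard index over whitespace tokens). A higher score would suggest a
# more promising transfer source under the paper's observation.
def vocab(texts):
    return {tok for text in texts for tok in text.lower().split()}

def jaccard_similarity(source_texts, target_texts) -> float:
    a, b = vocab(source_texts), vocab(target_texts)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical entries from two maintenance logbooks.
aviation = ["replaced hyd pump", "engine oil leak found at seal"]
automotive = ["oil leak at pump seal", "replaced worn brake pads"]
print(jaccard_similarity(aviation, automotive))
```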
Technical logbooks are a challenging and under-explored text type in automated event identification. These texts are typically short and written in non-standard yet technical language, posing challenges to off-the-shelf NLP pipelines. The granularity of issue types described in these datasets additionally leads to class imbalance, making it challenging for models to accurately predict which issue each logbook entry describes. In this paper we focus on the problem of technical issue classification by considering logbook datasets from the automotive, aviation, and facilities maintenance domains. We adapt a feedback strategy from computer vision for handling extreme class imbalance, which resamples the training data based on its error in the prediction process. Our experiments show, with statistical significance, that this feedback strategy provides the best results for four different neural network models trained across a suite of seven technical logbook datasets from distinct technical domains. The feedback strategy is also generic and could be applied to any learning problem with substantial class imbalance.
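A minimal sketch of the error-driven feedback idea described above: between epochs, examples from classes with higher error are sampled more often. The weighting scheme (1 + per-class error) and the toy data are assumptions for illustration, not the paper's exact formulation.

```python
import random
from collections import defaultdict

def class_errors(preds, golds):
    """Per-class error rates from the previous epoch's predictions."""
    wrong, total = defaultdict(int), defaultdict(int)
    for pred, gold in zip(preds, golds):
        total[gold] += 1
        wrong[gold] += int(pred != gold)
    return {c: wrong[c] / total[c] for c in total}

def resample(train, per_class_error):
    """Draw a new epoch's sample, weighting harder classes more heavily."""
    weights = [1.0 + per_class_error.get(label, 0.0) for _, label in train]
    return random.choices(train, weights=weights, k=len(train))

train = [("oil leak", "engine"), ("brake squeal", "brakes"),
         ("warning light", "electrical")]
errors = class_errors(preds=["engine", "engine", "electrical"],
                      golds=["engine", "brakes", "electrical"])
print(resample(train, errors))
```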
Much of the world’s population experiences some form of disability during their lifetime. Caution must be exercised when designing natural language processing (NLP) systems to prevent them from inadvertently perpetuating ableist bias against people with disabilities, i.e., prejudice that favors those with typical abilities. We report on various analyses based on word predictions of a large-scale BERT language model. Statistically significant results demonstrate that people with disabilities can be disadvantaged. The findings also explore overlapping forms of discrimination related to interconnected gender and race identities.
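The probing setup can be illustrated with masked-word prediction, comparing a model's completions for sentences that do and do not mention disability. This sketch uses the Hugging Face fill-mask pipeline; the template sentences are illustrative stand-ins, not the paper's actual stimuli.

```python
# Compare BERT's top masked-word predictions across templates that differ
# only in whether they mention a disability.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "The person is [MASK].",
    "The deaf person is [MASK].",
    "The person who uses a wheelchair is [MASK].",
]
for template in templates:
    top = fill(template, top_k=3)
    print(template, "->", [(r["token_str"], round(r["score"], 3)) for r in top])
```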
While labor issues and quality assurance in crowdwork are increasingly studied, how annotators make sense of texts and how they are personally impacted by doing so are not. We study these questions via a narrative-sorting annotation task, where carefully selected (by sequentiality, topic, emotional content, and length) collections of tweets serve as examples of everyday storytelling. As readers process these narratives, we measure their facial expressions, galvanic skin response, and self-reported reactions. From the perspective of annotator well-being, a reassuring outcome was that the sorting task did not cause a measurable stress response; however, readers did react to humor. In terms of sensemaking, readers were more confident when sorting sequential, target-topical, and highly emotional tweets. As crowdsourcing becomes more common, this research sheds light on the perceptive capabilities and emotional impact of human readers.
Software developers and testers have long struggled with how to elicit proactive responses from their coworkers when reviewing code for security vulnerabilities and errors. For a code review to be successful, it must not only identify potential problems but also elicit an active response from the colleague responsible for modifying the code. To understand the factors that contribute to this outcome, we analyze a novel dataset of more than one million code reviews for the Google Chromium project, from which we extract linguistic features of feedback that elicited responsive actions from coworkers. Using a manually labeled subset of reviewer comments, we train a highly accurate classifier to identify acted-upon comments (AUC = 0.85). Our results demonstrate the utility of our dataset, the feasibility of using NLP for this new task, and the potential of NLP to improve our understanding of how communications between colleagues can be authored to elicit positive, proactive responses.
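As a sketch of the classification task, the snippet below trains a simple TF-IDF plus logistic regression model and reports AUC with scikit-learn. The features, model, and toy comments are stand-ins; the paper's actual feature set and classifier may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline

# Hypothetical review comments with acted-upon labels (1 = acted upon).
comments = [
    "Please rename this variable for clarity.",
    "nit: trailing whitespace",
    "This leaks memory if the callback throws; please add a guard.",
    "Looks good to me.",
]
acted_upon = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(comments, acted_upon)
scores = clf.predict_proba(comments)[:, 1]
print("AUC (on training data, for illustration only):",
      roc_auc_score(acted_upon, scores))
```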
Humans rely on multiple sensory modalities when examining and reasoning over images. In this paper, we describe a new multimodal dataset that consists of gaze measurements and spoken descriptions collected in parallel during an image inspection task. The task was performed by multiple participants on 100 general-domain images showing everyday objects and activities. We demonstrate the usefulness of the dataset by applying an existing visual-linguistic data fusion framework in order to label important image regions with appropriate linguistic labels.
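One simple alignment step such a fusion framework might build on is assigning each spoken word to the image region fixated at the word's temporal midpoint. The sketch below makes that assumption explicit; the fixation spans, word timings, and region names are all hypothetical, and the framework applied in the paper is more sophisticated.

```python
def region_at(fixations, t):
    """Return the region fixated at time t; fixations are (start, end, region)."""
    for start, end, region in fixations:
        if start <= t < end:
            return region
    return None

fixations = [(0.0, 1.2, "dog"), (1.2, 2.5, "frisbee")]
words = [("a", 0.1, 0.2), ("dog", 0.3, 0.7), ("catching", 1.0, 1.5),
         ("a", 1.6, 1.7), ("frisbee", 1.8, 2.4)]
for word, start, end in words:
    print(word, "->", region_at(fixations, (start + end) / 2))
```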
We present an educational tool that integrates computational linguistics resources for use in non-technical undergraduate language science courses. By using the tool in conjunction with evidence-driven pedagogical case studies, we strive to provide opportunities for students to gain an understanding of linguistic concepts and analysis through the lens of realistic, tractable problems. Case studies tend to be used in legal, business, and health education contexts, but less often in the teaching and learning of linguistics. The approach also has the potential to encourage students from a range of training backgrounds to continue on to computational language analysis coursework.
Interpersonal violence (IPV) is a prominent sociological problem that affects people of all demographic backgrounds. By analyzing how readers interpret, perceive, and react to experiences narrated in social media posts, we explore an understudied source of discourse about abuse. We asked readers to annotate Reddit posts about relationships with vs. without IPV for stakeholder roles and emotion, while measuring their galvanic skin response (GSR), pulse, and facial expressions. We map the annotations to coreference resolution output to obtain labeled coreference chains for stakeholders in the texts, and apply automated semantic role labeling to analyze IPV discourse. Findings provide insights into how readers process roles and emotion in narratives. For example, abusers tend to be linked with violent actions and certain affect states. We train classifiers to predict the stakeholder categories of coreference chains. We also find that subjects’ GSR noticeably changed for IPV texts, suggesting that co-collected measurement-based data about annotators can be used to support text annotation.
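The mapping from annotations to coreference output can be pictured as span matching: a chain inherits the majority stakeholder role among the annotated spans that overlap its mentions. The half-open character spans, toy chains, and majority-vote rule below are illustrative assumptions, not the paper's exact procedure.

```python
def overlaps(a, b):
    """Half-open character spans (start, end)."""
    return a[0] < b[1] and b[0] < a[1]

def label_chains(chains, annotations):
    """chains: {chain_id: [span, ...]}; annotations: [(span, role), ...]."""
    labeled = {}
    for chain_id, mentions in chains.items():
        roles = [role for span, role in annotations
                 if any(overlaps(span, mention) for mention in mentions)]
        if roles:
            labeled[chain_id] = max(set(roles), key=roles.count)  # majority role
    return labeled

chains = {0: [(0, 2), (20, 23)], 1: [(10, 17)]}
annotations = [((0, 2), "abuser"), ((10, 17), "victim"), ((21, 23), "abuser")]
print(label_chains(chains, annotations))  # {0: 'abuser', 1: 'victim'}
```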
University students in the United States are routinely asked to provide feedback on the quality of the instruction they have received. Such feedback is widely used by university administrators to evaluate teaching ability, despite growing evidence that students assign lower numerical scores to women and people of color, regardless of the actual quality of instruction. In this paper, we analyze students’ written comments on faculty evaluation forms spanning eight years and five STEM disciplines in order to determine whether open-ended comments reflect these same biases. First, we apply sentiment analysis techniques to the corpus of comments to determine the overall affect of each comment. We then use this information, in combination with other features, to explore whether there is bias in how students describe their instructors. We show that while the gender of the evaluated instructor does not seem to affect students’ expressed level of overall satisfaction with their instruction, it does strongly influence the language that they use to describe their instructors and their experience in class.
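The first analysis step, scoring each comment's overall affect, can be sketched with an off-the-shelf sentiment model and a group-by on instructor gender. The comments, gender coding, and default Hugging Face sentiment model below are illustrative stand-ins for the paper's corpus and techniques.

```python
from collections import defaultdict
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # default English sentiment model

# Hypothetical (comment, instructor gender) pairs.
comments = [
    ("She is a brilliant and caring teacher.", "F"),
    ("He really knows the material.", "M"),
    ("Her lectures were disorganized.", "F"),
]
by_gender = defaultdict(list)
for text, gender in comments:
    result = sentiment(text)[0]
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    by_gender[gender].append(signed)

for gender, scores in by_gender.items():
    print(gender, sum(scores) / len(scores))
```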
Computational linguistics has witnessed a surge of interest in approaches to emotion and affect analysis, tackling problems that extend beyond sentiment analysis in depth and complexity. This area involves basic emotions (such as joy, sadness, and fear) as well as any of the hundreds of other emotions humans are capable of (such as optimism, frustration, and guilt), expanding into affective conditions, experiences, and activities. Leveraging linguistic data for computational affect and emotion inference enables opportunities to address a range of affect-related tasks, problems, and non-invasive applications that capture aspects essential to the human condition and individuals’ cognitive processes. These efforts enable and facilitate human-centered computing experiences, as demonstrated by applications across clinical, socio-political, artistic, educational, and commercial domains. Efforts to computationally detect, characterize, and generate emotions or affect-related phenomena respond equally to technological needs for personalized, micro-level analytics and broad-coverage, macro-level inference, and they have involved both small and massive amounts of data.

While this is an exciting area with numerous opportunities for members of the ACL community, a major obstacle is its intersection with other investigatory traditions, necessitating knowledge transfer. This tutorial comprehensively integrates relevant concepts and frameworks from linguistics, cognitive science, affective computing, and computational linguistics in order to equip researchers and practitioners with adequate background and knowledge to work effectively on problems and tasks that either directly involve, or benefit from an understanding of, affect and emotion analysis.

There is a substantial body of work in traditional sentiment analysis focusing on positive and negative sentiment. This tutorial covers approaches and features that migrate well to affect analysis. We also discuss key differences from sentiment analysis and their implications for analyzing affect and emotion.

The tutorial begins with an introduction that highlights opportunities, key terminology, and interesting tasks and challenges (1). The body of the tutorial covers characteristics of emotive language use, with emphasis on relevance for computational analysis (2); linguistic data, from conceptual analysis frameworks through useful existing resources to important annotation topics (3); computational approaches for lexical semantic emotion analysis (4); computational approaches for emotion and affect analysis in text (5); visualization methods (6); and a survey of application areas with affect-related problems (7). The tutorial concludes with an outline of future directions and a discussion with participants about the areas relevant to their respective tasks of interest (8).

Besides attending the tutorial, participants receive electronic copies of the tutorial slides, a complete reference list, and a categorized annotated bibliography that concentrates on seminal works, recent important publications, and other products and resources for researchers and developers.