Shamik Roy


2024

pdf
A First Step towards Measuring Interdisciplinary Engagement in Scientific Publications: A Case Study on NLP + CSS Research
Alexandria Leto | Shamik Roy | Alexander Hoyle | Daniel Acuna | Maria Leonor Pacheco
Proceedings of the Sixth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS 2024)

With the rise in the prevalence of cross-disciplinary research, there is a need to develop methods to characterize its practices. Current computational methods to evaluate interdisciplinary engagement—such as affiliation diversity, keywords, and citation patterns—are insufficient to model the degree of engagement between disciplines, as well as the way in which the complementary expertise of co-authors is harnessed. In this paper, we propose an automated framework to address some of these issues on a large scale. Our framework tracks interdisciplinary citations in scientific articles and models: 1) the section and position in which they appear, and 2) the argumentative role that they play in the writing. To showcase our framework, we perform a preliminary analysis of interdisciplinary engagement in published work at the intersection of natural language processing and computational social science in the last decade.

pdf
FLAP: Flow-Adhering Planning with Constrained Decoding in LLMs
Shamik Roy | Sailik Sengupta | Daniele Bonadiman | Saab Mansour | Arshit Gupta
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Planning is a crucial task for agents in task oriented dialogs (TODs). Human agents typically resolve user issues by following predefined workflows, decomposing workflow steps into actionable items, and performing actions by executing APIs in order; all of which require reasoning and planning. With the recent advances in LLMs, there have been increasing attempts to use them for task planning and API usage. However, the faithfulness of the plans to predefined workflows and API dependencies, is not guaranteed with LLMs. Moreover, workflows in real life are often custom-defined and prone to changes; hence, adaptation is desirable. To study this, we propose the problem of faithful planning in TODs that needs to resolve user intents by following predefined flows and preserving API dependencies. To solve this problem, we propose FLAP, a Flow-Adhering Planning algorithm based on constrained decoding with lookahead heuristic for LLMs. Our algorithm alleviates the need for finetuning LLMs using domain specific (plan/dependency) data, enables quick adaptation to predefined flows, and outperforms other decoding and prompting-based baselines. Further, our algorithm empowers smaller LLMs (≈7B) to perform at par larger LLMs (≈30B-40B).

2023

pdf
“A Tale of Two Movements’: Identifying and Comparing Perspectives in #BlackLivesMatter and #BlueLivesMatter Movements-related Tweets using Weakly Supervised Graph-based Structured Prediction
Shamik Roy | Dan Goldwasser
Findings of the Association for Computational Linguistics: EMNLP 2023

Social media has become a major driver of social change, by facilitating the formation of online social movements. Automatically understanding the perspectives driving the movement and the voices opposing it, is a challenging task as annotated data is difficult to obtain. We propose a weakly supervised graph-based approach that explicitly models perspectives in #BackLivesMatter-related tweets. Our proposed approach utilizes a social-linguistic representation of the data. We convert the text to a graph by breaking it into structured elements and connect it with the social network of authors, then structured prediction is done over the elements for identifying perspectives. Our approach uses a small seed set of labeled examples. We experiment with large language models for generating artificial training examples, compare them to manual annotation, and find that it achieves comparable performance. We perform quantitative and qualitative analyses using a human-annotated test set. Our model outperforms multitask baselines by a large margin, successfully characterizing the perspectives supporting and opposing #BLM.

pdf
Conversation Style Transfer using Few-Shot Learning
Shamik Roy | Raphael Shu | Nikolaos Pappas | Elman Mansimov | Yi Zhang | Saab Mansour | Dan Roth
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

2022

pdf
Hands-On Interactive Neuro-Symbolic NLP with DRaiL
Maria Leonor Pacheco | Shamik Roy | Dan Goldwasser
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We recently introduced DRaiL, a declarative neural-symbolic modeling framework designed to support a wide variety of NLP scenarios. In this paper, we enhance DRaiL with an easy to use Python interface, equipped with methods to define, modify and augment DRaiL models interactively, as well as with methods to debug and visualize the predictions made. We demonstrate this interface with a challenging NLP task: predicting sentence and entity level moral sentiment in political tweets.

pdf
Towards Few-Shot Identification of Morality Frames using In-Context Learning
Shamik Roy | Nishanth Sridhar Nakshatri | Dan Goldwasser
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

Data scarcity is a common problem in NLP, especially when the annotation pertains to nuanced socio-linguistic concepts that require specialized knowledge. As a result, few-shot identification of these concepts is desirable. Few-shot in-context learning using pre-trained Large Language Models (LLMs) has been recently applied successfully in many NLP tasks. In this paper, we study few-shot identification of a psycho-linguistic concept, Morality Frames (Roy et al., 2021), using LLMs. Morality frames are a representation framework that provides a holistic view of the moral sentiment expressed in text, identifying the relevant moral foundation (Haidt and Graham, 2007) and at a finer level of granularity, the moral sentiment expressed towards the entities mentioned in the text. Previous studies relied on human annotation to identify morality frames in text which is expensive. In this paper, we propose prompting based approaches using pretrained Large Language Models for identification of morality frames, relying only on few-shot exemplars. We compare our models’ performance with few-shot RoBERTa and found promising results.

2021

pdf
Identifying Morality Frames in Political Tweets using Relational Learning
Shamik Roy | Maria Leonor Pacheco | Dan Goldwasser
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Extracting moral sentiment from text is a vital component in understanding public opinion, social movements, and policy decisions. The Moral Foundation Theory identifies five moral foundations, each associated with a positive and negative polarity. However, moral sentiment is often motivated by its targets, which can correspond to individuals or collective entities. In this paper, we introduce morality frames, a representation framework for organizing moral attitudes directed at different entities, and come up with a novel and high-quality annotated dataset of tweets written by US politicians. Then, we propose a relational learning model to predict moral attitudes towards entities and moral foundations jointly. We do qualitative and quantitative evaluations, showing that moral sentiment towards entities differs highly across political ideologies.

pdf bib
Analysis of Nuanced Stances and Sentiment Towards Entities of US Politicians through the Lens of Moral Foundation Theory
Shamik Roy | Dan Goldwasser
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media

The Moral Foundation Theory suggests five moral foundations that can capture the view of a user on a particular issue. It is widely used to identify sentence-level sentiment. In this paper, we study the Moral Foundation Theory in tweets by US politicians on two politically divisive issues - Gun Control and Immigration. We define the nuanced stance of politicians on these two topics by the grades given by related organizations to the politicians. First, we identify moral foundations in tweets from a huge corpus using deep relational learning. Then, qualitative and quantitative evaluations using the corpus show that there is a strong correlation between the moral foundation usage and the politicians’ nuanced stance on a particular topic. We also found substantial differences in moral foundation usage by different political parties when they address different entities. All of these results indicate the need for more intense research in this area.

2020

pdf
Weakly Supervised Learning of Nuanced Frames for Analyzing Polarization in News Media
Shamik Roy | Dan Goldwasser
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we suggest a minimally supervised approach for identifying nuanced frames in news article coverage of politically divisive topics. We suggest to break the broad policy frames suggested by Boydstun et al., 2014 into fine-grained subframes which can capture differences in political ideology in a better way. We evaluate the suggested subframes and their embedding, learned using minimal supervision, over three topics, namely, immigration, gun-control, and abortion. We demonstrate the ability of the subframes to capture ideological differences and analyze political discourse in news media.