We would host the AutoMin generation chal- lenge at INLG 2023 as a follow-up of the first AutoMin shared task at Interspeech 2021. Our shared task primarily concerns the automated generation of meeting minutes from multi-party meeting transcripts. In our first venture, we ob- served the difficulty of the task and highlighted a number of open problems for the community to discuss, attempt, and solve. Hence, we invite the Natural Language Generation (NLG) com- munity to take part in the second iteration of AutoMin. Like the first, the second AutoMin will feature both English and Czech meetings and the core task of summarizing the manually- revised transcripts into bulleted minutes. A new challenge we are introducing this year is to devise efficient metrics for evaluating the quality of minutes. We will also host an optional track to generate minutes for European parliamentary sessions. We carefully curated the datasets for the above tasks. Our ELITR Minuting Corpus has been recently accepted to LREC 2022 and publicly released. We are already preparing a new test set for evaluating the new shared tasks. We hope to carry forward the learning from the first AutoMin and instigate more community attention and interest in this timely yet chal- lenging problem. INLG, the premier forum for the NLG community, would be an appropriate venue to discuss the challenges and future of Automatic Minuting. The main objective of the AutoMin GenChal at INLG 2023 would be to come up with efficient methods to auto- matically generate meeting minutes and design evaluation metrics to measure the quality of the minutes.
This work represents the system proposed by team Innovators for SemEval 2022 Task 8: Multilingual News Article Similarity. Similar multilingual news articles should match irrespective of the style of writing, the language of conveyance, and subjective decisions and biases induced by medium/outlet. The proposed architecture includes a machine translation system that translates multilingual news articles into English and presents a multitask learning model trained simultaneously on three distinct datasets. The system leverages the PageRank algorithm for Long-form text alignment. Multitask learning approach allows simultaneous training of multiple tasks while sharing the same encoder during training, facilitating knowledge transfer between tasks. Our best model is ranked 16 with a Pearson score of 0.733.
With the Surge in COVID-19, the number of social media postings related to the vaccine has grown, specifically tracing the confirmed reports by the users regarding the COVID-19 vaccine dose termed “Vaccine Surveillance.” To mitigate this research problem, we present our novel ensembled approach for self-reporting COVID-19 vaccination status tweets into two labels, namely “Vaccine Chatter” and “Self Report.” We utilize state-of-the-art models, namely BERT, RoBERTa, and XLNet. Our model provides promising results with 0.77, 0.93, and 0.66 as precision, recall, and F1-score (respectively), comparable to the corresponding median scores of 0.77, 0.9, and 0.68 (respec- tively). The model gave an overall accuracy of 93.43. We also present an empirical analysis of the results to present how well the tweet was able to classify and report. We release our code base here https://github.com/Zohair0209/SMM4H-2022-Task6.git
This paper presents our submission for the Shared Task-2 of classification of stance and premise in tweets about health mandates related to COVID-19 at the Social Media Mining for Health 2022. There have been a plethora of tweets about people expressing their opinions on the COVID-19 epidemic since it first emerged. The shared task emphasizes finding the level of cooperation within the mandates for their stance towards the health orders of the pandemic. Overall the shared subjects the participants to propose system’s that can efficiently perform 1) Stance Detection, which focuses on determining the author’s point of view in the text. 2) Premise Classification, which indicates whether or not the text has arguments. Through this paper we propose an orchestration of multiple transformer based encoders to derive the output for stance and premise classification. Our best model achieves a F1 score of 0.771 for Premise Classification and an aggregate macro-F1 score of 0.661 for Stance Detection. We have made our code public here
In this paper, we describe our participation in the subtask 1 of CASE-2022, Event Causality Identification with Casual News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a few annotated examples (i.e., a few-shot configuration).We follow a prompt-based prediction approach for fine-tuning LMs in which the CRI task is treated as a masked language modeling problem (MLM). This approach allows LMs natively pre-trained on MLM tasks to directly generate textual responses to CRI-specific prompts.We compare the performance of this method against ensemble techniques trained on the entire dataset.Our best-performing submission was fine-tuned with only 256 instances per class, 15.7% of the all available data, and yet obtained the second-best precision (0.82), third-best accuracy (0.82), and an F1-score (0.85) very close to what was reported by the winner team (0.86).
In this paper, we describe our shared task submissions for Subtask 2 in CASE-2022, Event Causality Identification with Casual News Corpus. The challenge focused on the automatic detection of all cause-effect-signal spans present in the sentence from news-media. We detect cause-effect-signal spans in a sentence using T5 — a pre-trained autoregressive language model. We iteratively identify all cause-effect-signal span triplets, always conditioning the prediction of the next triplet on the previously predicted ones. To predict the triplet itself, we consider different causal relationships such as cause→effect→signal. Each triplet component is generated via a language model conditioned on the sentence, the previous parts of the current triplet, and previously predicted triplets. Despite training on an extremely small dataset of 160 samples, our approach achieved competitive performance, being placed second in the competition. Furthermore, we show that assuming either cause→effect or effect→cause order achieves similar results.
This paper documents our approach for the Creative-Summ 2022 shared task for Automatic Summarization of Creative Writing. For this purpose, we develop an automatic summarization pipeline where we leverage a denoising autoencoder for pretraining sequence-to-sequence models and fine-tune it on a large-scale abstractive screenplay summarization dataset to summarize TV transcripts from primetime shows. Our pipeline divides the input transcript into smaller conversational blocks, removes redundant text, summarises the conversational blocks, obtains the block-wise summaries, cleans, structures, and then integrates the summaries to create the meeting minutes. Our proposed system achieves some of the best scores across multiple metrics(lexical, semantical) in the Creative-Summ shared task.
With the increase of users on social media platforms, manipulating or provoking masses of people has become a piece of cake. This spread of hatred among people, which has become a loophole for freedom of speech, must be minimized. Hence, it is essential to have a system that automatically classifies the hatred content, especially on social media, to take it down. This paper presents a simple modular pipeline classifier with BERT embeddings and attention mechanism to classify hope speech content in the Hope Speech Detection shared task for Equality, Diversity, and Inclusion-ACL 2022. Our system submission ranks fourth with an F1-score of 0.84. We release our code-base here https://github.com/Deepanshu-beep/hope-speech-attention .
Social media platforms have been provoking masses of people. The individual comments affect a prevalent way of thinking by moving away from preoccupation with discrimination, loneliness, or influence in building confidence, support, and good qualities. This paper aims to identify hope in these social media posts. Hope significantly impacts the well-being of people, as suggested by health professionals. It reflects the belief to achieve an objective, discovers a new path, or become motivated to formulate pathways.In this paper we classify given a social media post, hope speech or not hope speech, using ensembled voting of BERT, ERNIE 2.0 and RoBERTa for English language with 0.54 macro F1-score (2st rank). For non-English languages Malayalam, Spanish and Tamil we utilized XLM RoBERTA with 0.50, 0.81, 0.3 macro F1 score (1st, 1st,3rd rank) respectively. For Kannada, we use Multilingual BERT with 0.32 F1 score(5th)position. We release our code-base here: https://github.com/Muskaan-Singh/Hate-Speech-detection.git.
The increased expansion of abusive content on social media platforms negatively affects online users. Transphobic/homophobic content indicates hatred comments for lesbian, gay, transgender, or bisexual people. It leads to offensive speech and causes severe social problems that can make online platforms toxic and unpleasant to LGBT+people, endeavoring to eliminate equality, diversity, and inclusion. In this paper, we present our classification system; given comments, it predicts whether or not it contains any form of homophobia/transphobia with a Zero-Shot learning framework. Our system submission achieved 0.40, 0.85, 0.89 F1-score for Tamil and Tamil-English, English with (1st, 1st,8th) ranks respectively. We release our codebase here: https://github.com/Muskaan-Singh/Homophobia-and-Transphobia-ACL-Submission.git.
Depression is a common illness involving sadness and lack of interest in all day-to-day activities. It is important to detect depression at an early stage as it is treated at an early stage to avoid consequences. In this paper, we present our system submission of ARGUABLY for DepSign-LT-EDI@ACL-2022. We aim to detect the signs of depression of a person from their social media postings wherein people share their feelings and emotions. The proposed system is an ensembled voting model with fine-tuned BERT, RoBERTa, and XLNet. Given social media postings in English, the submitted system classify the signs of depression into three labels, namely “not depressed,” “moderately depressed,” and “severely depressed.” Our best model is ranked 3rd position with 0.54% accuracy . We make our codebase accessible here.
Summarization is a challenging problem, and even more challenging is to manually create, correct, and evaluate the summaries. The severity of the problem grows when the inputs are multi-party dialogues in a meeting setup. To facilitate the research in this area, we present ALIGNMEET, a comprehensive tool for meeting annotation, alignment, and evaluation. The tool aims to provide an efficient and clear interface for fast annotation while mitigating the risk of introducing errors. Moreover, we add an evaluation mode that enables a comprehensive quality evaluation of meeting minutes. To the best of our knowledge, there is no such tool available. We release the tool as open source. It is also directly installable from PyPI.
Taking minutes is an essential component of every meeting, although the goals, style, and procedure of this activity (“minuting” for short) can vary. Minuting is a rather unstructured writing activity and is affected by who is taking the minutes and for whom the intended minutes are. With the rise of online meetings, automatic minuting would be an important benefit for the meeting participants as well as for those who might have missed the meeting. However, automatically generating meeting minutes is a challenging problem due to a variety of factors including the quality of automatic speech recorders (ASRs), availability of public meeting data, subjective knowledge of the minuter, etc. In this work, we present the first of its kind dataset on Automatic Minuting. We develop a dataset of English and Czech technical project meetings which consists of transcripts generated from ASRs, manually corrected, and minuted by several annotators. Our dataset, AutoMin, consists of 113 (English) and 53 (Czech) meetings, covering more than 160 hours of meeting content. Upon acceptance, we will publicly release (aaa.bbb.ccc) the dataset as a set of meeting transcripts and minutes, excluding the recordings for privacy reasons. A unique feature of our dataset is that most meetings are equipped with more than one minute, each created independently. Our corpus thus allows studying differences in what people find important while taking the minutes. We also provide baseline experiments for the community to explore this novel problem further. To the best of our knowledge AutoMin is probably the first resource on minuting in English and also in a language other than English (Czech).
With the development of multimodal systems and natural language generation techniques, the resurgence of multimodal datasets has attracted significant research interests, which aims to provide new information to enrich the representation of textual data. However, there remains a lack of a comprehensive survey for this task. To this end, we take the first step and present a thorough review of this research field. This paper provides an overview of a publicly available dataset with different modalities according to the applications. Furthermore, we discuss the new frontier and give our thoughts. We hope this survey of multimodal datasets can provide the community with quick access and a general picture of the multimodal dataset for specific Natural Language Processing (NLP) applications and motivates future researches. In this context, we release the collection of all multimodal datasets easily accessible here: https://github.com/drmuskangarg/Multimodal-datasets
This paper presents our submission for the shared task on isometric neural machine translation in International Conference on Spoken Language Translation (IWSLT). There are numerous state-of-art models for translation problems. However, these models lack any length constraint to produce short or long outputs from the source text. In this paper, we propose a hierarchical approach to generate isometric translation on MUST-C dataset, we achieve a BERTscore of 0.85, a length ratio of 1.087, a BLEU score of 42.3, and a length range of 51.03%. On the blind dataset provided by the task organizers, we obtain a BERTscore of 0.80, a length ratio of 1.10 and a length range of 47.5%. We have made our code public here https://github.com/aakash0017/Machine-Translation-ISWLT
In this paper, we present our minuting tool DeepCon, an end-to-end toolkit for minuting the multiparty dialogues of meetings. It provides technological support for (multilingual) communication and collaboration, with a specific focus on Natural Language Processing (NLP) technologies: Automatic Speech Recognition (ASR), Machine Translation (MT), Automatic Minuting (AM), Topic Modelling (TM) and Named Entity Recognition (NER). To the best of our knowledge, there is no such tool available. Further, this tool follows a microservice architecture, and we release the tool as open-source, deployed on Amazon Web Services (AWS). We release our tool open-source here http://www.deepcon.in.
Citations are crucial to a scientific discourse. Besides providing additional contexts to research papers, citations act as trackers of the direction of research in a field and as an important measure in understanding the impact of a research publication. With the rapid growth in research publications, automated solutions for identifying the purpose and influence of citations are becoming very important. The 3C Citation Context Classification Task organized as part of the Second Workshop on Scholarly Document Processing @ NAACL 2021 is a shared task to address the aforementioned problems. In this paper, we present our team, IITP-CUNI@3C’s submission to the 3C shared tasks. For Task A, citation context purpose classification, we propose a neural multi-task learning framework that harnesses the structural information of the research papers and the relation between the citation context and the cited paper for citation classification. For Task B, citation context influence classification, we use a set of simple features to classify citations based on their perceived significance. We achieve comparable performance with respect to the best performing systems in Task A and superseded the majority baseline in Task B with very simple features.
This paper describes our participating system in the shared task Explainable quality estimation of 2nd Workshop on Evaluation & Comparison of NLP Systems. The task of quality estimation (QE, a.k.a. reference-free evaluation) is to predict the quality of MT output at inference time without access to reference translations. In this proposed work, we first build a word-level quality estimation model, then we finetune this model for sentence-level QE. Our proposed models achieve near state-of-the-art results. In the word-level QE, we place 2nd and 3rd on the supervised Ro-En and Et-En test sets. In the sentence-level QE, we achieve a relative improvement of 8.86% (Ro-En) and 10.6% (Et-En) in terms of the Pearson correlation coefficient over the baseline model.