Tushar Abhishek

2023

pdf abs
Billy-Batson at SemEval-2023 Task 5: An Information Condensation based System for Clickbait Spoiling
Anubhav Sharma | Sagar Joshi | Tushar Abhishek | Radhika Mamidi | Vasudeva Varma
Proceedings of the The 17th International Workshop on Semantic Evaluation (SemEval-2023)

The Clickbait Challenge targets spoiling the clickbaits using short pieces of information known as spoilers to satisfy the curiosity induced by a clickbait post.The large context of the article associated with the clickbait and differences in the spoiler forms, make the task challenging.Hence, to tackle the large context, we propose an Information Condensation-based approach, which prunes down the unnecessary context.Given an article, our filtering module optimised with a contrastive learning objective first selects the parapraphs that are the most relevant to the corresponding clickbait.The resulting condensed article is then fed to the two downstream tasks of spoiler type classification and spoiler generation.We demonstrate and analyze the gains from this approach on both the tasks.Overall, we win the task of spoiler type classification and achieve competitive results on spoiler generation.

2021

pdf abs
Cross-lingual Alignment of Knowledge Graph Triples with Sentences
Swayatta Daw | Shivprasad Sagare | Tushar Abhishek | Vikram Pudi | Vasudeva Varma
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

The pairing of natural language sentences with knowledge graph triples is essential for many downstream tasks like data-to-text generation, facts extraction from sentences (semantic parsing), knowledge graph completion, etc. Most existing methods solve these downstream tasks using neural-based end-to-end approaches that require a large amount of well-aligned training data, which is difficult and expensive to acquire. Recently various unsupervised techniques have been proposed to alleviate this alignment step by automatically pairing the structured data (knowledge graph triples) with textual data. However, these approaches are not well suited for low resource languages that provide two major challenges: (1) unavailability of pair of triples and native text with the same content distribution and (2) limited Natural language Processing (NLP) resources. In this paper, we address the unsupervised pairing of knowledge graph triples with sentences for low resource languages, selecting Hindi as the low resource language. We propose cross-lingual pairing of English triples with Hindi sentences to mitigate the unavailability of content overlap. We propose two novel approaches: NER-based filtering with Semantic Similarity and Key-phrase Extraction with Relevance Ranking. We use our best method to create a collection of 29224 well-aligned English triples and Hindi sentence pairs. Additionally, we have also curated 350 human-annotated golden test datasets for evaluation. We make the code and dataset publicly available.

Co-authors

Sagar Joshi 1

Radhika Mamidi 1

Venues

icon1
semeval1