Piyush Arora

2022

AMEX AI Labs at SemEval-2022 Task 10: Contextualized fine-tuning of BERT for Structured Sentiment Analysis
Pratyush Sarangi | Shamika Ganesan | Piyush Arora | Salil Joshi
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

We describe the work carried out by AMEX AI Labs on the structured sentiment analysis task at SemEval-2022. This task focuses on extracting fine-grained information with respect to source, target and polar expressions in a given text. We propose a BERT-based encoder, which utilizes a novel concatenation mechanism for combining syntactic and pretrained embeddings with BERT embeddings. Our system achieved an average rank of 14/32 systems, based on the average scores across seven datasets for five languages provided for the monolingual task. The proposed BERT-based approaches outperformed the BiLSTM-based approaches used for the structured sentiment extraction problem. We also provide an in-depth post-submission analysis.

2020

An Investigative Study of Multi-Modal Cross-Lingual Retrieval
Piyush Arora | Dimitar Shterionov | Yasufumi Moriya | Abhishek Kaushik | Daria Dzendzik | Gareth Jones
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

We describe work from our investigations of the novel area of multi-modal cross-lingual retrieval (MMCLIR) under low-resource conditions. We study the challenges associated with MMCLIR relating to: (i) data conversion between different modalities, for example speech and text; (ii) overcoming the language barrier between source and target languages; (iii) effectively scoring and ranking documents to suit the retrieval task; and (iv) handling low-resource constraints that prohibit development of heavily tuned machine translation (MT) and automatic speech recognition (ASR) systems. We focus on the use case of retrieving text and speech documents in Swahili using English queries, which was the main focus of the OpenCLIR shared task. Our work is developed within the scope of this task. In this paper we devote special attention to the automatic translation (AT) component, which is crucial for the overall quality of the MMCLIR system. We exploit a combination of dictionaries and phrase-based statistical MT systems to effectively tackle the subtask of query translation. We address each MMCLIR challenge individually, and develop separate components for automatic translation (AT), speech processing (SP) and information retrieval (IR). We find that results for cross-lingual text retrieval are quite good relative to those for cross-lingual speech retrieval. Overall, we find that the task of MMCLIR, and specifically cross-lingual speech retrieval, is quite complex. Further, we pinpoint open issues related to handling cross-lingual audio and text retrieval for low-resource languages that need to be addressed in future research.

AMEX AI-Labs: An Investigative Study on Extractive Summarization of Financial Documents
Piyush Arora | Priya Radhakrishnan
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

We describe the work carried out by AMEX AI-LABS on an extractive summarization benchmark task focused on Financial Narratives Summarization (FNS). This task focuses on summarizing annual financial reports, which poses two main challenges compared to typical news document summarization tasks: i) annual reports are lengthier (average length about 80 pages) than typical news documents, and ii) annual reports are more loosely structured, e.g. comprising tables, charts, textual data and images, which makes them challenging to summarize effectively. To address this summarization task we investigate a range of unsupervised, supervised and ensemble-based techniques. We find that ensemble-based techniques perform better than using only the unsupervised or supervised techniques. Our ensemble-based model achieved its highest rank, 9 out of 31 systems submitted for the benchmark task, on the ROUGE-L evaluation metric.

2016

DCU-SEManiacs at SemEval-2016 Task 1: Synthetic Paragram Embeddings for Semantic Textual Similarity
Chris Hokamp | Piyush Arora
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

DCU: Using Distributional Semantics and Domain Adaptation for the Semantic Textual Similarity SemEval-2015 Task 2
Piyush Arora | Chris Hokamp | Jennifer Foster | Gareth Jones
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

2012

Hindi Subjective Lexicon: A Lexical Resource for Hindi Adjective Polarity Classification
Akshat Bakliwal | Piyush Arora | Vasudeva Varma
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

With recent developments in web technologies, the percentage of web content in Hindi is growing at lightning speed. This information can prove to be very useful for researchers, governments and organizations to learn what is on the public's mind and to make sound decisions. In this paper, we present a graph-based WordNet expansion method to generate a full (adjective and adverb) subjective lexicon. We used synonym and antonym relations to expand the initial seed lexicon. We show three different evaluation strategies to validate the lexicon. We achieve 70.4% agreement with human annotators and ~79% accuracy on product review classification. The main contributions of our work are: 1) developing a lexicon of adjectives and adverbs with polarity scores using Hindi WordNet; and 2) developing an annotated corpus of Hindi product reviews.

Entity Centric Opinion Mining from Blogs
Akshat Bakliwal | Piyush Arora | Vasudeva Varma
Proceedings of the 2nd Workshop on Sentiment Analysis where AI meets Psychology