Avinesh PVS


2018

A Retrospective Analysis of the Fake News Challenge Stance-Detection Task
Andreas Hanselowski | Avinesh PVS | Benjamin Schiller | Felix Caspelherr | Debanjan Chaudhuri | Christian M. Meyer | Iryna Gurevych
Proceedings of the 27th International Conference on Computational Linguistics

The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed a stance classification task as a crucial first step towards detecting fake news. To date, there is no in-depth analysis paper to critically discuss FNC-1’s experimental setup, reproduce the results, and draw conclusions for next-generation stance classification methods. In this paper, we provide such an in-depth analysis for the three top-performing systems. We first find that FNC-1’s proposed evaluation metric favors the majority class, which can be easily classified, and thus overestimates the true discriminative power of the methods. Therefore, we propose a new F1-based metric yielding a changed system ranking. Next, we compare the features and architectures used, which leads to a novel feature-rich stacked LSTM model that performs on par with the best systems, but is superior in predicting minority classes. To understand the methods’ ability to generalize, we derive a new dataset and perform both in-domain and cross-domain experiments. Our qualitative and quantitative study helps interpret the original FNC-1 scores and understand which features help improve performance and why. Our new dataset and all source code used during the reproduction study are publicly available for future research.
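
The abstract's point about metrics that favor the majority class can be illustrated with a small sketch. It is not the FNC-1 scoring script: plain accuracy stands in for a majority-friendly metric, macro-averaged F1 is assumed as one possible form of an F1-based metric, and the toy labels are invented for illustration.

```python
def accuracy(gold, pred):
    """Fraction of instances whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Invented toy data: 8 of 10 instances belong to the majority class.
gold = ["unrelated"] * 8 + ["agree", "disagree"]
# A degenerate system that always predicts the majority class.
pred = ["unrelated"] * 10

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
print(f"accuracy = {acc:.3f}, macro-F1 = {mf1:.3f}")
```

On this toy data the always-majority system looks strong under accuracy but weak under macro-F1, because the two minority classes each contribute an F1 of zero to the average.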

2014

ThinkMiners: Disorder Recognition using Conditional Random Fields and Distributional Semantics
Ankur Parikh | Avinesh PVS | Joy Mustafi | Lalit Agarwalla | Ashish Mungi
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2011

Transferring Syntactic Relations from English to Hindi Using Alignments on Local Word Groups
Aswarth Dara | Prashanth Mannem | Hemanth Sagar Bayyarapu | Avinesh PVS
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

A Data Mining Approach to Learn Reorder Rules for SMT
Avinesh PVS
Proceedings of the NAACL HLT 2010 Student Research Workshop

Phrase-Based Transliteration with Simple Heuristics
Avinesh PVS | Ankur Parikh
Proceedings of the 2010 Named Entities Workshop

Phrase Based Decoding using a Discriminative Model
Prasanth Kolachina | Sriram Venkatapathy | Srinivas Bangalore | Sudheer Kolachina | Avinesh PVS
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

A Corpus Factory for Many Languages
Adam Kilgarriff | Siva Reddy | Jan Pomikálek | Avinesh PVS
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

For many languages there are no large, general-language corpora available. Until the web, all but the institutions could do little but shake their heads in dismay as corpus-building was long, slow and expensive. But with the advent of the Web it can be highly automated and thereby fast and inexpensive. We have developed a ‘corpus factory’ where we build large corpora. In this paper we describe the method we use, and how it has worked, and how various problems were solved, for eight languages: Dutch, Hindi, Indonesian, Norwegian, Swedish, Telugu, Thai and Vietnamese. We use the BootCaT method: we take a set of 'seed words' for the language from Wikipedia. Then, several hundred times over, we

* randomly select three or four of the seed words
* send as a query to Google or Yahoo or Bing, which returns a 'search hits' page
* gather the pages that Google or Yahoo point to and save the text.

This forms the corpus, which we then

* 'clean' (to remove navigation bars, advertisements etc)
* remove duplicates
* tokenise and (if tools are available) lemmatise and part-of-speech tag
* load into our corpus query tool, the Sketch Engine

The corpora we have developed are available for use in the Sketch Engine corpus query tool.
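
The collection loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `build_corpus`, `search` and `fetch` are hypothetical names, the search engine and page download are passed in as plain callables (stubbed here so the example runs offline), and whitespace normalisation merely stands in for real cleaning, tokenisation and tagging.

```python
import random


def build_corpus(seed_words, search, fetch, n_queries=300, seed=0):
    """Collect de-duplicated page texts from repeated random seed-word queries.

    search(query) -> iterable of URLs and fetch(url) -> page text are
    hypothetical callables supplied by the caller.
    """
    rng = random.Random(seed)
    seen_urls = set()
    seen_texts = set()
    corpus = []
    for _ in range(n_queries):
        # Randomly select three or four of the seed words as one query.
        k = rng.choice((3, 4))
        query = " ".join(rng.sample(seed_words, k))
        for url in search(query):
            if url in seen_urls:
                continue
            seen_urls.add(url)
            # Placeholder for cleaning: collapse runs of whitespace only.
            text = " ".join(fetch(url).split())
            # Remove exact duplicates and empty pages.
            if text and text not in seen_texts:
                seen_texts.add(text)
                corpus.append(text)
    return corpus


# Offline stubs standing in for a search engine and an HTTP client.
def fake_search(query):
    return ["http://example.org/" + word for word in query.split()]


def fake_fetch(url):
    return "text   about " + url.rsplit("/", 1)[-1]


seeds = ["alpha", "beta", "gamma", "delta", "epsilon"]
corpus = build_corpus(seeds, fake_search, fake_fetch, n_queries=50)
print(len(corpus), "documents collected")
```

Passing the search and download steps in as callables keeps the sketch independent of any particular search engine API, which the abstract does not specify beyond naming the engines used.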