Yash Kankanampati


2025

pdf bib
SHROOM-CAP: Shared Task on Hallucinations and Related Observable Overgeneration Mistakes in Crosslingual Analyses of Publications
Aman Sinha | Federica Gamba | Raúl Vázquez | Timothee Mickus | Ahana Chattopadhyay | Laura Zanella | Binesh Arakkal Remesh | Yash Kankanampati | Aryan Chandramania | Rohit Agarwal
Proceedings of the 1st Workshop on Confabulation, Hallucinations and Overgeneration in Multilingual and Practical Settings (CHOMPS 2025)

This paper presents an overview of the SHROOM-CAP Shared Task, which focuses on detecting hallucinations and over-generation errors in cross-lingual analyses of scientific publications. SHROOM-CAP covers nine languages: five high-resource (English, French, Hindi, Italian, and Spanish) and four low-resource (Bengali, Gujarati, Malayalam, and Telugu). The task frames hallucination detection as a binary classification problem, where participants must predict whether a given text contains factual inaccuracies and fluency mistakes. We received 1,571 submissions from 5 participating teams during the test phase over the nine languages. In the paper, we present an analysis of the evaluated systems to assess their performance on the hallucination detection task across languages. Our findings reveal a disparity in system performance between high-resource and low-resource languages. Furthermore, we observe that factuality and fluency tend to be closely aligned in high-resource languages, whereas this correlation is less evident in low-resource languages. Overall, SHROOM-CAP underlines that hallucination detection remains a challenging open problem, particularly in low-resource and domain-specific settings.

2023

pdf bib
Remember what you did so you know what to do next
Manuel Ciosici | Alex Hedges | Yash Kankanampati | Justin Martin | Marjorie Freedman | Ralph Weischedel
Findings of the Association for Computational Linguistics: EMNLP 2023

We explore using the 6B parameter GPT-J language model to create a plan for a simulated robot to achieve 30 classes of goals in ScienceWorld, a text game simulator for elementary science experiments and for which previously published empirical work has shown large language models (LLM)s to be a poor fit (Wang et al., 2022). Using the Markov assumption, the LLM outperforms the state-of-the-art based on reinforcement learning by a factor of 1.4. When we fill the LLM’s input buffer with as many prior steps as will fit, improvement rises to 3.3x. Even when training on only 6.5% of the training data, we observe a 2.3x improvement over the state-of-the-art. Our experiments show that performance varies widely across the 30 classes of actions, indicating that averaging over tasks can hide significant performance issues.

2021

pdf bib
Machine-Assisted Script Curation
Manuel Ciosici | Joseph Cummings | Mitchell DeHaven | Alex Hedges | Yash Kankanampati | Dong-Ho Lee | Ralph Weischedel | Marjorie Freedman
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

We describe Machine-Aided Script Curator (MASC), a system for human-machine collaborative script authoring. Scripts produced with MASC include (1) English descriptions of sub-events that comprise a larger, complex event; (2) event types for each of those events; (3) a record of entities expected to participate in multiple sub-events; and (4) temporal sequencing between the sub-events. MASC automates portions of the script creation process with suggestions for event types, links to Wikidata, and sub-events that may have been forgotten. We illustrate how these automations are useful to the script writer with a few case-study scripts.

2020

pdf bib
Multitask Easy-First Dependency Parsing: Exploiting Complementarities of Different Dependency Representations
Yash Kankanampati | Joseph Le Roux | Nadi Tomeh | Dima Taji | Nizar Habash
Proceedings of the 28th International Conference on Computational Linguistics

In this paper we present a parsing model for projective dependency trees which takes advantage of the existence of complementary dependency annotations which is the case in Arabic, with the availability of CATiB and UD treebanks. Our system performs syntactic parsing according to both annotation types jointly as a sequence of arc-creating operations, and partially created trees for one annotation are also available to the other as features for the score function. This method gives error reduction of 9.9% on CATiB and 6.1% on UD compared to a strong baseline, and ablation tests show that the main contribution of this reduction is given by sharing tree representation between tasks, and not simply sharing BiLSTM layers as is often performed in NLP multitask systems.