Akash Anil

2026

Annotating Indian Regional Biases using Large Language Models: Evaluation and Analysis
Debasmita Panda | Akash Anil | Neelesh Kumar Shukla
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)

Social biases based on regional identity (or regional bias) are often observed in Indian contexts on major online social networks and require critical attention. However, due to large linguistic and cultural diversity, high annotation costs, and inherent human biases, very little annotated data exists on regional biases in the Indian context. Recently, Large Language Models (LLMs) have garnered attention for the automatic annotation of text. However, such annotation efforts are largely limited to English texts, and LLMs often perform poorly when applied to low-resource languages. Therefore, this paper focuses on understanding the capabilities and challenges of popular open-source LLMs in annotating Indian regional biases. We utilize the recently proposed IndRegBias dataset, which consists of Indian regionally biased social media comments in both English and code-mixed formats. First, we assess the annotation capabilities of LLMs in a zero-shot setting and critically analyze their performance across different writing styles, including code-mixing, transliteration, and English. We find that the majority of LLMs exhibit low agreement with human annotations (measured using Cohen’s kappa). Consequently, we extend our study by fine-tuning the models using 50% of the data and evaluating them on the remaining 50%. We observe a significant improvement in annotation agreement (kappa) for all the LLMs. To further assess the capabilities of the fine-tuned models, we evaluate them on 500 newly collected social media comments discussing regional issues in India. The results show that most fine-tuned LLMs outperform their zero-shot counterparts when annotating these new comments.
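The abstract reports agreement between LLM and human annotations using Cohen's kappa. As a rough illustration of that metric only (not the authors' evaluation code), the following minimal sketch computes kappa for two label sequences. The function name `cohen_kappa` and the toy labels are assumptions made for this example and do not come from the paper or the IndRegBias dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement: for each label, the product of the two
    # annotators' marginal frequencies, summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    # Degenerate case: both annotators used one and the same label throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Toy labels (1 = regionally biased, 0 = not biased); illustrative only.
human = [1, 1, 0, 0]
model = [1, 0, 0, 0]
kappa = cohen_kappa(human, model)
```

Kappa corrects raw agreement for the agreement expected by chance, so a model that mostly predicts the majority label can score high accuracy yet low kappa.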

2024

Inductive Knowledge Graph Completion with GNNs and Rules: An Analysis
Akash Anil | Victor Gutierrez-Basulto | Yazmin Ibanez-Garcia | Steven Schockaert
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The task of inductive knowledge graph completion requires models to learn inference patterns from a training graph, which can then be used to make predictions on a disjoint test graph. Rule-based methods seem like a natural fit for this task, but in practice they significantly underperform state-of-the-art methods based on Graph Neural Networks (GNNs), such as NBFNet. We hypothesise that the underperformance of rule-based methods is due to two factors: (i) implausible entities are not ranked at all and (ii) only the most informative path is taken into account when determining the confidence in a given link prediction answer. To analyse the impact of these factors, we study a number of variants of a rule-based approach, which are specifically aimed at addressing the aforementioned issues. We find that the resulting models can achieve a performance which is close to that of NBFNet. Crucially, the considered variants only use a small fraction of the evidence that NBFNet relies on, which means that they largely keep the interpretability advantage of rule-based methods. Moreover, we show that a further variant, which does look at the full KG, consistently outperforms NBFNet.
