Svetlana Churina

2026

Disentangling Codemixing in Chats: The NUS ABC Codemixed Corpus
Svetlana Churina | Akshat Gupta | Nur Insyirah Binte Imam Mujtahid | Kokil Jaidka
Findings of the Association for Computational Linguistics: ACL 2026

Code-mixing involves the seamless integration of linguistic elements from multiple languages within a single discourse, reflecting natural multilingual communication patterns. Despite its prominence in informal interactions such as social media, chat messages and instant-messaging exchanges, there has been a lack of publicly available corpora that are author-labeled and suitable for modeling human conversations and relationships. This study introduces the first labeled and general-purpose corpus for understanding code-mixing in context while maintaining rigorous privacy and ethical standards. It includes over 355,641 messages spanning various code-mixing patterns, with a primary focus on English, Mandarin, and other languages. We expect the Codemix Corpus to serve as a foundational dataset for research in computational linguistics, sociolinguistics, and NLP applications.

2024

WASSA 2024 Shared Task: Enhancing Emotional Intelligence with Prompts
Svetlana Churina | Preetika Verma | Suchismita Tripathy
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

This paper describes the system for the last-min-submittion team in WASSA-2024 Shared Task 1: Empathy Detection and Emotion Classification. The task aims to develop models that can predict empathy, emotion, and emotional polarity. The system achieved relatively good results on the competition’s official leaderboard. The code of this system is available here.

Improving Evidence Retrieval on Claim Verification Pipeline through Question Enrichment
Svetlana Churina | Anab Maulana Barik | Saisamarth Rajesh Phaye
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)

The AVeriTeC shared task introduces a new real-world claim verification dataset, where a system is tasked with verifying a real-world claim based on evidence found on the internet. In this paper, we propose a claim verification pipeline called QueenVer, which consists of two modules: Evidence Retrieval and Claim Verification. Our pipeline collects <Question, Answer> pairs as evidence. Recognizing the pivotal role of question quality in evidence efficacy, we propose question enrichment to enhance the retrieved evidence. Specifically, we adopt three different Question Generation (QG) techniques: multi-hop, single-hop, and fact-checker style. For the claim verification module, we integrate an ensemble of multiple state-of-the-art LLMs to enhance its robustness. Experiments show that QueenVer achieves 0.41, 0.29, and 0.42 on the Q, Q+A, and AVeriTeC scores.
