Anshu Kiran Sharma


2026

Not all events in a narrative are created equal: some events are more important than others. Kernel events, a concept introduced in the field of narratology, are causally linked events that move the narrative forward and cannot be removed without breaking the narrative’s logical coherence. While event detection and extraction have been widely studied in natural language processing and information retrieval, kernel events have remained largely unexplored. In this work, we introduce the first corpus and model for kernel event detection. Our contributions include: a refinement of the kernel event concept, captured in detailed annotation guidelines grounded in narratological principles; an annotation study yielding a gold-standard dataset of kernel events in narrative texts; and a first-of-its-kind kernel event detection system. Annotation achieved an inter-annotator agreement of 0.61 Kappa, underscoring the reliability of the guidelines. Using these data, we trained several models in both fine-tuned and generative modes for kernel event detection, with a LoRA fine-tuned Llama3 achieving an F1 of 0.695. This work establishes a benchmark for kernel event detection, with potential applications in summarization, narrative similarity detection, and narrative understanding. We release our code and data for the benefit of other researchers.
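For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how a chance-corrected agreement score can be computed. It assumes the two-annotator Cohen's kappa variant and binary kernel/non-kernel labels; the abstract does not say which kappa variant or label scheme was used, and the function name `cohen_kappa` and the toy labels are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: two annotators and nominal labels. The source only
    reports "0.61 Kappa" without naming the variant.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example (hypothetical labels): 1 = kernel event, 0 = not a kernel event.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In the toy example the annotators agree on three of four items (observed agreement 0.75) while chance agreement is 0.5, which gives a kappa of 0.5.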
Job applicants are increasingly turning to generative AI to create or enhance their resumes, which challenges the fairness, integrity, and efficiency of modern recruitment processes. We present the first curated corpus of resumes annotated as authentic, AI-enhanced, or fully AI-generated. The corpus is balanced across the three classes, comprising 420 resumes spanning five job descriptions in the Information Technology (IT) sector, with the authentic resumes anonymized. We establish strong baselines for this task using traditional and neural supervised machine learning approaches, including Logistic Regression, SVM, Random Forest, XGBoost, BERT, and Longformer. For the featurized approaches, we pair sparse TF-IDF (word/character n-grams) with style features capturing length, punctuation, casing, contractions, lexical diversity (type-token ratio [TTR], number of hapax legomena), n-gram uniqueness, readability indices, and sentiment. Our analysis reveals systematic differences between the classes: AI-generated text has shorter, more uniform sentences and fewer contractions; AI-enhanced text has the highest uniqueness and TTR; and authentic text has the widest variance across all features. XGBoost is the best-performing method, achieving 95.29% accuracy and an F1 of 0.953. We make the corpus available for other researchers to build upon our work. We also benchmark two leading off-the-shelf AI-text detectors on our 420-resume corpus. Despite strong reports in other domains, Originality attains only 55.7% accuracy overall (71/140 authentic, 81/140 AI-generated, 82/140 AI-enhanced correct), and Writer attains 25.0%, with the largest failures on AI-enhanced resumes, highlighting domain shift and cautioning against uncalibrated deployment.
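Two of the quantities mentioned above can be illustrated with a short sketch: the lexical diversity features (type-token ratio and hapax legomena count) and the overall detector accuracy derived from per-class correct counts. The tokenizer below is a simple regular expression chosen for illustration, not the one used in the paper, and the function names `lexical_diversity` and `overall_accuracy` are hypothetical.

```python
import re
from collections import Counter


def lexical_diversity(text):
    """Compute simple lexical diversity features for one document.

    Assumption: lowercase tokens of letters, digits, and apostrophes.
    The paper's actual tokenization is not stated in the abstract.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    n_tokens = len(tokens)
    return {
        "tokens": n_tokens,
        "types": len(counts),
        # Type-token ratio: distinct words divided by total words.
        "ttr": len(counts) / n_tokens if n_tokens else 0.0,
        # Hapax legomena: words that occur exactly once.
        "hapax": sum(1 for c in counts.values() if c == 1),
    }


def overall_accuracy(correct_per_class, class_size):
    """Overall accuracy for a corpus with equally sized classes."""
    total = class_size * len(correct_per_class)
    return sum(correct_per_class) / total


features = lexical_diversity("The cat saw the dog")

# Per-class correct counts reported for Originality (140 resumes per class).
originality_acc = overall_accuracy([71, 81, 82], class_size=140)
```

For the five-word toy sentence, the sketch finds four distinct types (TTR 0.8) and three hapax legomena. The per-class counts reported for Originality sum to 234 correct out of 420, which matches the stated 55.7% overall accuracy.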