Despite comprehensive food safety regulations worldwide, violations continue to pose significant public health challenges. This paper presents an LLM-driven pipeline for analyzing legal texts to identify structural and procedural gaps in food safety enforcement. We develop an end-to-end system that leverages Large Language Models to extract structured entities from legal judgments, construct statute-and-provision-level knowledge graphs, and perform semantic clustering of cases. Applying our approach to 782 Indian food safety violation cases filed between 2022 and 2024, we uncover critical insights: 96% of cases were filed by individuals and organizations against state authorities, with 60% resulting in decisions favoring the appellants. Through automated clustering and analysis, we identify major procedural lapses, including unclear jurisdictional boundaries between enforcement agencies, insufficient evidence collection, and ambiguous penalty guidelines. Our findings reveal concrete weaknesses in current enforcement practices and demonstrate the practical value of LLMs for legal analysis at scale.
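As a rough illustration, the pipeline's first stage, LLM-based entity extraction, could look like the following sketch. It assumes an OpenAI-style chat-completions API; the model name, prompt wording, and output schema are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of the entity-extraction stage; model choice, prompt,
# and schema fields are assumptions, not the paper's implementation.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "From the legal judgment below, extract JSON with these fields:\n"
    "  appellant, respondent: party names\n"
    "  statutes_cited: list of acts with the specific sections invoked\n"
    "  outcome: one of appellant / respondent / remanded / other\n"
    "  procedural_issues: procedural lapses noted by the court\n\n"
    "Judgment:\n"
)

def extract_entities(judgment_text: str) -> dict:
    """Return a structured record for one judgment."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": PROMPT + judgment_text}],
        response_format={"type": "json_object"},  # force parseable JSON
    )
    return json.loads(response.choices[0].message.content)

# Records like these can then populate a statute-and-provision-level
# knowledge graph, e.g. edges (case) -> (act, section), before clustering.
```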
Eating disorders (EDs) are a global health concern, manifesting in increasing numbers across all sections of society. Social media platforms have emerged as a dependable source of information about these disorders, their effects, and their prevalence among different groups. This work lays the foundation for large-scale analysis of social media data using large language models (LLMs). We show that LLMs can drastically reduce the time and resources required to draw insights from large data repositories. With respect to EDs, this work focuses on understanding their psychological impact on both patients and those close to them. Social scientists can use the proposed approach to design more focused studies with more representative groups.
Topic modeling has emerged as a dominant method for exploring large document collections. Recent approaches to topic modeling use large contextualized language models and variational autoencoders. In this paper, we propose a negative sampling mechanism for a contextualized topic model to improve the quality of the generated topics. In particular, during model training, we perturb the generated document-topic vector and use a triplet loss to encourage the document reconstructed from the correct document-topic vector to be similar to the input document and dissimilar to the document reconstructed from the perturbed vector. Experiments with different topic counts on three publicly available benchmark datasets show that, in most cases, our approach increases topic coherence over the baselines. Our model also achieves very high topic diversity.
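A minimal PyTorch sketch of such a triplet term is shown below, assuming a VAE-style topic model with a decoder mapping the document-topic vector back to the vocabulary. The perturbation scheme (permuting topic proportions), the bag-of-words anchor, and the margin are assumptions; the paper's exact choices may differ.

```python
# Sketch of the negative-sampling triplet objective for a VAE topic model;
# perturbation, anchor representation, and margin are assumptions.
import torch
import torch.nn.functional as F

def negative_sampling_loss(bow, theta, decoder, margin=1.0):
    """Encourage the reconstruction from the true document-topic vector
    to be closer to the input than the reconstruction from a perturbed one."""
    # Positive: reconstruct from the inferred document-topic vector.
    recon_pos = decoder(theta)                       # (batch, vocab)
    # Negative: perturb theta by permuting its topic proportions
    # (one possible perturbation scheme).
    perm = torch.randperm(theta.size(1), device=theta.device)
    recon_neg = decoder(theta[:, perm])
    # Anchor: the L1-normalized input bag-of-words vector.
    anchor = F.normalize(bow, p=1, dim=-1)
    return F.triplet_margin_loss(anchor, recon_pos, recon_neg, margin=margin)
```

In training, a term like this would be added to the usual variational objective (reconstruction loss plus KL divergence), weighted by a tunable coefficient.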
Keyphrases in a research paper succinctly capture its primary content and assist in indexing the paper at the concept level. Given the rate at which scientific papers are published today, effective methods for automatically extracting keyphrases from research papers are essential. In this paper, we present a novel method, Syntax and Semantics Aware Keyphrase Extraction (SaSAKE), to extract keyphrases from research papers. It uses a transformer architecture, stacking sentence encoders to incorporate sequential information and graph encoders to incorporate syntactic and semantic dependency-graph information. Incorporating these dependency graphs alleviates long-range dependency problems and helps identify the boundaries of multi-word keyphrases. Experimental results on three benchmark datasets show that SaSAKE achieves state-of-the-art performance in keyphrase extraction from scientific papers.
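The sketch below illustrates the general idea of combining a sequence encoder with a graph encoder over dependency edges for token-level keyphrase tagging. It is not the authors' exact architecture: the layer sizes, the single GCN-style graph layer, and the BIO tagging head are assumptions for illustration.

```python
# Sketch of a sequence + dependency-graph encoder for keyphrase tagging;
# sizes, the single graph layer, and the BIO head are assumptions.
import torch
import torch.nn as nn

class SyntaxAwareTagger(nn.Module):
    def __init__(self, vocab_size, d_model=256, num_tags=3):  # B/I/O tags
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Sequence encoder: stacked Transformer layers capture
        # sequential context.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.seq_encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Graph encoder: one GCN-style pass propagates information along
        # dependency edges, shortening long-range paths between tokens.
        self.graph_proj = nn.Linear(d_model, d_model)
        self.classifier = nn.Linear(d_model, num_tags)

    def forward(self, token_ids, adj):
        # adj: (batch, seq, seq) adjacency of the dependency graph,
        # assumed row-normalized with self-loops.
        h = self.seq_encoder(self.embed(token_ids))
        h = torch.relu(adj @ self.graph_proj(h)) + h  # residual graph pass
        return self.classifier(h)  # per-token BIO logits
```

Because dependency edges connect syntactically related tokens directly, a single graph pass can link words that are far apart in the surface sequence, which is what helps delimit multi-word keyphrases.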