Gianluca Demartini
2025
Enhancing Criminal Investigation Analysis with Summarization and Memory-based Retrieval-Augmented Generation: A Comprehensive Evaluation of Real Case Data
Mads Skipanes, Tollef Emil Jørgensen, Kyle Porter, Gianluca Demartini, Sule Yildirim Yayilgan
Proceedings of the 31st International Conference on Computational Linguistics
This study introduces KriRAG, a novel Retrieval-Augmented Generation (RAG) architecture designed to assist criminal investigators in analyzing information and overcoming the challenge of information overload. KriRAG structures and summarizes extensive document collections based on existing investigative queries, providing relevant document references and detailed answers for each query. Working with unstructured data from two homicide case files comprising approximately 3,700 documents and 13,000 pages, a comprehensive evaluation methodology is established, incorporating semantic retrieval, scoring, reasoning, and query response accuracy. The system’s outputs are evaluated against queries and answers provided by criminal investigators, demonstrating promising performance with 97.5% accuracy in relevance assessment and 77.5% accuracy for query responses. These findings provide a rigorous foundation for other query-oriented and open-ended retrieval applications. KriRAG is designed to run offline on limited hardware, ensuring secure handling of sensitive data and on-device availability.
Personas with Attitudes: Controlling LLMs for Diverse Data Annotation
Leon Fröhling, Gianluca Demartini, Dennis Assenmacher
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
We present a novel approach for enhancing diversity and control in data annotation tasks by personalizing large language models (LLMs). We investigate the impact of injecting diverse persona descriptions into LLM prompts across two studies, exploring whether personas increase annotation diversity and whether the impacts of individual personas on the resulting annotations are consistent and controllable. Our results indicate that persona-prompted LLMs generate more diverse annotations than LLMs prompted without personas, and that the effects of personas on LLM annotations align with subjective differences in human annotations. These effects are both controllable and repeatable, making our approach a valuable tool for enhancing data annotation in subjective NLP tasks such as toxicity detection.
2022
Automatic Identification of 5C Vaccine Behaviour on Social Media
Ajay Hemanth Sampath Kumar, Aminath Shausan, Gianluca Demartini, Afshin Rahimi
Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)
Monitoring vaccine behaviour through social media can guide health policy. We present a new dataset of 9471 tweets posted in Australia from 2020 to 2022, annotated with sentiment toward vaccines and also 5C, the five types of behaviour toward vaccines, a scheme commonly used in health psychology literature. We benchmark our dataset using BERT and Gradient Boosting Machine and show that jointly training both sentiment and 5C tasks (F1=48) outperforms individual training (F1=39) on this highly imbalanced dataset. Our sentiment analysis indicates close correlation between the sentiments and prominent events during the pandemic. We hope that our dataset and benchmark models will inform further work in online monitoring of vaccine behaviour. The dataset and benchmark methods are accessible online.
Co-authors
- Dennis Assenmacher 1
- Leon Fröhling 1
- Tollef Emil Jørgensen 1
- Kyle Porter 1
- Afshin Rahimi 1