Laura Manrique-Gómez
Also published as: Laura Manrique-Gomez
2025
Historical Ink: Exploring Large Language Models for Irony Detection in 19th-Century Spanish
Kevin Cohen | Laura Manrique-Gómez | Ruben Manrique
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
This study explores the use of large language models (LLMs) to enhance datasets and improve irony detection in 19th-century Latin American newspapers. Two strategies were employed to evaluate the efficacy of BERT and GPT models in capturing the subtly nuanced nature of irony, through both multi-class and binary classification tasks. First, we implemented dataset enhancements focused on enriching emotional and contextual cues; however, these showed limited impact on historical language analysis. The second strategy, a semi-automated annotation process, effectively addressed class imbalance and augmented the dataset with high-quality annotations. Despite the challenges posed by the complexity of irony, this work contributes to the advancement of sentiment analysis through two key contributions: introducing a new historical Spanish dataset tagged for sentiment analysis and irony detection, and proposing a semi-automated annotation methodology in which human expertise is crucial for refining LLM results, enriched by incorporating historical and cultural contexts as core features.
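As an illustrative sketch only (not the authors' code), the binary irony-detection setting described above could be approached by fine-tuning a Spanish BERT encoder with Hugging Face Transformers; the checkpoint name, label scheme, and toy sentences below are assumptions for demonstration.

# Illustrative sketch: binary irony classification with a Spanish BERT encoder.
# Model name, labels, and example texts are assumptions, not the paper's setup.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

MODEL = "dccuchile/bert-base-spanish-wwm-cased"  # assumed Spanish BERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Toy examples standing in for annotated 19th-century newspaper sentences.
data = Dataset.from_dict({
    "text": ["¡Qué gran honor ser ignorado por el gobierno!",
             "El mercado abrió a las ocho de la mañana."],
    "label": [1, 0],  # 1 = ironic, 0 = not ironic
})

def tokenize(batch):
    # Truncate/pad to a fixed length so batches can be collated.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="irony-beto",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=data,
)
trainer.train()

In practice, the semi-automated annotation step described in the abstract would sit upstream of this fine-tuning: model-proposed labels are reviewed by human annotators before being added to the training set.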
2024
Historical Ink: Semantic Shift Detection for 19th Century Spanish
Tony Montes | Laura Manrique-Gómez | Rubén Manrique
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change
Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction
Laura Manrique-Gomez | Tony Montes | Arturo Rodriguez Herrera | Ruben Manrique
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
This paper presents two significant contributions: First, it introduces a novel dataset of 19th-century Latin American newspaper texts, addressing a critical gap in specialized corpora for historical and linguistic analysis in this region. Second, it develops a flexible framework that utilizes a Large Language Model for OCR error correction and linguistic surface form detection in digitized corpora. This semi-automated framework is adaptable to various contexts and datasets and is applied to the newly created dataset.
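A minimal sketch of the LLM-based OCR correction step mentioned above, assuming a chat-style LLM accessed through the OpenAI Python client; the model choice, prompt wording, and example input are assumptions, not the paper's exact framework.

# Illustrative sketch: prompting a chat LLM to correct OCR errors in a
# 19th-century Spanish newspaper passage while preserving period spelling.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Eres un corrector de OCR. Corrige únicamente los errores de "
    "reconocimiento en el siguiente texto en español del siglo XIX, "
    "conservando la ortografía y las formas lingüísticas de la época:\n\n{text}"
)

def correct_ocr(text: str) -> str:
    """Return an OCR-corrected version of `text`, keeping historical surface forms."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,
    )
    return response.choices[0].message.content

# Example: a garbled OCR line is returned with recognition errors fixed.
print(correct_ocr("La rcpública cs cl gobicrno dc todos"))

In a semi-automated pipeline like the one described, such corrections would typically be reviewed before replacing the original transcriptions, so that genuine historical spellings are not normalized away.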