Mark Hill


2026

Bibliographies are both humanities infrastructure and historic record. To computationally analyse them, however, requires implementing complex digitisation and standardisation decisions. This paper turns to Seyfettin Özege’s Eski Harflerle Basılmış Türkçe Eserler Kataloğu as an example, a scanned set of volumes marked by complex page layouts, degraded typography, irregular entry structures, and historically contingent inconsistencies. With this we present a pipeline that constructs a structured, machine-readable, and analysable dataset out of the 27,000 entries with computer vision, OCR, large and visual language models, sequence-based validation, and custom review tools. This process captures 97.8% of records, with remaining cases capable of being addressed by targeted review. This process demonstrates that combining LLMs with interpretable, review-centric pipelines, offers an appropriate approach for historically complex bibliographic sources.

2025

This paper examines how emotional responses to football matches influence online discourse across digital spaces on Reddit. By analysing millions of posts from dozens of subreddits, it demonstrates that real-world events trigger sentiment shifts that move across communities. It shows that negative sentiment correlates with problematic language; match outcomes directly influence sentiment and posting habits; sentiment can transfer to unrelated communities; and offers insights into the content of this shifting discourse. These findings reveal how digital spaces function not as isolated environments, but as interconnected emotional ecosystems vulnerable to cross-domain contagion triggered by real-world events, contributing to our understanding of the propagation of online toxicity. While football is used as a case-study to computationally measure affective causes and movements, these patterns have implications for understanding online communities broadly.