William J. B. Mattingly

Also published as: William J.B. Mattingly


2024

pdf
Creating a Typology of Places to Annotate Holocaust Testimonies Through Machine Learning
Christine Liu | William J.B. Mattingly
Proceedings of the First Workshop on Holocaust Testimonies as Language Resources (HTRes) @ LREC-COLING 2024

The Holocaust was not only experienced in iconic places like Auschwitz or the Warsaw ghetto. Ordinary places, such as city streets, forests, hills, and homes, were transformed by occupation and systematic violence. While most of these places are unnamed and locationally ambiguous, their omnipresence throughout post-war testimonies from witnesses and survivors of the Holocaust emphasize their undeniable importance. This paper shares a methodology for developing a typology of places in order to annotate both named and unnamed places within interview transcripts from the United States Holocaust Memorial Museum (USHMM) through a machine learning model. The approach underscores the benefits of hybrid analysis through both automated extraction and manual review to create distinct categories of places. This paper also reviews how testimony transcripts were converted into structured data for annotation and previews ongoing work to design a search engine for users to dynamically query this place-based approach to studying the Holocaust.

2021

pdf
The Classical Language Toolkit: An NLP Framework for Pre-Modern Languages
Kyle P. Johnson | Patrick J. Burns | John Stewart | Todd Cook | Clément Besnier | William J. B. Mattingly
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations

This paper announces version 1.0 of the Classical Language Toolkit (CLTK), an NLP framework for pre-modern languages. The vast majority of NLP, its algorithms and software, is created with assumptions particular to living languages, thus neglecting certain important characteristics of largely non-spoken historical languages. Further, scholars of pre-modern languages often have different goals than those of living-language researchers. To fill this void, the CLTK adapts ideas from several leading NLP frameworks to create a novel software architecture that satisfies the unique needs of pre-modern languages and their researchers. Its centerpiece is a modular processing pipeline that balances the competing demands of algorithmic diversity with pre-configured defaults. The CLTK currently provides pipelines, including models, for almost 20 languages.