Matthew Fort


2023

pdf
Bigfoot in Big Tech: Detecting Out of Domain Conspiracy Theories
Matthew Fort | Zuoyu Tian | Elizabeth Gabel | Nina Georgiades | Noah Sauer | Daniel Dakota | Sandra Kübler
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

We investigate approaches to classifying texts into either conspiracy theory or mainstream using the Language Of Conspiracy (LOCO) corpus. Since conspiracy theories are not monolithic constructs, we need to identify approaches that robustly work in an out-of- domain setting (i.e., across conspiracy topics). We investigate whether optimal in-domain set- tings can be transferred to out-of-domain set- tings, and we investigate different methods for bleaching to steer classifiers away from words typical for an individual conspiracy theory. We find that BART works better than an SVM, that we can successfully classify out-of-domain, but there are no clear trends in how to choose the best source training domains. Addition- ally, bleaching only topic words works better than bleaching all content words or completely delexicalizing texts.

2022

pdf
Event Sequencing Annotation with TIE-ML
Damir Cavar | Ali Aljubailan | Ludovic Mompelat | Yuna Won | Billy Dickson | Matthew Fort | Andrew Davis | Soyoung Kim
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022

TIE-ML (Temporal Information Event Markup Language) first proposed by Cavar et al. (2021) provides a radically simplified temporal annotation schema for event sequencing and clause level temporal properties even in complex sentences. TIE-ML facilitates rapid annotation of essential tense features at the clause level by labeling simple or periphrastic tense properties, as well as scope relations between clauses, and temporal interpretation at the sentence level. This paper presents the first annotation samples and empirical results. The application of the TIE-ML strategy on the sentences in the Penn Treebank (Marcus et al., 1993) and other non-English language data is discussed in detail. The motivation, insights, and future directions for TIE-ML are discussed, too. The aim is to develop a more efficient annotation strategy and a formalism for clause-level tense and aspect labeling, event sequencing, and tense scope relations that boosts the productivity of tense and event-level corpus annotation. The central goal is to facilitate the production of large data sets for machine learning and quantitative linguistic studies of intra- and cross-linguistic semantic properties of temporal and event logic.