Clayton T. Morrison

Also published as: Clayton T Morrison

2025

Variable Extraction for Model Recovery in Scientific Literature
Chunwei Liu | Enrique Noriega-Atala | Adarsh Pyarelal | Clayton T Morrison | Mike Cafarella
Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities

Due to the increasing productivity in the scientific community, it is difficult to keep up with the literature without the assistance of AI methods. This paper evaluates various methods for extracting mathematical model variables from epidemiological studies, such as “infection rate (𝛼),” “recovery rate (𝛾),” and “mortality rate (𝜇).” Variable extraction appears to be a basic task, but it plays a pivotal role in recovering models from scientific literature. Once extracted, these variables can be used for automatic mathematical modeling, simulation, and replication of published results. We also introduce a benchmark dataset comprising manually annotated variable descriptions and variable values extracted from scientific papers. Our analysis shows that LLM-based solutions perform the best. Although combining rule-based extraction outputs with LLMs yields incremental benefits, the leap in performance attributable to the transfer-learning and instruction-tuning capabilities of the LLMs themselves is far more significant. This investigation demonstrates the potential of LLMs to enhance automatic comprehension of scientific artifacts and to enable automatic model recovery and simulation.

A Framework to Retrieve Relevant Laws for Will Execution
Md Asiful Islam | Alice Saebom Kwak | Derek Bambauer | Clayton T Morrison | Mihai Surdeanu
Proceedings of the Natural Legal Language Processing Workshop 2025

Wills must comply with jurisdiction-specific statutory provisions to be valid, but retrieving the relevant laws for execution, validation, and probate remains labor-intensive and error-prone. Prior legal information retrieval (LIR) research has addressed contracts, criminal law, and judicial decisions, but wills and probate law remain largely unexplored, with no prior work on retrieving statutes for will validity assessment. We propose a legal information retrieval framework that combines lexical and semantic retrieval in a hybrid pipeline with large language model (LLM) reasoning to retrieve the most relevant provisions for a will statement. Evaluations on annotated will-statement datasets from the U.S. states of Tennessee and Idaho using six LLMs show that our hybrid framework consistently outperforms zero-shot baselines. Notably, when paired with our hybrid retrieval pipeline, GPT-5-mini achieves the largest relative accuracy gains, improving by 41.09 points on the Tennessee test set and 48.68 points on the Idaho test set. We observed similarly strong improvements across all models and datasets.

2024

When and Where Did it Happen? An Encoder-Decoder Model to Identify Scenario Context
Enrique Noriega-Atala | Robert Vacareanu | Salena Torres Ashton | Adarsh Pyarelal | Clayton T Morrison | Mihai Surdeanu
Findings of the Association for Computational Linguistics: EMNLP 2024

We introduce a neural architecture fine-tuned for the task of scenario context generation: the relevant location and time of an event or entity mentioned in text. Contextualizing information extraction helps to scope the validity of automated findings when aggregating them as knowledge graphs. Our approach uses a high-quality curated dataset of time and location annotations in a corpus of epidemiology papers to train an encoder-decoder architecture. We also explore the use of data augmentation techniques during training. Our findings suggest that a relatively small fine-tuned encoder-decoder model performs better than out-of-the-box LLMs and semantic role labeling parsers at accurately predicting the relevant scenario information of a particular entity or event.