Aaron Miller


2023

GPT4All: An Ecosystem of Open Source Compressed Language Models
Yuvanesh Anand | Zach Nussbaum | Adam Treat | Aaron Miller | Richard Guo | Benjamin Schmidt | Brandon Duderstadt | Andriy Mulyar
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

Large language models (LLMs) have recently achieved human-level performance on a range of professional and academic benchmarks. The accessibility of these models has lagged behind their performance. State-of-the-art LLMs require costly infrastructure; are only accessible via rate-limited, geo-locked, and censored web interfaces; and lack publicly available code and technical reports. In this paper, we tell the story of GPT4All, a popular open source repository that aims to democratize access to LLMs. We outline the technical details of the original GPT4All model family, as well as the evolution of the GPT4All project from a single model into a fully fledged open source ecosystem. It is our hope that this paper acts as both a technical overview of the original GPT4All models as well as a case study on the subsequent growth of the GPT4All open source ecosystem.

2022

A Dependency Treebank of Spoken Second Language English
Kristopher Kyle | Masaki Eguchi | Aaron Miller | Theodore Sither
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)

In this paper, we introduce a dependency treebank of spoken second language (L2) English that is annotated with part of speech (Penn POS) tags and syntactic dependencies (Universal Dependencies). We then evaluate the degree to which the use of this treebank as training data affects POS and UD annotation accuracy for L1 web texts, L2 written texts, and L2 spoken texts as compared to models trained on L1 texts only.

2019

Towards Text Processing Pipelines to Identify Adverse Drug Events-related Tweets: University of Michigan @ SMM4H 2019 Task 1
V.G.Vinod Vydiswaran | Grace Ganzel | Bryan Romas | Deahan Yu | Amy Austin | Neha Bhomia | Socheatha Chan | Stephanie Hall | Van Le | Aaron Miller | Olawunmi Oduyebo | Aulia Song | Radhika Sondhi | Danny Teng | Hao Tseng | Kim Vuong | Stephanie Zimmerman
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

We participated in Task 1 of the Social Media Mining for Health Applications (SMM4H) 2019 Shared Tasks on detecting mentions of adverse drug events (ADEs) in tweets. Our approach relied on a text processing pipeline for tweets and on training traditional machine learning and deep learning models. Our submitted runs performed above average for the task.