Jake Williams


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2019

pdf bib
Latent semantic network induction in the context of linked example senses
Hunter Heidenreich | Jake Williams
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

The Princeton WordNet is a powerful tool for studying language and developing natural language processing algorithms. With significant work developing it further, one line considers its extension through aligning its expert-annotated structure with other lexical resources. In contrast, this work explores a completely data-driven approach to network construction, forming a wordnet using the entirety of the open-source, noisy, user-annotated dictionary, Wiktionary. Comparing baselines to WordNet, we find compelling evidence that our network induction process constructs a network with useful semantic structure. With thousands of semantically-linked examples that demonstrate sense usage from basic lemmas to multiword expressions (MWEs), we believe this work motivates future research.

2017

pdf bib
Boundary-based MWE segmentation with text partitioning
Jake Williams
Proceedings of the 3rd Workshop on Noisy User-generated Text

This submission describes the development of a fine-grained, text-chunking algorithm for the task of comprehensive MWE segmentation. This task notably focuses on the identification of colloquial and idiomatic language. The submission also includes a thorough model evaluation in the context of two recent shared tasks, spanning 19 different languages and many text domains, including noisy, user-generated text. Evaluations exhibit the presented model as the best overall for purposes of MWE segmentation, and open-source software is released with the submission (although links are withheld for purposes of anonymity). Additionally, the authors acknowledge the existence of a pre-print document on arxiv.org, which should be avoided to maintain anonymity in review.

pdf bib
Context-Sensitive Recognition for Emerging and Rare Entities
Jake Williams | Giovanni Santia
Proceedings of the 3rd Workshop on Noisy User-generated Text

This paper is a shared task system description for the 2017 W-NUT shared task on Rare and Emerging Named Entities. Our paper describes the development and application of a novel algorithm for named entity recognition that relies only on the contexts of word forms. A comparison against the other submitted systems is provided.