Dan Flickinger

Also published as: Daniel P. Flickinger, D. Flickenger, Daniel Flickinger


2025

This study provides the first comprehensive comparison of New York Times-style text generated by six large language models against real, human-authored NYT writing, grounded in a formal syntactic theory. We use Head-driven Phrase Structure Grammar (HPSG) to analyze the grammatical structure of the texts, and we then investigate and illustrate the differences in the distributions of HPSG grammar types, revealing systematic distinctions between human and LLM-generated writing. These findings contribute to a deeper understanding of the syntactic behavior of both LLMs and humans within the NYT genre.
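To make the comparison concrete, here is a minimal sketch (in Python, not the paper's actual pipeline) of tabulating relative frequencies of grammar rule types extracted from derivation trees and contrasting them across a human-authored and an LLM-generated corpus; the rule labels in the toy example are illustrative ERG construction names.

```python
from collections import Counter

def rule_type_distribution(corpus):
    """Relative frequency of each grammar rule type in a corpus.

    `corpus` is a list of sentences, each given as the list of rule-type
    labels appearing in that sentence's derivation tree.
    """
    counts = Counter(label for sentence in corpus for label in sentence)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def distribution_difference(human, llm):
    """Per-rule-type difference in relative frequency (human minus LLM)."""
    labels = set(human) | set(llm)
    return {lab: human.get(lab, 0.0) - llm.get(lab, 0.0) for lab in labels}

# Toy example with illustrative ERG construction names:
human = rule_type_distribution([["sb-hd_mc_c", "hd-cmp_u_c"], ["sp-hd_n_c"]])
llm = rule_type_distribution([["sb-hd_mc_c", "sp-hd_n_c"], ["sp-hd_n_c"]])
for label, diff in sorted(distribution_difference(human, llm).items()):
    print(f"{label}\t{diff:+.2f}")
```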

2019

The English Resource Grammar (ERG) is a broad-coverage computational grammar of English that outputs underspecified logical-form representations of meaning in a framework dubbed English Resource Semantics (ERS). Two of the target representations in the 2019 Shared Task on Cross-Framework Meaning Representation Parsing (MRP 2019) derive graph-based simplifications of ERS, viz. Elementary Dependency Structures (EDS) and DELPH-IN MRS Bi-Lexical Dependencies (DM). As a point of reference outside the official MRP competition, we parsed the evaluation strings using the ERG and converted the resulting meaning representations to EDS and DM. These graphs yield higher evaluation scores than the purely data-driven parsers in the actual shared task, suggesting that the general-purpose linguistic knowledge about English grammar encoded in the ERG can add value when parsing into these meaning representations.
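As an illustration of this kind of pipeline, the sketch below uses the open-source PyDelphin library together with the ACE processor and a locally compiled ERG grammar image (the file name erg.dat is an assumption); it parses a sentence to MRS and derives EDS and DMRS graphs. It is not the shared-task conversion itself, and the DM bilexical format in particular involves further reduction steps not shown here.

```python
# A minimal sketch, assuming the ACE processor and a compiled ERG grammar
# image (here "erg.dat", an assumed path) are installed, using PyDelphin.
from delphin import ace, eds, dmrs
from delphin.codecs import eds as eds_codec

GRAMMAR = "erg.dat"  # path to a compiled ERG image (assumption)

with ace.ACEParser(GRAMMAR) as parser:
    response = parser.interact("The dog barked.")
    m = response.result(0).mrs()   # underspecified ERS/MRS analysis
    e = eds.from_mrs(m)            # Elementary Dependency Structures
    d = dmrs.from_mrs(m)           # DMRS; DM proper needs a further reduction
    print(eds_codec.encode(e, indent=True))
```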

2016

We announce a new language resource for research on semantic parsing, a large, carefully curated collection of semantic dependency graphs representing multiple linguistic traditions. This resource is called SDP 2016 and provides an update and extension to previous versions used as Semantic Dependency Parsing target representations in the 2014 and 2015 Semantic Evaluation Exercises. For a common core of English text, this third edition comprises semantic dependency graphs from four distinct frameworks, packaged in a unified abstract format and aligned at the sentence and token levels. SDP 2016 is the first general release of this resource and is available for licensing from the Linguistic Data Consortium in May 2016. The data is accompanied by an open-source SDP utility toolkit and system results from previous contrastive parsing evaluations against these target representations.
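For orientation, a simplified reader for the tab-separated SDP graph serialization might look as follows; the column layout assumed here follows the published SDP 2015/2016 conventions, and this sketch is not the official SDP utility toolkit.

```python
# A simplified reader for the tab-separated SDP serialization (sentences
# separated by blank lines, comment lines starting with '#'); column layout
# assumed here: id, form, lemma, pos, top, pred, frame, then one argument
# column per predicate token.

def read_sdp(path):
    sentences, rows = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if rows:
                    sentences.append(rows)
                    rows = []
            elif not line.startswith("#"):
                rows.append(line.split("\t"))
    if rows:
        sentences.append(rows)
    return sentences

def edges(rows):
    """Yield (head_id, dependent_id, label) triples for one sentence."""
    predicates = [int(r[0]) for r in rows if r[5] == "+"]
    for r in rows:
        for i, label in enumerate(r[7:]):
            if label != "_":
                yield predicates[i], int(r[0]), label
```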

2015

2014

We motivate and describe the design and development of an emerging encyclopedia of compositional semantics, pursuing three objectives. We first seek to compile a comprehensive catalogue of interoperable semantic analyses, i.e., a precise characterization of meaning representations for a broad range of common semantic phenomena. Second, we operationalize the discovery of semantic phenomena and their definition in terms of what we call their semantic fingerprint, a formal account of the building blocks of meaning representation involved and their configuration. Third, we ground our work in a carefully constructed semantic test suite of minimal exemplars for each phenomenon, along with a ‘target’ fingerprint that enables automated regression testing. We work towards these objectives by codifying and documenting the body of knowledge that has been constructed in a long-term collaborative effort, the development of the LinGO English Resource Grammar. Documentation of its semantic interface is a prerequisite for non-experts to use the grammar and the analyses it produces, but this effort also advances our own understanding of relevant interactions among phenomena, as well as of areas for future work in the grammar.
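The regression-testing idea can be illustrated with a deliberately simplified stand-in for a fingerprint, here reduced to a set of (predicate, role, predicate) triples; the actual semantic fingerprints in this work are richer, and the parse_exemplar hook and predicate names below are hypothetical.

```python
# A deliberately simplified stand-in for fingerprint-based regression testing.
# A graph is a dict mapping node ids to {"pred": ..., "edges": {role: node}},
# and a fingerprint is reduced to (predicate, role, predicate) triples.

def fingerprint(graph):
    triples = set()
    for node, info in graph.items():
        for role, target in info.get("edges", {}).items():
            triples.add((info["pred"], role, graph[target]["pred"]))
    return triples

def regression_check(parse_exemplar, exemplar, target):
    """Re-parse a test-suite exemplar and compare against its target fingerprint."""
    observed = fingerprint(parse_exemplar(exemplar))
    return observed == target, target - observed, observed - target

# Toy exemplar with a hypothetical parser stub:
stub = lambda s: {0: {"pred": "_bark_v_1", "edges": {"ARG1": 1}},
                  1: {"pred": "_dog_n_1", "edges": {}}}
ok, missing, extra = regression_check(stub, "the dog barked",
                                      {("_bark_v_1", "ARG1", "_dog_n_1")})
print(ok, missing, extra)
```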

2013

2012

We present the WeSearch Data Collection (WDC), a freely redistributable, partly annotated, comprehensive sample of User-Generated Content. The WDC contains data extracted from a range of genres of varying formality (user forums, product review sites, blogs and Wikipedia) and covers two different domains (NLP and Linux). In this article, we describe the data selection and extraction process, with a focus on the extraction of linguistic content from different sources. We present the format of the syntacto-semantic annotations found in this resource and report initial parsing results for these data, as well as some reflections following a first round of treebanking.

2011

2010

WikiWoods is an ongoing initiative to provide rich syntacto-semantic annotations for English Wikipedia. We sketch an automated processing pipeline to extract relevant textual content from Wikipedia sources, segment documents into sentence-like units, parse and disambiguate using a broad-coverage precision grammar, and support the export of syntactic and semantic information in various formats. The full parsed corpus is accompanied by a subset of Wikipedia articles for which gold-standard annotations in the same format were produced manually. This subset was selected to represent a coherent domain, Wikipedia entries on the broad topic of Natural Language Processing.

2008

Large-scale grammar-based parsing systems nowadays increasingly rely on independently developed, more specialized components for pre-processing their input. However, different tools make conflicting assumptions about very basic properties such as tokenization. To make linguistic annotation gathered in pre-processing available to “deep” parsing, a hybrid NLP system needs to establish a coherent mapping between the two universes. Our basic assumption is that tokens are best described by attribute value matrices (AVMs) that may be arbitrarily complex. We propose a powerful resource-sensitive rewrite formalism, “chart mapping”, that allows us to mediate between the token descriptions delivered by shallow pre-processing components and the input expected by the grammar. We furthermore propose a novel approach to unknown word treatment in which all generic lexical entries licensed by a particular token AVM are instantiated. Again, chart mapping is used to give the grammar writer full control over which items (e.g. native vs. generic lexical items) enter syntactic parsing. We discuss several further uses of the original idea and report on early experiences with the new machinery.
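A highly simplified illustration of the rewriting idea (not the actual chart-mapping formalism, which operates on typed feature structures inside the parser) treats token AVMs as Python dictionaries and shows a single hypothetical rule that rejoins a hyphenated token split by an upstream tokenizer.

```python
# A highly simplified illustration: token AVMs as dictionaries, and one
# hypothetical rewrite rule that consumes three input tokens and adds a
# single rejoined token in their place.

def rejoin_hyphen(chart):
    """Rewrite ... {'form': X}, {'form': '-'}, {'form': Y} ... into one token."""
    out, i = [], 0
    while i < len(chart):
        if i + 2 < len(chart) and chart[i + 1]["form"] == "-":
            out.append({"form": chart[i]["form"] + "-" + chart[i + 2]["form"],
                        "from": chart[i]["from"], "to": chart[i + 2]["to"]})
            i += 3  # the matched input tokens are consumed (resource-sensitive)
        else:
            out.append(chart[i])
            i += 1
    return out

tokens = [{"form": "e", "from": 0, "to": 1},
          {"form": "-", "from": 1, "to": 2},
          {"form": "mail", "from": 2, "to": 6}]
print(rejoin_hyphen(tokens))  # -> [{'form': 'e-mail', 'from': 0, 'to': 6}]
```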

2007

2006

2005

In the LOGON machine translation system, where semantic transfer using Minimal Recursion Semantics is being developed in conjunction with two existing broad-coverage grammars of Norwegian and English, we motivate the use of a grammar-specific semantic interface (SEM-I) to facilitate the construction and maintenance of a scalable translation engine. The SEM-I is a theoretically grounded component of each grammar, capturing several classes of lexical regularities while also serving the crucial engineering function of supplying a reliable and complete specification of the elementary predications the grammar can realize. We make extensive use of underspecification and type hierarchies to maximize generality and precision.

2004

2002

2001

2000

1995

1992

1991

1985

1984
