Gerald Penn


2024

pdf
A Generative Model for Lambek Categorial Sequents
Jinman Zhao | Gerald Penn
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this work, we introduce a generative model, PLC+, for generating Lambek Categorial Grammar (LCG) sequents. We also introduce a simple method to numerically estimate the model’s parameters from an annotated corpus. We then compare our model with probabilistic context-free grammars (PCFGs) and show that PLC+ simultaneously assigns a higher probability to a common corpus and achieves greater coverage.

pdf
LCGbank: A Corpus of Syntactic Analyses Based on Proof Nets
Aditya Bhargava | Timothy A. D. Fowler | Gerald Penn
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In syntactic parsing, *proof nets* are graphical structures that have the advantageous property of invariance to spurious ambiguities. Semantically-equivalent derivations correspond to a single proof net. Recent years have seen fresh interest in statistical syntactic parsing with proof nets, including the development of methods based on neural networks. However, training of statistical parsers requires corpora that provide ground-truth syntactic analyses. Unfortunately, there has been a paucity of corpora in formalisms for which proof nets are applicable, such as Lambek categorial grammar (LCG), a formalism related to combinatory categorial grammar (CCG). To address this, we leverage CCGbank and the relationship between LCG and CCG to develop LCGbank, an English-language corpus of syntactic analyses based on LCG proof nets. In contrast to CCGbank, LCGbank eschews type-changing and uses only categorial rules; the syntactic analyses thus provide fully compositional semantics, exploiting the transparency between syntax and semantics that so characterizes categorial grammars.

2023

pdf
Discourse Information for Document-Level Temporal Dependency Parsing
Jingcheng Niu | Victoria Ng | Erin Rees | Simon De Montigny | Gerald Penn
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)

In this study, we examine the benefits of incorporating discourse information into document-level temporal dependency parsing. Specifically, we evaluate the effectiveness of integrating both high-level discourse profiling information, which describes the discourse function of sentences, and surface-level sentence position information into temporal dependency graph (TDG) parsing. Unexpectedly, our results suggest that simple sentence position information, particularly when encoded using our novel sentence-position embedding method, performs the best, perhaps because it does not rely on noisy model-generated feature inputs. Our proposed system surpasses the current state-of-the-art TDG parsing systems in performance. Furthermore, we aim to broaden the discussion on the relationship between temporal dependency parsing and discourse analysis, given the substantial similarities shared between the two tasks. We argue that discourse analysis results should not be merely regarded as an additional input feature for temporal dependency parsing. Instead, adopting advanced discourse analysis techniques and research insights can lead to more effective and comprehensive approaches to temporal information extraction tasks.

pdf
Decomposed scoring of CCG dependencies
Aditya Bhargava | Gerald Penn
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In statistical parsing with CCG, the standard evaluation method is based on predicate-argument structure and evaluates dependencies labelled in part by lexical categories. When a predicate has multiple argument slots that can be filled, the same lexical category is used for the label of multiple dependencies. In this paper, we show that this evaluation can result in disproportionate penalization of supertagging errors and obfuscate the truly erroneous dependencies. Enabled by the compositional nature of CCG lexical categories, we propose *decomposed scoring* based on subcategorial labels to address this. To evaluate our scoring method, we engage fellow categorial grammar researchers in two English-language judgement tasks: (1) directly ranking the outputs of the standard and experimental scoring methods; and (2) determining which of two sentences has the better parse in cases where the two scoring methods disagree on their ranks. Overall, the judges prefer decomposed scoring in each task; but there is substantial disagreement among the judges in 24% of the given cases, pointing to potential issues with parser evaluations in general.

2022

pdf
Using Roark-Hollingshead Distance to Probe BERT’s Syntactic Competence
Jingcheng Niu | Wenjie Lu | Eric Corlett | Gerald Penn
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Probing BERT’s general ability to reason about syntax is no simple endeavour, primarily because of the uncertainty surrounding how large language models represent syntactic structure. Many prior accounts of BERT’s agility as a syntactic tool (Clark et al., 2013; Lau et al., 2014; Marvin and Linzen, 2018; Chowdhury and Zamparelli, 2018; Warstadt et al., 2019, 2020; Hu et al., 2020) have therefore confined themselves to studying very specific linguistic phenomena, and there has still been no definitive answer as to whether BERT “knows” syntax. The advent of perturbed masking (Wu et al., 2020) would then seem to be significant, because this is a parameter-free probing method that directly samples syntactic trees from BERT’s embeddings. These sampled trees outperform a right-branching baseline, thus providing preliminary evidence that BERT’s syntactic competence bests a simple baseline. This baseline is underwhelming, however, and our reappraisal below suggests that this result, too, is inconclusive. We propose RH Probe, an encoder-decoder probing architecture that operates on two probing tasks. We find strong empirical evidence confirming the existence of important syntactic information in BERT, but this information alone appears not to be enough to reproduce syntax in its entirety. Our probe makes crucial use of a conjecture made by Roark and Hollingshead (2008) that a particular lexical annotation that we shall call RH distance is a sufficient encoding of unlabelled binary syntactic trees, and we prove this conjecture.

pdf
A Taxonomical NLP Blueprint to Support Financial Decision Making through Information-Centred Interactions
Siavash Kazemian | Cosmin Munteanu | Gerald Penn
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Investment management professionals (IMPs) often make decisions after manual analysis of text transcripts of central banks’ conferences or companies’ earnings calls. Their current software tools, while interactive, largely leave users unassisted in using these transcripts. A key component to designing speech and NLP techniques for this community is to qualitatively characterize their perceptions of AI, as well as their legitimate needs, so as to (1) better apply existing NLP methods, (2) direct future research, and (3) correct IMPs’ perceptions of what AI is capable of. This paper presents such a study, through a contextual inquiry with eleven IMPs, uncovering their information practices when using such transcripts. We then propose a taxonomy of user requirements and usability criteria to support IMP decision making, and validate the taxonomy through participatory design workshops with four IMPs. Our investigation suggests that: (1) IMPs view visualization methods and natural language processing algorithms primarily as time-saving tools that are incapable of enhancing either discovery or interpretation, and (2) their existing software falls well short of the state of the art in both visualization and NLP.

pdf bib
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing
Dmitry Ustalov | Yanjun Gao | Alexander Panchenko | Marco Valentino | Mokanarangan Thayaparan | Thien Huu Nguyen | Gerald Penn | Arti Ramesh | Abhik Jana
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

pdf
Does BERT Rediscover a Classical NLP Pipeline?
Jingcheng Niu | Wenjie Lu | Gerald Penn
Proceedings of the 29th International Conference on Computational Linguistics

Does BERT store surface knowledge in its bottom layers, syntactic knowledge in its middle layers, and semantic knowledge in its upper layers? In re-examining Jawahar et al.’s (2019) and Tenney et al.’s (2019a) probes into the structure of BERT, we have found that the pipeline-like separation that they asserted lacks conclusive empirical support. BERT’s structure is, however, linguistically founded, although perhaps in a way that is more nuanced than can be explained by layers alone. We introduce a novel probe, called GridLoc, through which we can also take into account token positions, training rounds, and random seeds. Using GridLoc, we are able to detect other, stronger regularities that suggest that pseudo-cognitive appeals to layer depth may not be the preferable mode of explanation for BERT’s inner workings.

2021

pdf bib
Proof Net Structure for Neural Lambek Categorial Parsing
Aditya Bhargava | Gerald Penn
Proceedings of the 17th International Conference on Parsing Technologies and the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies (IWPT 2021)

In this paper, we present the first statistical parser for Lambek categorial grammar (LCG), a grammatical formalism for which the graphical proof method known as *proof nets* is applicable. Our parser incorporates proof net structure and constraints into a system based on self-attention networks via novel model elements. Our experiments on an English LCG corpus show that incorporating term graph structure is helpful to the model, improving both parsing accuracy and coverage. Moreover, we derive novel loss functions by expressing proof net constraints as differentiable functions of our model output, enabling us to train our parser without ground-truth derivations.

pdf
Structural Realization with GGNNs
Jinman Zhao | Gerald Penn | Huan Ling
Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15)

In this paper, we define an abstract task called structural realization that generates words given a prefix of words and a partial representation of a parse tree. We also present a method for solving instances of this task using a Gated Graph Neural Network (GGNN). We evaluate it with standard accuracy measures, as well as with respect to perplexity, where comparison to previous work on language modelling serves to quantify the information added to a lexical selection task by the presence of syntactic knowledge. That the addition of parse-tree-internal nodes to this neural model should improve the model, with respect both to accuracy and to more conventional measures such as perplexity, may seem unsurprising, but previous attempts have not met with nearly as much success. We have also learned that transverse links through the parse tree compromise the model’s accuracy at generating adjectival and nominal parts of speech.

pdf
Reanalyzing the Most Probable Sentence Problem: A Case Study in Explicating the Role of Entropy in Algorithmic Complexity
Eric Corlett | Gerald Penn
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

When working with problems in natural language processing, we can find ourselves in situations where the traditional measurements of descriptive complexity are ineffective at describing the behaviour of our algorithms. It is easy to see why: the models we use are often general frameworks into which difficult-to-define tasks can be embedded. These frameworks can have more power than we typically use, and so complexity measures such as worst-case running time can drastically overestimate the cost of running our algorithms. In particular, they can make an apparently tractable problem seem NP-complete. Using empirical studies to evaluate performance is a necessary but incomplete method of dealing with this mismatch, since these studies no longer act as a guarantee of good performance. In this paper we use statistical measures such as entropy to give an updated analysis of the complexity of the NP-complete Most Probable Sentence problem for PCFGs, which can then be applied to word sense disambiguation and inference tasks. We can bound both the running time and the error in a simple search algorithm, allowing for a much faster search than the NP-completeness of this problem would suggest.

pdf
The Chinese Remainder Theorem for Compact, Task-Precise, Efficient and Secure Word Embeddings
Patricia Thaine | Gerald Penn
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

The growing availability of powerful mobile devices and other edge devices, together with increasing regulatory and security concerns about the exchange of personal information across networks of these devices, has challenged the Computational Linguistics community to develop methods that are at once fast, space-efficient, accurate and amenable to secure encoding schemes such as homomorphic encryption. Inspired by recent work that restricts floating point precision to speed up neural network training in hardware-based SIMD, we have developed a method for compressing word vector embeddings into integers using the Chinese Remainder Theorem that speeds up addition by up to 48.27% and at the same time compresses GloVe word embedding libraries by up to 25.86%. We explore the practicality of this simple approach by investigating the trade-off between precision and performance in two NLP tasks: compositional semantic relatedness and opinion target sentiment classification. We find that in both tasks, lowering floating point number precision results in negligible changes to performance.
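
As a back-of-the-envelope illustration of the number theory involved (a minimal sketch, not the paper’s actual encoding; the moduli and component values below are invented for the example), the Chinese Remainder Theorem packs several small integers, such as quantized embedding components, into a single integer losslessly, provided the moduli are pairwise coprime:

```python
# Illustrative sketch only: pack several small non-negative integers into one
# integer via the Chinese Remainder Theorem, then recover them exactly.
from math import prod

MODULI = [251, 253, 255, 256]  # pairwise coprime; each component must be < its modulus

def crt_pack(residues, moduli=MODULI):
    """Return the unique x mod prod(moduli) with x % m_i == r_i for all i."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(a, -1, m) is the modular inverse (Python 3.8+)
    return x % M

def crt_unpack(x, moduli=MODULI):
    return [x % m for m in moduli]

components = [17, 200, 3, 94]  # e.g. four quantized embedding components
packed = crt_pack(components)
assert crt_unpack(packed) == components
```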

pdf
Feature Structures in the Wild: A Case Study in Mixing Traditional Linguistic Knowledge Representation with Neural Language Models
Gerald Penn | Ken Shi
Proceedings of the ESSLLI 2021 Workshop on Computing Semantics with Types, Frames and Related Structures

pdf bib
Statistically Evaluating Social Media Sentiment Trends towards COVID-19 Non-Pharmaceutical Interventions with Event Studies
Jingcheng Niu | Erin Rees | Victoria Ng | Gerald Penn
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

In the midst of a global pandemic, understanding the public’s opinion of their government’s policy-level, non-pharmaceutical interventions (NPIs) is a crucial component of the health-policy-making process. Prior work on COVID-19 NPI sentiment analysis by the epidemiological community has proceeded without a method for properly attributing sentiment changes to events, an ability to distinguish the influence of various events across time, a coherent model for predicting the public’s opinion of future events of the same sort, or even a means of conducting significance tests. We argue here that this urgently needed evaluation method does already exist. In the financial sector, event studies of the fluctuations in a publicly traded company’s stock price are commonplace for determining the effects of earnings announcements, product placements, etc. The same method is suitable for analysing temporal sentiment variation in the light of policy-level NPIs. We provide a case study of Twitter sentiment towards policy-level NPIs in Canada. Our results confirm a generally positive connection between the announcements of NPIs and Twitter sentiment, and we document a promising correlation between the results of this study and a public-health survey of popular compliance with NPIs.

pdf bib
A Generative Process for Lambek Categorial Proof Nets
Jinman Zhao | Gerald Penn
Proceedings of the 17th Meeting on the Mathematics of Language

2020

pdf
Grammaticality and Language Modelling
Jingcheng Niu | Gerald Penn
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

Ever since Pereira (2000) provided evidence against Chomsky’s (1957) conjecture that statistical language modelling is incommensurable with the aims of grammaticality prediction as a research enterprise, a new area of research has emerged that regards statistical language models as “psycholinguistic subjects” and probes their ability to acquire syntactic knowledge. The advent of The Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2019) has earned acceptability judgements a spot on the GLUE leaderboard, and the polemic between Lau et al. (2017) and Sprouse et al. (2018) has raised fundamental questions about the nature of grammaticality and how acceptability judgements should be elicited. All the while, we are told that neural language models continue to improve. That is not an easy claim to test at present, however, because there is almost no agreement on how to measure their improvement when it comes to grammaticality and acceptability judgements. The GLUE leaderboard bundles CoLA together with a Matthews correlation coefficient (MCC), probably because CoLA’s seminal publication used it to compute inter-rater reliabilities. Researchers working in this area have used other accuracy and correlation scores, often driven by a need to reconcile and compare various discrete and continuous variables with each other. The score that we will advocate for in this paper, the point biserial correlation (PBC), in fact compares a discrete variable (for us, acceptability judgements) to a continuous variable (for us, neural language model probabilities). The only previous work in this area to choose the PBC that we are aware of is Sprouse et al. (2018a), and that paper actually applied it backwards (with some justification), so that the language model probability was treated as the discrete binary variable by setting a threshold. With the PBC in mind, we first reappraise some recent work in syntactically targeted linguistic evaluations (Hu et al., 2020), arguing that while their experimental design sets a new high watermark for this topic, their results may not prove what they have claimed. We then turn to the task-independent assessment of language models as grammaticality classifiers. Prior to the introduction of the GLUE leaderboard, the vast majority of this assessment was essentially anecdotal, and we find the use of the MCC in this regard to be problematic. We conduct several studies with PBCs to compare several popular language models. We also study the effects of several variables, such as normalization and data homogeneity, on the PBC.
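
For readers unfamiliar with the advocated score: the point biserial correlation is simply a Pearson correlation in which one variable is binary. A minimal sketch with fabricated toy data (not results from the paper), using SciPy’s stock implementation:

```python
import numpy as np
from scipy.stats import pointbiserialr

# Fabricated example data: binary human acceptability judgements and
# continuous language-model log-probabilities for the same eight sentences.
acceptable = np.array([1, 1, 0, 1, 0, 0, 1, 0])
log_probs = np.array([-12.1, -10.4, -19.8, -11.0, -22.3, -18.5, -9.7, -21.0])

# The PBC correlates the discrete variable with the continuous one directly,
# with no thresholding of the model's probabilities.
r_pb, p_value = pointbiserialr(acceptable, log_probs)
print(f"PBC = {r_pb:.3f} (p = {p_value:.3g})")
```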

pdf
Temporal Histories of Epidemic Events (THEE): A Case Study in Temporal Annotation for Public Health
Jingcheng Niu | Victoria Ng | Gerald Penn | Erin E. Rees
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a new temporal annotation standard, THEE-TimeML, and a corpus, TheeBank, enabling precise temporal information extraction (TIE) for event-based surveillance (EBS) systems in the public health domain. Current EBS systems must estimate the occurrence time of each event from coarse document metadata such as the document’s publication time. Because of the complicated language and narration style of news articles, estimated case outbreak times are often inaccurate or even erroneous. It is thus necessary to create annotation standards and corpora that facilitate the development of TIE systems in the public health domain. We discuss the adaptations that have proved necessary for this domain as we present THEE-TimeML and TheeBank. Finally, we document the corpus annotation process and demonstrate the immediate benefit to public health applications brought by the annotations.

pdf
FAB: The French Absolute Beginner Corpus for Pronunciation Training
Sean Robertson | Cosmin Munteanu | Gerald Penn
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce the French Absolute Beginner (FAB) speech corpus. The corpus is intended for the development and study of Computer-Assisted Pronunciation Training (CAPT) tools for absolute beginner learners. Data were recorded during two experiments focusing on the use of a CAPT system in paired role-play tasks. This setting gives FAB three features that distinguish it from other non-native corpora: the experimental setting is ecologically valid, closing the gap between training and deployment; it features a label set based on teacher feedback, allowing for context-sensitive CAPT; and data have been primarily collected from absolute beginners, a group often ignored. Participants did not read prompts, but instead recalled and modified dialogues that were modelled in videos. Because speakers could not always discern the modelled words solely from viewing the videos, they often uttered unintelligible or out-of-L2 words. The corpus is split into three partitions: one from an experiment with minimal feedback; another with explicit, word-level feedback; and a third with supplementary read-and-record data. A subset of words in the first partition has been labelled as more or less native, with inter-annotator agreement reported. In the explicit-feedback partition, labels are derived from the experiment’s online feedback. The FAB corpus is scheduled to be made freely available by the end of 2020.

pdf
Supertagging with CCG primitives
Aditya Bhargava | Gerald Penn
Proceedings of the 5th Workshop on Representation Learning for NLP

In CCG and other highly lexicalized grammars, supertagging a sentence’s words with their lexical categories is a critical step for efficient parsing. Because of the high degree of lexicalization in these grammars, the lexical categories can be very complex. Existing approaches to supervised CCG supertagging treat the categories as atomic units, even when the categories are not simple; when they encounter words with categories unseen during training, their guesses are accordingly unsophisticated. In this paper, we make use of the primitives and operators that constitute the lexical categories of categorial grammars. Instead of opaque labels, we treat lexical categories themselves as linear sequences. We present an LSTM-based model that replaces standard word-level classification with prediction of a sequence of primitives, similarly to LSTM decoders. Our model obtains state-of-the-art word accuracy for single-task English CCG supertagging, increases parser coverage and F1, and is able to produce novel categories. Analysis shows a synergistic effect between this decomposed view and incorporation of prediction history.
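
A hypothetical sketch of this decomposed view of categories (the tokenization below is illustrative; the paper’s primitive inventory may differ): instead of treating a lexical category such as (S[dcl]\NP)/NP as one opaque label, it is read as a sequence of primitives, slashes, and brackets that a decoder can emit one symbol at a time:

```python
import re

def decompose(category):
    """Split a CCG category into primitives (with optional features),
    slash operators, and brackets."""
    return re.findall(r"[()\\/]|[A-Za-z]+(?:\[[a-z]+\])?", category)

print(decompose("(S[dcl]\\NP)/NP"))
# -> ['(', 'S[dcl]', '\\', 'NP', ')', '/', 'NP']
```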

2019

pdf bib
Proceedings of the 16th Meeting on the Mathematics of Language
Philippe de Groote | Frank Drewes | Gerald Penn
Proceedings of the 16th Meeting on the Mathematics of Language

pdf
Rationally Reappraising ATIS-based Dialogue Systems
Jingcheng Niu | Gerald Penn
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The Air Travel Information Service (ATIS) corpus has been the most common benchmark for evaluating Spoken Language Understanding (SLU) tasks for more than three decades since it was released. Recent state-of-the-art neural models have obtained F1-scores near 98% on the task of slot filling. We developed a rule-based grammar for the ATIS domain that achieves a 95.82% F1-score on our evaluation set. In the process, we furthermore discovered numerous shortcomings in the ATIS corpus annotation, which we have fixed. This paper presents a detailed account of these shortcomings, our proposed repairs, our rule-based grammar, and the neural slot-filling architectures associated with ATIS. We also rationally reappraise the motivations for choosing a neural architecture in view of this account. Fixing the annotation errors results in a relative error reduction of between 19.4% and 52% across all architectures. We nevertheless argue that neural models must play a different role in ATIS dialogues because of the latter’s lack of variety.

2017

pdf
Vowel and Consonant Classification through Spectral Decomposition
Patricia Thaine | Gerald Penn
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We consider two related problems in this paper. Given an undeciphered alphabetic writing system or mono-alphabetic cipher, determine: (1) which of its letters are vowels and which are consonants; and (2) whether the writing system is a vocalic alphabet or an abjad. We are able to show that a very simple spectral decomposition based on character co-occurrences provides nearly perfect performance with respect to answering both question types.
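
A toy sketch of the kind of spectral decomposition described (assumptions: adjacent-character co-occurrence counts and a sign split on the second left singular vector; the paper’s exact construction may differ):

```python
import numpy as np

def split_letters(text):
    """Partition the letters of `text` into two clusters via SVD of the
    adjacent-character co-occurrence matrix. On realistic amounts of text,
    the two clusters tend to be the vowels and the consonants; which
    cluster is which must be decided post hoc."""
    letters = sorted(set(text) - {" "})
    index = {c: i for i, c in enumerate(letters)}
    counts = np.zeros((len(letters), len(letters)))
    for a, b in zip(text, text[1:]):
        if a in index and b in index:
            counts[index[a], index[b]] += 1
    u, _, _ = np.linalg.svd(counts)
    # Signs in the second left singular vector induce the two-way split.
    return {c: bool(u[i, 1] > 0) for i, c in enumerate(letters)}

print(split_letters("the cat sat on the mat and the man ran to the tan van"))
```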

2016

pdf
Evaluating Sentiment Analysis in the Context of Securities Trading
Siavash Kazemian | Shunan Zhao | Gerald Penn
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2014

pdf
Evaluating Sentiment Analysis Evaluation: A Case Study in Securities Trading
Siavash Kazemian | Shunan Zhao | Gerald Penn
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf
Unsupervised Sentence Enhancement for Automatic Summarization
Jackie Chi Kit Cheung | Gerald Penn
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf
Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
Jackie Chi Kit Cheung | Gerald Penn
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
Jackie Chi Kit Cheung | Gerald Penn
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
The mathematics of language learning
András Kornai | Gerald Penn | James Rogers | Anssi Yli-Jyrä
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Tutorials)

pdf
Why Letter Substitution Puzzles are Not Hard to Solve: A Case Study in Entropy and Probabilistic Search-Complexity
Eric Corlett | Gerald Penn
Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13)

2012

pdf
Flexible Structural Analysis of Near-Meet-Semilattices for Typed Unification-Based Grammar Design
Rouzbeh Farahmand | Gerald Penn
Proceedings of COLING 2012

pdf
On Pāṇini and the Generative Capacity of Contextualized Replacement Systems
Gerald Penn | Paul Kiparsky
Proceedings of COLING 2012: Posters

pdf bib
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH
Miriam Butt | Sheelagh Carpendale | Gerald Penn | Jelena Prokić | Michael Cysouw
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

pdf
Ecological Validity and the Evaluation of Speech Summarization Quality
Anthony McCallum | Cosmin Munteanu | Gerald Penn | Xiaodan Zhu
Proceedings of Workshop on Evaluation Metrics and System Comparison for Automatic Summarization

pdf
Evaluating Distributional Models of Semantics for Syntactically Invariant Inference
Jackie Chi Kit Cheung | Gerald Penn
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Unsupervised Detection of Downward-Entailing Operators By Maximizing Classification Certainty
Jackie Chi Kit Cheung | Gerald Penn
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf
Indexing Spoken Documents with Hierarchical Semantic Structures: Semantic Tree-to-string Alignment Models
Xiaodan Zhu | Colin Cherry | Gerald Penn
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf
Utilizing Extra-Sentential Context for Parsing
Jackie Chi Kit Cheung | Gerald Penn
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Ron Kaplan | Jill Burstein | Mary Harper | Gerald Penn
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf
Imposing Hierarchical Browsing Structures onto Spoken Documents
Xiaodan Zhu | Colin Cherry | Gerald Penn
Coling 2010: Posters

pdf
The Quantitative Study of Writing Systems
Gerald Penn
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Conférences invitées

pdf
Entity-Based Local Coherence Modelling Using Topological Fields
Jackie Chi Kit Cheung | Gerald Penn
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Accurate Context-Free Parsing with Combinatory Categorial Grammar
Timothy A. D. Fowler | Gerald Penn
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
An Exact A* Method for Deciphering Letter-Substitution Ciphers
Eric Corlett | Gerald Penn
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
Matthew Skala | Victoria Krakovna | János Kramár | Gerald Penn
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

pdf
Topological Field Parsing of German
Jackie Chi Kit Cheung | Gerald Penn
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf
Summarizing multiple spoken documents: finding evidence from untranscribed audio
Xiaodan Zhu | Gerald Penn | Frank Rudzicz
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf
Improving Automatic Speech Recognition for Lectures through Transformation-based Rules Learned from Minimal Data
Cosmin Munteanu | Gerald Penn | Xiaodan Zhu
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf
A Critical Reassessment of Evaluation Baselines for Speech Summarization
Gerald Penn | Xiaodan Zhu
Proceedings of ACL-08: HLT

pdf
Interactive Visualization for Computational Linguistics
Christopher Collins | Gerald Penn | Sheelagh Carpendale
Tutorial Abstracts of ACL-08: HLT

pdf bib
Proceedings of the Workshop on Parsing German
Sandra Kübler | Gerald Penn
Proceedings of the Workshop on Parsing German

2006

pdf bib
Control Strategies for Parsing with Freer Word-Order Languages
Gerald Penn | Stefan Banjevic | Michael Demko
Proceedings of the Third Workshop on Constraints and Language Processing

pdf
Quantitative Methods for Classifying Writing Systems
Gerald Penn | Travis Choma
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf
Comparing the roles of textual, acoustic and spoken-language features on spontaneous-conversation summarization
Xiaodan Zhu | Gerald Penn
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2004

pdf
Optimizing Typed Feature Structure Grammar Parsing through Non-Statistical Indexing
Cosmin Munteanu | Gerald Penn
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf
Head-Driven Parsing for Word Lattices
Christopher Collins | Bob Carpenter | Gerald Penn
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf
Balancing Clarity and Efficiency in Typed Feature Logic Through Delaying
Gerald Penn
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

pdf
AVM Description Compilation using Types as Modes
Gerald Penn
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Topological Parsing
Gerald Penn | Mohammad Haji-Abdolhosseini
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf
Book Reviews: Linguistic Evolution through Language Acquisition: Formal and Computational Models edited by Ted Briscoe; Implementing Typed Feature Structure Grammars by Ann Copestake
Michael A. Arbib | Gerald Penn
Computational Linguistics, Volume 29, Number 3, September 2003: Special Issue on the Web as Corpus

pdf
A Tabulation-Based Parsing Method that Reduces Copying
Gerald Penn | Cosmin Munteanu
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

2002

pdf
A Web-based Instructional Platform for Constraint-Based Grammar Formalisms and Parsing
W. Detmar Meurers | Gerald Penn | Frank Richter
Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics

pdf
Generalized Encoding of Description Spaces and its Application to Typed Feature Structures
Gerald Penn
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

pdf
Tractability and Structural Closures in Attribute Logic Type Signatures
Gerald Penn
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

pdf
Book Reviews: The Mathematics of Syntactic Structure: Trees and their Logics
Gerald Penn
Computational Linguistics, Volume 26, Number 2, June 2000

1998

pdf
Parametric Types for Typed Attribute-Value Logic
Gerald Penn
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

pdf
Head-Driven Generation and Indexing in ALE
Gerald Penn
Computational Environments for Grammar Development and Linguistic Engineering

1994

pdf
Default Finite State Machines and Finite State Phonology
Gerald Penn | Richmond Thomason
Computational Phonology