Proceedings of the 18th International Conference on Parsing Technologies (IWPT, SyntaxFest 2025)
Kenji Sagae
|
Stephan Oepen
An Efficient Parser for Bounded-Order Product-Free Lambek Categorial Grammar via Term Graph
Jinman Zhao
|
Gerald Penn
Lambek Categorial Grammar (LCG) parsing has been proven NP-complete. In the bounded-order case, however, the complexity can be reduced to polynomial time. (CITATION) first introduced the term graph, a simple graphical representation for LCG parsing, but his algorithm for using it remained largely inscrutable. (CITATION) later proposed a polynomial-time algorithm for bounded-order LCG parsing based on cyclic linear logic, yet both approaches have remained largely theoretical, with no open-source implementations available. In this work, we combine the term-graph representation with insights from cyclic linear logic to develop a novel parsing algorithm for bounded-order LCG. Furthermore, we release our parser as an open-source tool.
Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs
Hiroshi Matsuda
|
Chunpeng Ma
|
Masayuki Asahara
Recent advances in large language models (LLMs) have enabled impressive performance on a variety of tasks. However, standard prompting often struggles to produce structurally valid and accurate outputs, especially for dependency parsing. We propose a novel step-by-step instruction strategy, in which universal part-of-speech tagging precedes the prediction of syntactic heads and dependency labels, together with a simplified CoNLL-U-like output format. Our method achieves state-of-the-art accuracy on Universal Dependencies datasets across 17 languages without hallucination or contamination. We further show that multilingual fine-tuning simultaneously improves cross-language generalization performance. Our results highlight the effectiveness of explicit reasoning steps in LLM-based parsing and offer a scalable, format-consistent alternative to bracket-based approaches.
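The simplified tabular format mentioned above can be illustrated with a minimal sketch. The exact column subset used by the paper is an assumption here; a CoNLL-U-style table pared down from the full ten columns would typically keep the token index, word form, universal POS tag, head index, and dependency label:

```python
# Hypothetical simplified CoNLL-U-like parse table for "Dogs chase cats".
# Columns (an assumption; full CoNLL-U has 10): ID, FORM, UPOS, HEAD, DEPREL.
# HEAD 0 marks the root, per CoNLL-U convention.
rows = [
    (1, "Dogs",  "NOUN", 2, "nsubj"),
    (2, "chase", "VERB", 0, "root"),
    (3, "cats",  "NOUN", 2, "obj"),
]

def format_table(rows):
    """Render one token per line, columns separated by tabs."""
    return "\n".join("\t".join(str(c) for c in row) for row in rows)

print(format_table(rows))
```

A format like this is trivial to validate line-by-line (fixed column count, integer head indices), which is one reason tabular outputs are easier to check for structural validity than bracketed trees.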
CCG Revisited: A Multilingual Empirical Study of the Kuhlmann-Satta Algorithm
Paul He
|
Gerald Penn
We revisit the polynomial-time CCG parsing algorithm introduced by Kuhlmann & Satta (2014) and provide a publicly available implementation of it. We evaluate its empirical performance against a naive CKY-style parser on the Parallel Meaning Bank (PMB) corpus. The fast parser is slightly slower on average relative to the size of the PMB, but the trend improves as a function of sentence length, and the PMB is large enough to witness an inversion. Our analysis quantifies this crossover and highlights the importance of derivational context decomposition in practical parsing scenarios.
High-Accuracy Transition-Based Constituency Parsing
John Bauer
|
Christopher D. Manning
Constituency parsers have improved markedly in recent years, with the F1 accuracy on the venerable Penn Treebank reaching 96.47, half of the error rate of the first transformer model in 2017. However, while dependency parsing frequently uses transition-based parsers, it is unclear whether transition-based parsing can still provide state-of-the-art results for constituency parsing. Despite promising work by Liu and Zhang in 2017 using an in-order transition-based parser, recent work uses other methods, mainly CKY charts built over LLM encoders. Starting from previous work, we implement self-training and a dynamic oracle to make a language-agnostic transition-based constituency parser. We test on seven languages; using Electra embeddings as the input layer on Penn Treebank, with a self-training dataset built from Wikipedia, our parser achieves a new SOTA F1 of 96.61.
Crosslingual Dependency Parsing of Hawaiian and Cook Islands Māori using Universal Dependencies
Gabriel H. Gilbert
|
Rolando Coto-Solano
|
Sally Akevai Nicholas
|
Lauren Houchens
|
Sabrina Barton
|
Trinity Pryor
This paper presents the first Universal Dependencies (UD) treebank for ʻŌlelo Hawaiʻi (Hawaiian). We discuss some of the difficulties in describing Hawaiian grammar using UD, and train models for automatic parsing. We also combined this data with UD parses from another Eastern Polynesian language, Cook Islands Māori, to train a crosslingual Polynesian parser using UDPipe 2. The crosslingual parser produced a statistically significant improvement of 2.4% in labeled attachment score (LAS) when parsing Hawaiian, and this improvement did not negatively impact the LAS for Cook Islands Māori. We will use this parser to accelerate the linguistic documentation of Hawaiian.