Anssi Yli-Jyrä

2025

Hierarchical Bracketing Encodings for Dependency Parsing as Tagging
Ana Ezquerro | David Vilares | Anssi Yli-Jyrä | Carlos Gómez-Rodríguez
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a family of encodings for sequence labeling dependency parsing, based on the concept of hierarchical bracketing. We show that the existing 4-bit projective encoding belongs to this family, but it is suboptimal in the number of labels used to encode a tree. We derive an optimal hierarchical bracketing, which minimizes the number of symbols used and encodes projective trees using only 12 distinct labels (vs. 16 for the 4-bit encoding). We also extend optimal hierarchical bracketing to support arbitrary non-projectivity in a more compact way than previous encodings. Our new encodings yield competitive accuracy on a diverse set of treebanks.

2020

pdf bib abs

Twenty-five years ago, morphologically aligned Hebrew-Finnish and Greek-Finnish bitexts (texts accompanied by a translation) were constructed manually in order to create an analytical concordance (Luoto et al., eds. 1997) for a Finnish Bible translation. The creators of the bitexts recently secured the publisher’s permission to release its fine-grained alignment, but the alignment was still dependent on proprietary, third-party resources such as a copyrighted text edition and proprietary morphological analyses of the source texts. In this paper, we describe a nontrivial editorial process starting from the creation of the original one-purpose database and ending with its reconstruction using only freely available text editions and annotations. This process produced an openly available dataset that contains (i) the source texts and their translations, (ii) the morphological analyses, (iii) the cross-lingual morpheme alignments.

2019

pdf bib abs

Transition-Based Coding and Formal Language Theory for Ordered Digraphs
Anssi Yli-Jyrä
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing

Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterize properties of noncrossing digraphs and to solve maximal subgraphs problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.

2017

pdf bib abs

Generic Axiomatization of Families of Noncrossing Graphs in Dependency Parsing
Anssi Yli-Jyrä | Carlos Gómez-Rodríguez
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a simple encoding for unlabeled noncrossing graphs and show how its latent counterpart helps us to represent several families of directed and undirected graphs used in syntactic and semantic parsing of natural language as context-free languages. The families are separated purely on the basis of forbidden patterns in latent encoding, eliminating the need to differentiate the families of non-crossing graphs in inference algorithms: one algorithm works for all when the search space can be controlled in parser input.

pdf bib

Bounded-Depth High-Coverage Search Space for Noncrossing Parses
Anssi Yli-Jyrä
Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP 2017)

Axiomatization of Restricted Non-Projective Dependency Trees through Finite-State Constraints that Analyse Crossing Bracketings
Anssi Yli-Jyrä
Proceedings of the Workshop on Recent Advances in Dependency Grammar