Anssi Yli-Jyrä
2025
Hierarchical Bracketing Encodings for Dependency Parsing as Tagging
Ana Ezquerro | David Vilares | Anssi Yli-Jyrä | Carlos Gómez-Rodríguez
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We present a family of encodings for sequence labeling dependency parsing, based on the concept of hierarchical bracketing. We show that the existing 4-bit projective encoding belongs to this family, but it is suboptimal in the number of labels used to encode a tree. We derive an optimal hierarchical bracketing, which minimizes the number of symbols used and encodes projective trees using only 12 distinct labels (vs. 16 for the 4-bit encoding). We also extend optimal hierarchical bracketing to support arbitrary non-projectivity in a more compact way than previous encodings. Our new encodings yield competitive accuracy on a diverse set of treebanks.
2020
HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment
Anssi Yli-Jyrä | Josi Purhonen | Matti Liljeqvist | Arto Antturi | Pekka Nieminen | Kari M. Räntilä | Valtter Luoto
Proceedings of the Twelfth Language Resources and Evaluation Conference
Twenty-five years ago, morphologically aligned Hebrew-Finnish and Greek-Finnish bitexts (texts accompanied by a translation) were constructed manually in order to create an analytical concordance (Luoto et al., eds. 1997) for a Finnish Bible translation. The creators of the bitexts recently secured the publisher’s permission to release its fine-grained alignment, but the alignment was still dependent on proprietary, third-party resources such as a copyrighted text edition and proprietary morphological analyses of the source texts. In this paper, we describe a nontrivial editorial process starting from the creation of the original one-purpose database and ending with its reconstruction using only freely available text editions and annotations. This process produced an openly available dataset that contains (i) the source texts and their translations, (ii) the morphological analyses, (iii) the cross-lingual morpheme alignments.
2019
Transition-Based Coding and Formal Language Theory for Ordered Digraphs
Anssi Yli-Jyrä
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing
Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterize properties of noncrossing digraphs and to solve maximal subgraph problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.
2017
Generic Axiomatization of Families of Noncrossing Graphs in Dependency Parsing
Anssi Yli-Jyrä | Carlos Gómez-Rodríguez
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We present a simple encoding for unlabeled noncrossing graphs and show how its latent counterpart helps us to represent several families of directed and undirected graphs used in syntactic and semantic parsing of natural language as context-free languages. The families are separated purely on the basis of forbidden patterns in the latent encoding, eliminating the need to differentiate the families of noncrossing graphs in inference algorithms: one algorithm works for all when the search space can be controlled in parser input.
Bounded-Depth High-Coverage Search Space for Noncrossing Parses
Anssi Yli-Jyrä
Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP 2017)
2015
Three Equivalent Codes for Autosegmental Representations
Anssi Yli-Jyrä
Proceedings of the 12th International Conference on Finite-State Methods and Natural Language Processing 2015 (FSMNLP 2015 Düsseldorf)
2013
The mathematics of language learning
András Kornai | Gerald Penn | James Rogers | Anssi Yli-Jyrä
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Tutorials)
On Finite-State Tonology with Autosegmental Representations
Anssi Yli-Jyrä
Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing
2012
Implementation of Replace Rules Using Preference Operator
Senka Drobac | Miikka Silfverberg | Anssi Yli-Jyrä
Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing
Refining the Design of a Contracting Finite-State Dependency Parser
Anssi Yli-Jyrä | Jussi Piitulainen | Atro Voutilainen
Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing
2011
Compiling Simple Context Restrictions with Nondeterministic Automata
Anssi Yli-Jyrä
Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing
Explorations on Positionwise Flag Diacritics in Finite-State Morphology
Anssi Yli-Jyrä
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)
2009
An Efficient Double Complementation Algorithm for Superposition-Based Finite-State Morphology
Anssi Yli-Jyrä
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)
2004
Axiomatization of Restricted Non-Projective Dependency Trees through Finite-State Constraints that Analyse Crossing Bracketings
Anssi Yli-Jyrä
Proceedings of the Workshop on Recent Advances in Dependency Grammar