Tommi A. Pirinen

Also published as: Tommi A Pirinen, Tommi Pirinen


2022

Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)
Atul Kr. Ojha | Chao-Hong Liu | Ekaterina Vylomova | Jade Abbott | Jonathan Washington | Nathaniel Oco | Tommi A Pirinen | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao

2021

Vowel Harmony Viewed as Error-Correcting Code
Yvo Meeres | Tommi A Pirinen
Proceedings of the Society for Computation in Linguistics 2021

2020

Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Alina Karakanta | Atul Kr. Ojha | Chao-Hong Liu | Jade Abbott | John Ortega | Jonathan Washington | Nathaniel Oco | Surafel Melaku Lakew | Tommi A Pirinen | Valentin Malykh | Varvara Logacheva | Xiaobing Zhao

An Unsupervised Method for Weighting Finite-state Morphological Analyzers
Amr Keleg | Francis Tyers | Nick Howell | Tommi Pirinen
Proceedings of the Twelfth Language Resources and Evaluation Conference

Morphological analysis is a long-studied task, and a range of techniques has been used to build models for it. Models based on finite-state transducers have proved more suitable for languages with few available resources. In this paper, we develop a method for weighting a morphological analyzer built with finite-state transducers in order to disambiguate its results. The method is based on a word2vec model that is trained in a completely unsupervised way on raw untagged corpora and is able to capture the semantic meaning of words. Most methods for disambiguating the output of a morphological analyzer rely on tagged corpora that need to be built manually. In addition, the method uses information about the token irrespective of its context, unlike most other techniques, which rely heavily on the word's context to disambiguate its set of candidate analyses.

Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages
Tommi A Pirinen | Francis M. Tyers | Michael Rießler

2019

Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages
Tommi A. Pirinen | Heiki-Jaan Kaalep | Francis M. Tyers

Neural and rule-based Finnish NLP models—expectations, experiments and experiences
Tommi A Pirinen
Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages

Apertium-fin-eng–Rule-based Shallow Machine Translation for WMT 2019 Shared Task
Tommi Pirinen
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In this paper we describe a rule-based, bi-directional machine translation system for the Finnish-English language pair. The baseline system was built on existing data from FinnWordNet, omorfi and apertium-eng. We wrote the disambiguation, lexical selection and translation rules by hand, and developed the dictionaries and rules on the basis of the shared task data. We describe how the shared task data can drive a kind of test-driven development workflow in RBMT development, and show that this fits well into a modern software-engineering continuous integration workflow for RBMT and yields large increases in BLEU scores with minimal effort.

Workflows for kickstarting RBMT in virtually No-Resource Situation
Tommi A Pirinen
Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages

Building minority dependency treebanks, dictionaries and computational grammars at the same time—an experiment in Karelian treebanking
Tommi A Pirinen
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)

2018

Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages
Tommi A. Pirinen | Michael Rießler | Jack Rueter | Trond Trosterud | Francis M. Tyers

2017

North-Sámi to Finnish rule-based machine translation system
Tommi Pirinen | Francis M. Tyers | Trond Trosterud | Ryan Johnson | Kevin Unhammer | Tiina Puolakainen
Proceedings of the 21st Nordic Conference on Computational Linguistics

Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages
Francis M. Tyers | Michael Rießler | Tommi A. Pirinen | Trond Trosterud

2015

Abu-MaTran: Automatic building of Machine Translation
Antonio Toral | Tommi A Pirinen | Andy Way | Gema Ramírez-Sánchez | Sergio Ortiz Rojas | Raphael Rubino | Miquel Esplà | Mikel Forcada | Vassilis Papavassiliou | Prokopis Prokopidis | Nikola Ljubešić
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Omorfi — Free and open source morphological lexical database for Finnish
Tommi A Pirinen
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling
Raphael Rubino | Tommi Pirinen | Miquel Esplà-Gomis | Nikola Ljubešić | Sergio Ortiz-Rojas | Vassilis Papavassiliou | Prokopis Prokopidis | Antonio Toral
Proceedings of the Tenth Workshop on Statistical Machine Translation

Abu-MaTran: Automatic building of Machine Translation
Antonio Toral | Tommi A. Pirinen | Andy Way | Gema Ramírez-Sánchez | Sergio Ortiz Rojas | Raphael Rubino | Miquel Esplà | Mikel L. Forcada | Vassilis Papavassiliou | Prokopis Prokopidis | Nikola Ljubešić
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Heuristic Hyper-minimization of Finite State Lexicons
Senka Drobac | Krister Lindén | Tommi Pirinen | Miikka Silfverberg
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Flag diacritics, which are special multi-character symbols executed at runtime, make it possible to optimise finite-state networks by combining identical sub-graphs of their transition graphs. Traditionally, the feature has required linguists to devise the graph optimisations by hand alongside the morphological description. In this paper, we present a novel method for discovering flag positions in morphological lexicons automatically, based on the morpheme structure implicit in the language description. With this approach, we have achieved a significant decrease in the size of finite-state networks while maintaining reasonable application speed. The algorithm can be applied to any language description, and the largest gains are expected for large and complex morphologies. The most noticeable size reduction was obtained with a morphological transducer for Greenlandic, whose original size is on average about 15 times larger than that of other morphologies. With the presented hyper-minimization method, the transducer is reduced to 10.1% of its original size, while lookup speed decreases by only 9.5%.

Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain
Antonio Toral | Raphael Rubino | Miquel Esplà-Gomis | Tommi Pirinen | Andy Way | Gema Ramírez-Sánchez
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

2013

Building an Open-Source Development Infrastructure for Language Technology Projects
Sjur N. Moshagen | Tommi Pirinen | Trond Trosterud
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

Effect of Language and Error Models on Efficiency of Finite-State Spell-Checking and Correction
Tommi A Pirinen | Sam Hardwick
Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing

2011

Modularisation of Finnish Finite-State Language Description – Towards Wide Collaboration in Open Source Development of a Morphological Analyser
Tommi Pirinen
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

2009

Weighted Finite-State Morphological Analysis of Finnish Compounding with HFST-LEXC
Krister Lindén | Tommi Pirinen
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)