2024
Investigating Multilinguality in the Plenary Sessions of the Parliament of Finland with Automatic Language Identification
Tommi Jauhiainen | Jussi Piitulainen | Erik Axelson | Ute Dieckmann | Mietta Lennes | Jyrki Niemi | Jack Rueter | Krister Lindén
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
In this paper, we use automatic language identification to investigate the usage of different languages in the plenary sessions of the Parliament of Finland. Finland has two national languages, Finnish and Swedish. The plenary sessions are published as transcriptions of speeches in Parliament, reflecting the language the speaker used. In addition to charting out language use, we demonstrate how language identification can be used to audit the quality of the dataset. On the one hand, we made slight improvements to our language identifier; on the other hand, we made a list of improvement suggestions for the next version of the dataset.
2014
HFST-SweNER — A New NER Resource for Swedish
Dimitrios Kokkinakis | Jyrki Niemi | Sam Hardwick | Krister Lindén | Lars Borin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Named entity recognition (NER) is a knowledge-intensive information extraction task that recognizes textual mentions of entities belonging to a predefined set of categories, such as locations, organizations and time expressions. NER is a challenging yet essential preprocessing technology for many natural language processing applications, and is particularly crucial for language understanding. NER has been actively explored in academia and industry, especially in recent years, due to the advent of social media data. This paper describes the conversion, modeling and adaptation of a Swedish NER system from a hybrid environment, with integrated functionality from various processing components, to the Helsinki Finite-State Transducer Technology (HFST) platform. This new HFST-based NER (HFST-SweNER) is a full-fledged open source implementation that supports a variety of generic named entity types and consists of multiple, reusable resource layers, e.g., various n-gram-based named entity lists (gazetteers).
2013
Nordic and Baltic Wordnets Aligned and Compared through “WordTies”
Bolette Sandford Pedersen | Lars Borin | Markus Forsberg | Neeme Kahusk | Krister Lindén | Jyrki Niemi | Niklas Nisbeth | Lars Nygaard | Heili Orav | Eirikur Rögnvaldsson | Mitchell Seaton | Kadri Vider | Kaarlo Voionmaa
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)
2012
Representing the Translation Relation in a Bilingual Wordnet
Jyrki Niemi | Krister Lindén
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper describes representing translations in the Finnish wordnet, FinnWordNet (FiWN), and constructing the FiWN database. FiWN was created by translating all the word senses of the Princeton WordNet (PWN) into Finnish and by joining the translations with the semantic and lexical relations of PWN extracted into a relational (database) format. The approach naturally resulted in a translation relation between PWN and FiWN. Unlike many other multilingual wordnets, the translation relation in FiWN is not primarily on the synset level, but on the level of an individual word sense, which allows more precise translation correspondences. This can easily be projected into a synset-level translation relation, used for linking with other wordnets, for example, via Core WordNet. Synset-level translations are also used as a default in the absence of word-sense translations. The FiWN data in the relational database can be converted to other formats. In the PWN database format, translations are attached to source-language words, allowing the implementation of a Web search interface also working as a bilingual dictionary. Another representation encodes the translation relation as a finite-state transducer.
2008
Quantification and Implication in Semantic Calendar Expressions Represented with Finite-State Transducers
Jyrki Niemi | Kimmo Koskenniemi
Coling 2008: Companion volume: Posters
2007
Representing Calendar Expressions with Finite-State Transducers that Bracket Periods of Time on a Hierachical Timeline
Jyrki Niemi | Kimmo Koskenniemi
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)
2006
Towards modeling the semantics of calendar expressions as extended regular expressions
Jyrki Niemi | Lauri Carlson
Proceedings of the 15th Nordic Conference of Computational Linguistics (NODALIDA 2005)