2023
Large Language Models for Multilingual Slavic Named Entity Linking
Rinalds Vīksna | Inguna Skadiņa | Daiga Deksne | Roberts Rozis
Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)
This paper describes our submission to the 4th Shared Task on SlavNER, covering three Slavic languages: Czech, Polish, and Russian. We use the pre-trained multilingual XLM-R language model (Conneau et al., 2020) and fine-tune it for the three languages using the datasets provided by the organizers. Our multilingual NER model achieves an F-score of 0.896 across all corpora, with the best result for Czech (0.914) and the worst for Russian (0.880). Our cross-language entity linking module achieves an F-score of 0.669 in the official SlavNER 2023 evaluation.
2017
Tilde’s Machine Translation Systems for WMT 2017
Mārcis Pinnis | Rihards Krišlauks | Toms Miks | Daiga Deksne | Valters Šics
Proceedings of the Second Conference on Machine Translation
2014
Billions of Parallel Words for Free: Building and Using the EU Bookshop Corpus
Raivis Skadiņš | Jörg Tiedemann | Roberts Rozis | Daiga Deksne
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The European Union is a great source of high-quality documents with translations into several languages. Parallel corpora from its publications are frequently used in various tasks, machine translation in particular. A source that has not yet been systematically explored is the EU Bookshop, an online service and archive of publications from various European institutions. The service contains a large body of publications in the 24 official languages of the EU. This paper describes our efforts in collecting those publications and converting them to a format that is useful for natural language processing, in particular statistical machine translation. We report our procedure for crawling the website and the various pre-processing steps that were necessary to clean up the data after conversion from the original PDF files. Furthermore, we demonstrate the use of this dataset in training SMT models for English, French, German, Spanish, and Latvian.
2013
Finite State Morphology Tool for Latvian
Daiga Deksne
Proceedings of the 11th International Conference on Finite State Methods and Natural Language Processing
2011
CFG based grammar checker for Latvian
Daiga Deksne | Raivis Skadiņš
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)
2008
Dictionary of Multiword Expressions for Translation into highly Inflected Languages
Daiga Deksne | Raivis Skadiņš | Inguna Skadiņa
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Treatment of Multiword Expressions (MWEs) is one of the most complicated issues in natural language processing, especially in Machine Translation (MT). The paper presents a dictionary of MWEs for an English-Latvian MT system, demonstrating how MWEs can be handled for inflected languages with rich morphology and rather free word order. The proposed dictionary of MWEs consists of two constituents: a lexicon of phrases and a set of MWE rules. The lexicon of phrases is rather similar to the translation lexicon of the MT system, while the MWE rules describe the syntactic structure of the source and target sentences, allowing correct transformation of different MWE types into the target language and ensuring correct syntactic structure. The paper demonstrates this approach on different MWE types, starting with simple syntactic structures, followed by more complicated cases, and including fully idiomatic expressions. Automatic evaluation shows that the described approach increases the quality of translation by 0.6 BLEU points.
2007
Comprehension Assistant for Languages of Baltic States
Inguna Skadiņa | Andrejs Vasiļjevs | Daiga Deksne | Raivis Skadiņš | Linda Goldberga
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)