Raivis Skadiņš

Also published as: Raivis Skadins, Raivis Skadinš

2022

Assessing Multilinguality of Publicly Accessible Websites
Rinalds Vīksna | Inguna Skadiņa | Raivis Skadiņš | Andrejs Vasiļjevs | Roberts Rozis
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Although information on the Internet can be shared in many languages, the presence of languages on the World Wide Web is highly disproportionate. The problem of multilingualism on the Web, in particular access to, availability and quality of information in the world's languages, has been a focus of UNESCO for several decades. Making European websites more multilingual is also one of the key targets of the Connecting Europe Facility Automated Translation (CEF AT) digital service infrastructure. To monitor progress towards this goal, alongside other possible solutions, CEF AT needs a methodology and an easy-to-use tool to assess the degree of multilingualism of a given website. In this paper we investigate methods and tools that automatically analyse the language diversity of the Web and propose indicators and a methodology for measuring the multilingualism of European websites. We also introduce a prototype tool based on open-source software that helps to assess the multilingualism of the Web and can be run independently at set intervals. Finally, we present initial results obtained with our tool, which allow us to conclude that multilingualism on the Web is still a problem not only at the world level, but also at the European and regional level.
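
The abstract does not spell out the proposed indicators. As a purely illustrative sketch (not the indicators defined in the paper), a website's multilinguality could be summarised from per-page language labels, assuming some language identifier has already tagged each crawled page. The function `multilinguality_indicators` and its three metrics (number of languages, coverage of the 24 official EU languages, share of the dominant language) are hypothetical.

```python
from collections import Counter

# The 24 official EU languages as ISO 639-1 codes (used here as an illustrative reference set).
EU_LANGUAGES = frozenset({
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
})


def multilinguality_indicators(page_languages):
    """Compute simple, hypothetical multilinguality indicators for one website.

    page_languages: iterable of language codes, one per crawled page
    (assumed to come from a language identifier that is not specified here).
    """
    counts = Counter(page_languages)
    total = sum(counts.values())
    if total == 0:
        # No pages crawled: report an empty profile instead of dividing by zero.
        return {"languages": 0, "eu_coverage": 0.0, "dominant_share": 0.0}
    # Official EU languages that appear on at least one page.
    eu_present = EU_LANGUAGES.intersection(counts)
    return {
        "languages": len(counts),                           # distinct languages found
        "eu_coverage": len(eu_present) / len(EU_LANGUAGES), # fraction of EU languages covered
        "dominant_share": max(counts.values()) / total,     # how skewed towards one language
    }


if __name__ == "__main__":
    print(multilinguality_indicators(["lv", "lv", "en", "ru"]))
```

For a site with two Latvian pages, one English and one Russian page, this reports three languages, covers 2 of the 24 EU languages, and has a dominant-language share of 0.5. A real assessment would also need to weight pages and check that translated pages are equivalent, which this sketch ignores.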

2020

The COMPRISE Cloud Platform
Raivis Skadiņš | Askars Salimbajevs
Proceedings of the 1st International Workshop on Language Technology Platforms

This paper presents the COMPRISE cloud platform that is being developed in the H2020 project COMPRISE. We present an overview of the COMPRISE project, its main goals, components, and how the cloud platform fits in the context of the overall project. The COMPRISE cloud platform is then presented in more detail: its main users, use scenarios, functions, implementation details, and how it will be used by both COMPRISE's target audience and the broader language-technology community.

This paper describes corpora collection activity for building large machine translation systems for the Latvian e-Government platform. We describe requirements for corpora, selection and assessment of data sources, collection of public corpora and creation of new corpora from miscellaneous sources. Methodology, tools and assessment methods are also presented along with the results achieved, challenges faced and conclusions made. Several approaches to address data scarcity are discussed. We summarize the volume of obtained corpora and provide quality metrics of MT systems trained on this data. The resulting MT systems for English-Latvian, Latvian-English and Latvian-Russian are integrated in the Latvian e-service portal and are freely available on the website HUGO.LV. This paper can serve as guidance for similar activities initiated in other countries, particularly in the context of the European Language Resource Coordination action.

2015

Word Alignment Based Parallel Corpora Evaluation and Cleaning Using Machine Learning Techniques
Ieva Zariņa | Pēteris Ņikiforovs | Raivis Skadiņš
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

Billions of Parallel Words for Free: Building and Using the EU Bookshop Corpus
Raivis Skadiņš | Jörg Tiedemann | Roberts Rozis | Daiga Deksne
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The European Union is a great source of high-quality documents with translations into several languages. Parallel corpora from its publications are frequently used in various tasks, machine translation in particular. A source that has not yet been systematically explored is the EU Bookshop, an online service and archive of publications from various European institutions. The service contains a large body of publications in the 24 official languages of the EU. This paper describes our efforts in collecting those publications and converting them to a format that is useful for natural language processing, in particular statistical machine translation. We report our procedure for crawling the website and the various pre-processing steps that were necessary to clean up the data after conversion from the original PDF files. Furthermore, we demonstrate the use of this dataset in training SMT models for English, French, German, Spanish, and Latvian.

Application of machine translation in localization into low-resourced languages
Raivis Skadiņš | Mārcis Pinnis | Andrejs Vasiļjevs | Inguna Skadiņa | Tomas Hudik
Proceedings of the 17th Annual Conference of the European Association for Machine Translation

Real-world challenges in application of MT for localization: the Baltic case
Mārcis Pinnis | Raivis Skadiņš | Andrejs Vasiļjevs
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Users Track

Machine translation for e-government – the Baltic case
Andrejs Vasiļjevs | Rihards Kalniņš | Mārcis Pinnis | Raivis Skadiņš
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Users Track

2013

Application of Online Terminology Services in Statistical Machine Translation
Raivis Skadins | Marcis Pinnis | Tatiana Gornostay | Andrejs Vasiljevs
Proceedings of Machine Translation Summit XIV: Posters

2012

LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation
Andrejs Vasiļjevs | Raivis Skadiņš | Jörg Tiedemann
Proceedings of the ACL 2012 System Demonstrations

2011

Evaluation of SMT in localization to under-resourced inflected language
Raivis Skadiņš | Maris Puriņš | Inguna Skadiņa | Andrejs Vasiļjevs
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

Toponym Disambiguation in an English-Lithuanian SMT System with Spatial Knowledge
Raivis Skadiņš | Tatiana Gornostay | Valters Šics
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

CFG based grammar checker for Latvian
Daiga Deksne | Raivis Skadiņš
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

LetsMT!: Cloud-Based Platform for Building User Tailored Machine Translation Engines
Andrejs Vasiljevs | Raivis Skadinš | Jörg Tiedemann
Proceedings of Machine Translation Summit XIII: System Presentations

2008

Dictionary of Multiword Expressions for Translation into highly Inflected Languages
Daiga Deksne | Raivis Skadiņš | Inguna Skadiņa
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Treatment of multiword expressions (MWEs) is one of the most complicated issues in natural language processing, especially in machine translation (MT). The paper presents a dictionary of MWEs for an English-Latvian MT system, demonstrating how MWEs can be handled for inflected languages with rich morphology and rather free word order. The proposed dictionary of MWEs consists of two constituents: a lexicon of phrases and a set of MWE rules. The lexicon of phrases is rather similar to the translation lexicon of the MT system, while the MWE rules describe the syntactic structure of the source and target sentence, allowing correct transformation of different MWE types into the target language and ensuring correct syntactic structure. The paper demonstrates this approach on different MWE types, starting from simple syntactic structures, followed by more complicated cases and including fully idiomatic expressions. Automatic evaluation shows that the described approach increases translation quality by 0.6 BLEU points.