2023
Corpus-based Syntactic Typological Methods for Dependency Parsing Improvement
Diego Alves | Božo Bekavac | Daniel Zeman | Marko Tadić
Proceedings of the 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
This article presents a comparative analysis of four syntactic typological approaches applied to 20 languages, to determine which is most effective for improving dependency parsing results via corpora combination. We evaluated these strategies by calculating the correlation between the language distances and the empirical LAS results obtained when languages were combined in pairs. The results show that the best method is based on the extraction of word-order patterns that occur inside subtrees of the syntactic structure of the sentences.
Analysis of Corpus-based Word-Order Typological Methods
Diego Alves | Božo Bekavac | Daniel Zeman | Marko Tadić
Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023)
This article presents a comparative analysis of four syntactic typological approaches applied to 20 languages. We compared three quantitative methods, using parallel CoNLL-U corpora, to the classification obtained via syntactic features provided by a typological database (lang2vec). First, we analyzed the Marsagram linear approach, which consists of extracting the frequencies of word-order patterns regarding the position of components inside syntactic nodes. The second approach considers the relative position of heads and dependents, and the third is based simply on the relative position of verbs and objects. The results show that each method provides different language clusters, which can be compared to the classic genealogical classification (the lang2vec and the head and dependent methods being the closest). As different word-order phenomena are considered in these typological strategies, each one provides a different angle of analysis, to be applied according to the precise needs of the researchers.
2022
Multilingual Comparative Analysis of Deep-Learning Dependency Parsing Results Using Parallel Corpora
Diego Alves | Marko Tadić | Božo Bekavac
Proceedings of the BUCC Workshop within LREC 2022
This article presents a comparative analysis of dependency parsing results for a set of 16 languages, coming from a large variety of linguistic families and genera, whose parallel corpora were used to train a deep-learning tool. Results are analyzed in comparison to an innovative way of classifying languages concerning the head directionality parameter, used to perform a quantitative syntactic typological classification of languages. Despite the use of parallel corpora, there is a large discrepancy in LAS results. The obtained results show that this heterogeneity is mainly due to differences in the syntactic structure of the selected languages, with Indo-European languages, especially Romance ones, having the best scores. Differences in the size of the representation of each language in the language model used by the deep-learning tool also play a major role in dependency parsing efficacy. Other factors, such as the number of dependency parsing labels, may also influence results, particularly for languages with more complex labeling systems, such as Polish.
2017
Language Generation from DB Query
Kristina Kocijan | Božo Bekavac | Krešimir Šojat
Proceedings of the Linguistic Resources for Automatic Natural Language Generation - LiRA@NLG
2014
XLike Project Language Analysis Services
Xavier Carreras | Lluís Padró | Lei Zhang | Achim Rettinger | Zhixing Li | Esteban García-Cuesta | Željko Agić | Božo Bekavac | Blaz Fortuna | Tadej Štajner
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
2007
Implementation of Croatian NERC System
Božo Bekavac | Marko Tadić
Proceedings of the Workshop on Balto-Slavonic Natural Language Processing
2004
Making Monolingual Corpora Comparable: a Case Study of Bulgarian and Croatian
Božo Bekavac | Petya Osenova | Kiril Simov | Marko Tadić
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)