Šarūnas Girdzijauskas

Also published as: Sarunas Girdzijauskas


2022

Detecting Security Patches in Java Projects Using NLP Technology
Andrea Stefanoni | Šarūnas Girdzijauskas | Christina Jenkins | Zekarias T. Kefato | Licia Sbattella | Vincenzo Scotti | Emil Wåreus
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

Challenging the Assumption of Structure-based embeddings in Few- and Zero-shot Knowledge Graph Completion
Filip Cornell | Chenda Zhang | Jussi Karlgren | Sarunas Girdzijauskas
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we report experiments on Few- and Zero-shot Knowledge Graph completion, where the objective is to add missing relational links between entities into an existing Knowledge Graph with few or no previous examples of the relation in question. While previous work has used pre-trained embeddings based on the structure of the graph as input for a neural network, nobody has, to the best of our knowledge, addressed the task using only the textual descriptive data associated with the entities and relations, largely because current standard benchmark data sets lack such information. We therefore enrich the benchmark data sets for these tasks by collecting textual description data, providing a new resource for future research that bridges the gap between structural and textual Knowledge Graph completion. Our results show that we can improve Knowledge Graph completion in both Few- and Zero-shot scenarios, with up to a two-fold increase in all metrics in the Zero-shot setting. From a more general perspective, our experiments demonstrate the value of using textual resources to enrich more formal representations of human knowledge, and the utility of transfer learning from textual data and text collections to enrich and maintain knowledge resources.

2021

Decentralized Word2Vec Using Gossip Learning
Abdul Aziz Alkathiri | Lodovico Giaretta | Sarunas Girdzijauskas | Magnus Sahlgren
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It is therefore useful for a few large public and private organizations to join their corpora during training. However, factors such as legislation and users' emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. For this specific scenario, we therefore investigate how gossip learning, a massively parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that applying Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to those of a traditional centralized setting, with a loss of quality as low as 4.3%. Furthermore, the results are up to 54.8% better than those of independent local training.