Ciprian-Octavian Truică

Also published as: Ciprian-Octavian Truica


2023

Towards a Conversational Web? A Benchmark for Analysing Semantic Change with Conversational Knowledge Bots and Linked Open Data
Florentina Armaselu | Elena-Simona Apostol | Christian Chiarcos | Anas Fahad Khan | Chaya Liebeskind | Barbara McGillivray | Ciprian-Octavian Truica | Andrius Utka | Giedrė Valūnaitė-Oleškevičienė
Proceedings of the 4th Conference on Language, Data and Knowledge

Workflow Reversal and Data Wrangling in Multilingual Diachronic Analysis and Linguistic Linked Open Data Modelling
Florentina Armaselu | Barbara McGillivray | Chaya Liebeskind | Giedrė Valūnaitė Oleškevičienė | Andrius Utka | Daniela Gifu | Anas Fahad Khan | Elena-Simona Apostol | Ciprian-Octavian Truica
Proceedings of the 4th Conference on Language, Data and Knowledge

Validation of Language Agnostic Models for Discourse Marker Detection
Mariana Damova | Kostadin Mishev | Giedrė Valūnaitė-Oleškevičienė | Chaya Liebeskind | Purificação Silvano | Dimitar Trajanov | Ciprian-Octavian Truica | Elena-Simona Apostol | Christian Chiarcos | Anna Baczkowska
Proceedings of the 4th Conference on Language, Data and Knowledge

2022

Modelling Collocations in OntoLex-FrAC
Christian Chiarcos | Katerina Gkirtzou | Maxim Ionov | Besim Kabashi | Fahad Khan | Ciprian-Octavian Truică
Proceedings of Globalex Workshop on Linked Lexicography within the 13th Language Resources and Evaluation Conference

Following presentations of frequency and attestations, and embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC. We provide an RDF vocabulary for collocations, established as a consensus over contributions from five different institutions and numerous data sets, with the goal of eliciting feedback from reviewers, workshop audience and the scientific community in preparation of the final consolidation of the OntoLex-FrAC module, whose publication as a W3C community report is foreseen for the end of this year. The novel collocation component of OntoLex-FrAC is described in application to a lexicographic resource and corpus-based collocation scores available from the web, and finally, we demonstrate the capability and genericity of the model by showing how to retrieve and aggregate collocation information by means of SPARQL, and its export to a tabular format, so that it can be easily processed in downstream applications.

Cross-Lingual Link Discovery for Under-Resourced Languages
Michael Rosner | Sina Ahmadi | Elena-Simona Apostol | Julia Bosque-Gil | Christian Chiarcos | Milan Dojchinovski | Katerina Gkirtzou | Jorge Gracia | Dagmar Gromann | Chaya Liebeskind | Giedrė Valūnaitė Oleškevičienė | Gilles Sérasset | Ciprian-Octavian Truică
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies, and in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community, but limited support in terms of language technology. We argue that for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.

ISO-based Annotated Multilingual Parallel Corpus for Discourse Markers
Purificação Silvano | Mariana Damova | Giedrė Valūnaitė Oleškevičienė | Chaya Liebeskind | Christian Chiarcos | Dimitar Trajanov | Ciprian-Octavian Truică | Elena-Simona Apostol | Anna Baczkowska
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Discourse markers carry information about the discourse structure and organization, and also signal local dependencies or the epistemological stance of the speaker. They provide instructions on how to interpret the discourse, and their study is paramount to understanding the mechanism underlying discourse organization. This paper presents a new language resource, an ISO-based annotated multilingual parallel corpus for discourse markers. The corpus comprises nine languages, Bulgarian, Lithuanian, German, European Portuguese, Hebrew, Romanian, Polish, and Macedonian, with English as a pivot language. In order to represent the meaning of the discourse markers, we propose an annotation scheme of discourse relations from ISO 24617-8 with a plug-in to ISO 24617-2 for communicative functions. We describe an experiment in which we applied the annotation scheme to assess its validity. The results reveal that, although some extensions are required to cover all the multilingual data, it provides a proper representation of the discourse markers' value. Additionally, we report some relevant contrastive phenomena concerning the interpretation of discourse markers and their role in discourse. This first step will allow us to develop deep learning methods to identify and extract discourse relations and communicative functions, and to represent that information as Linguistic Linked Open Data (LLOD).

Modelling Frequency, Attestation, and Corpus-Based Information with OntoLex-FrAC
Christian Chiarcos | Elena-Simona Apostol | Besim Kabashi | Ciprian-Octavian Truică
Proceedings of the 29th International Conference on Computational Linguistics

OntoLex-Lemon has become a de facto standard for lexical resources in the web of data. This paper provides the first overall description of the emerging OntoLex module for Frequency, Attestations, and Corpus-Based Information (OntoLex-FrAC) that is intended to complement OntoLex-Lemon with the necessary vocabulary to represent major types of information found in or automatically derived from corpora, for applications in both language technology and the language sciences.

2020

Neural Approaches for Natural Language Interfaces to Databases: A Survey
Radu Cristian Alexandru Iacob | Florin Brad | Elena-Simona Apostol | Ciprian-Octavian Truică | Ionel Alexandru Hosu | Traian Rebedea
Proceedings of the 28th International Conference on Computational Linguistics

A natural language interface to databases (NLIDB) enables users without technical expertise to easily access information from relational databases. Interest in NLIDBs has resurged in the past years due to the availability of large datasets and improvements to neural sequence-to-sequence models. In this survey we focus on the key design decisions behind current state-of-the-art neural approaches, which we group into encoder and decoder improvements. We highlight the three most important directions, namely linking question tokens to database schema elements (schema linking), better architectures for encoding the textual query taking into account the schema (schema encoding), and improved generation of structured queries using autoregressive neural models (grammar-based decoders). To foster future research, we also present an overview of the most important NLIDB datasets, together with a comparison of the top-performing neural models and a short insight into recent non-deep-learning solutions.