Proceedings of the Workshop Knowledge Resources for the Socio-Economic Sciences and Humanities associated with RANLP 2017
In this paper we present a semantic enrichment approach for linking two distinct data sets: the ÖBL (Austrian Biographical Dictionary) and the DBÖ (Database of Bavarian Dialects in Austria). Although the data sets differ in content and in how their data are structured, they share common “entities” such as names of persons. We describe how these data sets can be interlinked through URIs (Uniform Resource Identifiers), taking person names as a concrete example. We also point to the societal benefits of applying such semantic enrichment methods in order to open our resources and connect them to various services.
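The linking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout (`id`, `name`), the `BASE_URI` namespace, the helper names, and the sample names are all hypothetical, and matching on the name string alone is a simplification, since different people can share a name.

```python
import unicodedata

# Hypothetical namespace for minted person URIs (not from the paper).
BASE_URI = "https://example.org/person/"


def normalize_name(name):
    """Lowercase, strip diacritics and commas, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().replace(",", " ").split())


def name_key(name):
    """Order-insensitive key, so 'Surname, Given' matches 'Given Surname'."""
    return " ".join(sorted(normalize_name(name).split()))


def link_by_name(records_a, records_b):
    """Return (id_a, id_b, uri) triples for records whose name keys agree."""
    index = {}
    for rec in records_b:
        index.setdefault(name_key(rec["name"]), []).append(rec)

    links = []
    for rec in records_a:
        key = name_key(rec["name"])
        for match in index.get(key, []):
            links.append((rec["id"], match["id"], BASE_URI + key.replace(" ", "_")))
    return links


# Invented sample records, for illustration only.
dictionary_entries = [{"id": "a1", "name": "Müller, Anna"}]
dialect_db_entries = [
    {"id": "b7", "name": "Anna Müller"},
    {"id": "b8", "name": "Karl Huber"},
]
links = link_by_name(dictionary_entries, dialect_db_entries)
```

In practice a shared URI would more plausibly come from an external authority file than from the name string, and candidate matches would need further evidence (for example, dates) before being accepted.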
The UAIC-RoDia-DepTb is a balanced treebank containing texts in non-standard language: 2,575 chat sentences, old Romanian texts (a Gospel printed in 1648, a codex of laws printed in 1818, a novel written in 1910), regional popular poetry, legal texts, Romanian and foreign fiction, and quotations. The proportions are comparable: each of these text types is represented by a subset of at least 1,000 phrases, so that the parser can be trained on their peculiarities. Annotation of the treebank started in 2007, and it uses classical tags, such as those of school grammar, with the intention of using the resource for didactic purposes. The classification of circumstantial modifiers is rich in semantic information. In this paper we present the ongoing development of this resource, which has been automatically annotated and entirely manually corrected. We are adding new texts and making the treebank available in more formats, keeping all the annotated morphological and syntactic information and adding logical-semantic information. We describe two conversions, from the classic syntactic format into the Universal Dependencies format and into a logical-semantic layer, which will be briefly presented.
When people or organizations provide information, they make choices regarding what information they include and how they present it. The combination of these two aspects (the content and stance provided by the source) represents a perspective. Investigating differences in perspective can provide various useful insights into the reliability of information, the way perspectives change over time, shared beliefs among groups of a similar social or political background, contrasts between other groups, and so on. This paper introduces GRaSP, a generic framework for modeling perspectives and their sources.
The paper presents part of an ongoing project of the Laboratory for Language Technologies of New Bulgarian University – “An e-Platform for Language Teaching (PLT)” – the development of corpus-based teaching content for Business English courses. The presentation offers information on: 1/ corpus creation and corpus management with PLT; 2/ PLT corpus annotation; 3/ language task generation and the Language Task Bank (LTB); 4/ content transfer to the NBU Moodle platform, test generation and feedback on student performance.
Machine Learning Models of Universal Grammar Parameter Dependencies
Dimitar Kazakov | Guido Cordoni | Andrea Ceolin | Monica-Alexandrina Irimia | Shin-Sook Kim | Dimitris Michelioudakis | Nina Radkevich | Cristina Guardiano | Giuseppe Longobardi
The use of parameters in the description of natural language syntax has to balance the need to discriminate among (sometimes subtly different) languages, which can be seen as a cross-linguistic version of Chomsky’s (1964) descriptive adequacy, against the complexity of the acquisition task that a large number of parameters would imply, which is a problem for explanatory adequacy. Here we present a novel approach in which a machine learning algorithm is used to find dependencies in a table of parameters. The result is a dependency graph in which some of the parameters can be fully predicted from others. These empirical findings can then be subjected to linguistic analysis, which may either refute them by providing typological counter-examples of languages not included in the original dataset, dismiss them on theoretical grounds, or uphold them as tentative empirical laws worthy of further study.
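To make the idea of a parameter being "fully predicted from others" concrete, here is a minimal sketch. It is not the algorithm used in the paper, which the abstract does not specify; it swaps in a brute-force functional-dependency search over small predictor sets. The function names, the `max_predictors` limit, and the toy table are assumptions made purely for illustration.

```python
from itertools import combinations


def determines(rows, predictors, target):
    """True if equal predictor values always come with the same target value."""
    seen = {}
    for row in rows:
        key = tuple(row[p] for p in predictors)
        if seen.setdefault(key, row[target]) != row[target]:
            return False
    return True


def find_dependencies(rows, max_predictors=2):
    """Return (predictors, target) edges, keeping only the smallest predictor sets."""
    params = sorted(rows[0])
    edges = []
    for target in params:
        others = [p for p in params if p != target]
        for size in range(1, max_predictors + 1):
            hits = [c for c in combinations(others, size) if determines(rows, c, target)]
            if hits:
                edges.extend((c, target) for c in hits)
                break  # do not report larger sets once a smaller one suffices
    return edges


# Invented toy table: one row per language, one column per binary parameter.
# In this table P3 is set exactly when both P1 and P2 are set.
table = [
    {"P1": 0, "P2": 0, "P3": 0},
    {"P1": 0, "P2": 1, "P3": 0},
    {"P1": 1, "P2": 0, "P3": 0},
    {"P1": 1, "P2": 1, "P3": 1},
]
edges = find_dependencies(table)
```

On this toy table the only edge found is from (`P1`, `P2`) to `P3`. With few languages in the table, such a search will also report accidental regularities, which is why dependencies found this way are only candidates that a counter-example from a language outside the table can refute.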