2016
A Preliminary Study of Statistically Predictive Syntactic Complexity Features and Manual Simplifications in Basque
Itziar Gonzalez-Dios | María Jesús Aranzabe | Arantza Díaz de Ilarraza
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
In this paper, we present a comparative analysis of statistically predictive syntactic features of complexity and the way humans treat these features when simplifying texts. To that end, we have used an automatically obtained list of the five most statistically predictive features, together with the Corpus of Basque Simplified Texts (CBST), to analyse how the syntactic phenomena behind these features have been manually simplified. Our aim is to go beyond describing the operations found in the corpus and to relate the multidisciplinary findings in order to understand text complexity from different points of view. We also present some issues that can be important when analysing linguistic complexity.
2014
Making Biographical Data in Wikipedia Readable: A Pattern-based Multilingual Approach
Itziar Gonzalez-Dios | María Jesús Aranzabe | Arantza Díaz de Ilarraza
Proceedings of the Workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA 2014)
Simple or Complex? Assessing the readability of Basque Texts
Itziar Gonzalez-Dios | María Jesús Aranzabe | Arantza Díaz de Ilarraza | Haritz Salaberri
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
2012
Combining Rule-Based and Statistical Syntactic Analyzers
Iakes Goenaga | Koldobika Gojenola | María Jesús Aranzabe | Arantza Díaz de Ilarraza | Kepa Bengoetxea
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages
2010
Building the Basque PropBank
Izaskun Aldezabal | María Jesús Aranzabe | Arantza Díaz de Ilarraza | Ainara Estarrona
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper presents the work carried out to annotate semantic roles in the Basque Dependency Treebank (BDT). We describe the resources we have used and the way 100 verbs have been annotated. We decided to follow the model proposed in the PropBank project, which has been deployed for other languages such as Chinese, Spanish, Catalan and Russian. The resources used are: an in-house database with syntactic/semantic subcategorization frames for Basque verbs, an English-Basque verb mapping based on Levin's classification, and the BDT itself. Detailed guidelines for human taggers have been established as a result of this annotation process. In addition, we have characterized the information associated with the semantic tag. Based on this study, we will also define semi-automatic procedures to facilitate the manual annotation of the remaining verbs in the Treebank. We have also adapted AbarHitz, a tool used in the construction of the BDT, to annotate semantic roles according to the proposed characterization.