4FX: Light Verb Constructions in a Multilingual Parallel Corpus

Anita Rácz, István Nagy T., Veronika Vincze


Abstract
In this paper, we describe 4FX, a quadrilingual (English-Spanish-German-Hungarian) parallel corpus annotated for light verb constructions (LVCs). We present the annotation process and report statistical data on the frequency of LVCs in each language. We also report inter-annotator agreement rates and highlight some interesting facts and tendencies based on a comparison of multilingual data from the four corpora. According to the frequency of LVC categories and the calculated Kendall’s coefficient for the four corpora, we found that Spanish and German are very similar to each other, Hungarian is also similar to both, but German differs from all these three. The qualitative and quantitative data analysis might prove useful in theoretical linguistic research for all four languages. Moreover, the corpus will be an excellent testbed for the development and evaluation of machine learning-based methods aiming at extracting or identifying light verb constructions in these four languages.
Anthology ID:
L14-1293
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
710–715
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/331_Paper.pdf
Cite (ACL):
Anita Rácz, István Nagy T., and Veronika Vincze. 2014. 4FX: Light Verb Constructions in a Multilingual Parallel Corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 710–715, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
4FX: Light Verb Constructions in a Multilingual Parallel Corpus (Rácz et al., LREC 2014)