Thomas Mayer


2014

Creating a massively parallel Bible corpus
Thomas Mayer | Michael Cysouw
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present our ongoing effort to create a massively parallel Bible corpus. While an ever-increasing number of Bible translations is available in electronic form on the internet, there is no large-scale parallel Bible corpus that allows language researchers to easily get access to the texts and their parallel structure for a large variety of different languages. We report on the current status of the corpus, with over 900 translations in more than 830 language varieties. All translations are tokenized (e.g., separating punctuation marks) and Unicode normalized. Mainly due to copyright restrictions only portions of the texts are made publicly available. However, we provide co-occurrence information for each translation in a (sparse) matrix format. All word forms in the translation are given together with their frequency and the verses in which they occur.
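The abstract's preprocessing and co-occurrence description can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular-expression tokenizer, the choice of NFC as the normalization form, the verse identifiers, and the function names `tokenize`, `build_cooccurrence`, and `word_frequencies` are all assumptions made for illustration. The sparse matrix is represented as a nested dictionary that maps each word form to the verses in which it occurs.

```python
import re
import unicodedata
from collections import defaultdict


def tokenize(text):
    """Normalize to NFC (assumed form) and split punctuation from words."""
    text = unicodedata.normalize("NFC", text)
    # A word is a run of word characters; every other non-space
    # character becomes a token of its own.
    return re.findall(r"\w+|[^\w\s]", text)


def build_cooccurrence(verses):
    """Build a sparse word-by-verse matrix as nested dictionaries.

    verses: dict mapping a (hypothetical) verse ID to the verse text.
    Returns a dict: word form -> {verse ID: count in that verse}.
    Only non-zero cells are stored, which is what makes it sparse.
    """
    matrix = defaultdict(dict)
    for verse_id, text in verses.items():
        for token in tokenize(text):
            matrix[token][verse_id] = matrix[token].get(verse_id, 0) + 1
    return matrix


def word_frequencies(matrix):
    """Total frequency of each word form across all verses."""
    return {word: sum(cells.values()) for word, cells in matrix.items()}


if __name__ == "__main__":
    # Toy input; the verse IDs are placeholders, not the corpus's ID scheme.
    sample = {"v1": "the word, the end", "v2": "the end."}
    matrix = build_cooccurrence(sample)
    for word, total in sorted(word_frequencies(matrix).items()):
        print(word, total, sorted(matrix[word]))
```

Storing only the non-zero cells keeps the representation small even when the vocabulary is large and most word forms occur in just a few verses.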

2013

PhonMatrix: Visualizing co-occurrence constraints of sounds
Thomas Mayer | Christian Rohrdantz
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

Introduction
Miriam Butt | Jelena Prokić | Thomas Mayer | Michael Cysouw
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

Language comparison through sparse multilingual word alignment
Thomas Mayer | Michael Cysouw
Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

2011

Towards Tracking Semantic Change by Visual Analytics
Christian Rohrdantz | Annette Hautli | Thomas Mayer | Miriam Butt | Daniel A. Keim | Frans Plank
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Consonant Co-Occurrence in Stems across Languages: Automatic Analysis and Visualization of a Phonotactic Constraint
Thomas Mayer | Christian Rohrdantz | Frans Plank | Peter Bak | Miriam Butt | Daniel A. Keim
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

Toward a Totally Unsupervised, Language-Independent Method for the Syllabification of Written Texts
Thomas Mayer
Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology