Exploring and Visualizing Variation in Language Resources

Peter Fankhauser, Jörg Knappen, Elke Teich


Abstract
Language resources are often compiled for the purpose of variational analysis, such as studying differences between genres, registers, and disciplines, regional and diachronic variation, influence of gender, cultural context, etc. Often the sheer number of potentially interesting contrastive pairs can get overwhelming due to the combinatorial explosion of possible combinations. In this paper, we present an approach that combines well-understood techniques for visualization (heatmaps and word clouds) with intuitive paradigms for exploration (drill-down and side-by-side comparison) to facilitate the analysis of language variation in such highly combinatorial situations. Heatmaps assist in analyzing the overall pattern of variation in a corpus, and word clouds allow for inspecting variation at the level of words.
Anthology ID:
L14-1191
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
4125–4128
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/185_Paper.pdf
Cite (ACL):
Peter Fankhauser, Jörg Knappen, and Elke Teich. 2014. Exploring and Visualizing Variation in Language Resources. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 4125–4128, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Exploring and Visualizing Variation in Language Resources (Fankhauser et al., LREC 2014)