Linguistic landscaping of South Asia using digital language resources: Genetic vs. areal linguistics
Abstract
Like many other research fields, linguistics is entering the age of big data. We are now at a point where it is possible to see how new research questions can be formulated - and old research questions addressed from a new angle or established results verified - on the basis of exhaustive collections of data, rather than small, carefully selected samples. For example, South Asia is often mentioned in the literature as a classic example of a linguistic area, but there is no systematic, empirical study substantiating this claim. Examination of genealogical and areal relationships among South Asian languages requires a large-scale quantitative and qualitative comparative study, encompassing more than one language family. Further, such a study cannot be conducted manually, but needs to draw on extensive digitized language resources and state-of-the-art computational tools. We present some preliminary results of our large-scale investigation of the genealogical and areal relationships among the languages of this region, based on the linguistic descriptions available in the 19 tomes of Grierson’s monumental “Linguistic Survey of India” (1903-1927), which is currently being digitized with the aim of turning the linguistic information in the LSI into a digital language resource suitable for a broad array of linguistic investigations.
- Anthology ID:
- L14-1175
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3137–3144
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/159_Paper.pdf
- Cite (ACL):
- Lars Borin, Anju Saxena, Taraka Rama, and Bernard Comrie. 2014. Linguistic landscaping of South Asia using digital language resources: Genetic vs. areal linguistics. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3137–3144, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Linguistic landscaping of South Asia using digital language resources: Genetic vs. areal linguistics (Borin et al., LREC 2014)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/159_Paper.pdf