Samopriya Basu


2022

Computational Historical Linguistics and Language Diversity in South Asia
Aryaman Arora | Adam Farris | Samopriya Basu | Suresh Kolichala
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

South Asia is home to a plethora of languages, many of which severely lack access to new language technologies. This linguistic diversity also results in a research environment conducive to the study of comparative, contact, and historical linguistics, fields which necessitate the gathering of extensive data from many languages. We claim that data scatteredness (rather than scarcity) is the primary obstacle in the development of South Asian language technology, and suggest that the study of language history is uniquely aligned with surmounting this obstacle. We review recent developments in and at the intersection of South Asian NLP and historical-comparative linguistics, describing our and others’ current efforts in this area. We also offer new strategies towards breaking the data barrier.

2021

Bhāṣācitra: Visualising the dialect geography of South Asia
Aryaman Arora | Adam Farris | Gopalakrishnan R | Samopriya Basu
Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021

We present Bhāṣācitra, a dialect mapping system for South Asia built on a database of linguistic studies of languages of the region annotated for topic and location data. We analyse language coverage and look towards applications to typology by visualising example datasets. The application is not only meant to be useful for feature mapping, but also serves as a new kind of interactive bibliography for linguists of South Asian languages.