Code-mixed parse trees and how to find them

Anirudh Srinivasan, Sandipan Dandapat, Monojit Choudhury


Abstract
In this paper, we explore methods for obtaining parse trees of code-mixed sentences and analyse the resulting trees. Existing work has shown that linguistic theories can be used to generate code-mixed sentences from a set of parallel sentences. We build on this work, using one of these theories, the Equivalence-Constraint theory, to obtain parse trees for synthetically generated code-mixed sentences, and we evaluate them with a neural constituency parser. We highlight the lack of a dataset of non-synthetic code-mixed constituency parse trees and how this makes our evaluation difficult. To complete our evaluation, we convert a set of code-mixed dependency parse trees into "pseudo constituency trees" and find that a parser trained on synthetically generated trees can also parse these reasonably well.
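The abstract does not say how dependency trees are turned into "pseudo constituency trees". As a rough illustration only, the sketch below assumes a simple head-projection scheme: every word that has dependents projects one constituent spanning itself and its dependents' subtrees in surface order, and words without dependents become bare (POS word) preterminals. The function name `dep_to_pseudo_constituency`, the placeholder label `X`, and the (form, POS, head) input format are hypothetical choices, not the authors' procedure; the scheme also assumes projective input, since crossing arcs cannot be bracketed in surface order.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (form, pos, head) with a 1-based head index; head 0 marks the root.
Token = Tuple[str, str, int]


def dep_to_pseudo_constituency(tokens: List[Token], label: str = "X") -> str:
    """Sketch of a head-projection conversion (an assumed scheme, not the paper's).

    Each word with dependents projects a constituent labelled `label` covering the
    word itself plus its dependents' subtrees, ordered by surface position.
    Assumes a projective tree with exactly one root.
    """
    children: Dict[int, List[int]] = {i: [] for i in range(len(tokens) + 1)}
    for i, (_, _, head) in enumerate(tokens, start=1):
        children[head].append(i)

    def build(i: int) -> str:
        form, pos, _ = tokens[i - 1]
        leaf = f"({pos} {form})"
        if not children[i]:
            return leaf  # no dependents: a bare preterminal
        # Interleave the head's own leaf with its dependents' subtrees by word index.
        parts = sorted([(i, leaf)] + [(c, build(c)) for c in children[i]])
        return f"({label} " + " ".join(p for _, p in parts) + ")"

    roots = children[0]
    if len(roots) != 1:
        raise ValueError("expected exactly one root token")
    return build(roots[0])


if __name__ == "__main__":
    # Toy English-Hindi code-mixed sentence "I am going to ghar" (ghar = home).
    sent = [("I", "PRON", 3), ("am", "AUX", 3), ("going", "VERB", 0),
            ("to", "ADP", 5), ("ghar", "NOUN", 3)]
    print(dep_to_pseudo_constituency(sent))
    # → (X (PRON I) (AUX am) (VERB going) (X (ADP to) (NOUN ghar)))
```

Because every constituent gets the same placeholder label, a tree built this way captures only bracketing, not phrase categories. This is one reason such converted trees can serve as a rough evaluation set but are not a substitute for gold code-mixed constituency annotations.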
Anthology ID:
2020.calcs-1.8
Volume:
Proceedings of the 4th Workshop on Computational Approaches to Code Switching
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Thamar Solorio, Monojit Choudhury, Kalika Bali, Sunayana Sitaram, Amitava Das, Mona Diab
Venue:
CALCS
Publisher:
European Language Resources Association
Pages:
57–64
Language:
English
URL:
https://aclanthology.org/2020.calcs-1.8
Cite (ACL):
Anirudh Srinivasan, Sandipan Dandapat, and Monojit Choudhury. 2020. Code-mixed parse trees and how to find them. In Proceedings of the 4th Workshop on Computational Approaches to Code Switching, pages 57–64, Marseille, France. European Language Resources Association.
Cite (Informal):
Code-mixed parse trees and how to find them (Srinivasan et al., CALCS 2020)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2020.calcs-1.8.pdf