Code-mixed parse trees and how to find them

Anirudh Srinivasan, Sandipan Dandapat, Monojit Choudhury


Abstract
In this paper, we explore the methods of obtaining parse trees of code-mixed sentences and analyse the obtained trees. Existing work has shown that linguistic theories can be used to generate code-mixed sentences from a set of parallel sentences. We build upon this work, using one of these theories, the Equivalence-Constraint theory, to obtain the parse trees of synthetically generated code-mixed sentences and evaluate them with a neural constituency parser. We highlight the lack of a dataset of non-synthetic code-mixed constituency parse trees and how it makes our evaluation difficult. To complete our evaluation, we convert a code-mixed dependency parse tree set into “pseudo constituency trees” and find that a parser trained on synthetically generated trees is able to decently parse these as well.
Anthology ID:
2020.calcs-1.8
Volume:
Proceedings of the The 4th Workshop on Computational Approaches to Code Switching
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
CALCS
Publisher:
European Language Resources Association
Pages:
57–64
Language:
English
URL:
https://aclanthology.org/2020.calcs-1.8
Cite (ACL):
Anirudh Srinivasan, Sandipan Dandapat, and Monojit Choudhury. 2020. Code-mixed parse trees and how to find them. In Proceedings of the The 4th Workshop on Computational Approaches to Code Switching, pages 57–64, Marseille, France. European Language Resources Association.
Cite (Informal):
Code-mixed parse trees and how to find them (Srinivasan et al., CALCS 2020)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2020.calcs-1.8.pdf