Ali Dashti




2024

How Far Is Too Far? Studying the Effects of Domain Discrepancy on Masked Language Models
Subhradeep Kayal | Alexander Rakhlin | Ali Dashti | Serguei Stepaniants
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Pre-trained masked language models, such as BERT, perform strongly on a wide variety of NLP tasks and have become ubiquitous in recent years. The typical way to use such models is to fine-tune them on downstream data. In this work, we aim to study how the difference in domains between the pre-trained model and the task affects its final performance. We first devise a simple mechanism to quantify the domain difference (using a cloze task) and use it to partition our dataset. Using these partitions of varying domain discrepancy, we focus on answering key questions around the impact of discrepancy on final performance, robustness to out-of-domain test-time examples, and the effect of domain-adaptive pre-training. We base our experiments on a large-scale openly available e-commerce dataset, and our findings suggest that in spite of pre-training, the performance of BERT degrades on datasets with high domain discrepancy, especially in low-resource cases. This effect is somewhat mitigated by continued pre-training for domain adaptation. Furthermore, the domain gap also makes BERT sensitive to out-of-domain examples during inference, even in high-resource tasks, and it is prudent to use as diverse a dataset as possible during fine-tuning to make it robust to domain shift.
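
The abstract does not spell out the exact cloze-based scoring mechanism, but a common way to realize this idea is pseudo-log-likelihood: mask each token in turn and measure how well the pre-trained model recovers it. The sketch below illustrates that assumption (model name, example sentences, and the function pseudo_log_likelihood are illustrative, not taken from the paper).

```python
# Sketch of a cloze-style domain-discrepancy measure (an assumption, not the
# paper's published scoring function): mask each token one at a time and record
# the log-probability the masked language model assigns to the original token.
# A lower average score suggests a larger gap from the pre-training domain.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(text: str) -> float:
    """Average log-probability of each token when it is masked out (cloze task)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    input_ids = enc["input_ids"][0]
    scores = []
    with torch.no_grad():
        # Skip the [CLS] and [SEP] special tokens; mask one position at a time.
        for i in range(1, input_ids.size(0) - 1):
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            scores.append(log_probs[input_ids[i]].item())
    return sum(scores) / len(scores)

# Hypothetical usage: a more negative score for the product-listing sentence
# would indicate higher discrepancy from BERT's pre-training corpus.
print(pseudo_log_likelihood("The cat sat on the mat."))
print(pseudo_log_likelihood("Lightly worn OEM alternator pulley, fits 2014-2018 trim."))
```

Scores like these could then be used to partition a corpus into buckets of increasing domain discrepancy before fine-tuning, in the spirit of the study described above.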