Ravi Kumar


2022

Large-Scale Differentially Private BERT
Rohan Anil | Badih Ghazi | Vineet Gupta | Ravi Kumar | Pasin Manurangsi
Findings of the Association for Computational Linguistics: EMNLP 2022

In this work, we study the large-scale pretraining of BERT-Large (Devlin et al., 2019) with differentially private SGD (DP-SGD). We show that, combined with a careful implementation, scaling up the batch size to millions (i.e., mega-batches) improves the utility of the DP-SGD step for BERT; we also enhance the training efficiency by using an increasing batch size schedule. Our implementation builds on the recent work of Subramani et al. (2020), who demonstrated that the overhead of a DP-SGD step is minimized with effective use of JAX (Bradbury et al., 2018; Frostig et al., 2018) primitives in conjunction with the XLA compiler (XLA team and collaborators, 2017). Our implementation achieves a masked language model accuracy of 60.5% at a batch size of 2M, for epsilon=5, which is a reasonable privacy setting. To put this number in perspective, non-private BERT models achieve an accuracy of ∼70%.

2016

Conversational Flow in Oxford-style Debates
Justine Zhang | Ravi Kumar | Sujith Ravi | Cristian Danescu-Niculescu-Mizil
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2013

Summarization Through Submodularity and Dispersion
Anirban Dasgupta | Ravi Kumar | Sujith Ravi
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

Search in the Lost Sense of “Query”: Question Formulation in Web Search Queries and its Temporal Changes
Bo Pang | Ravi Kumar
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

Matching Reviews to Objects using a Language Model
Nilesh Dalvi | Ravi Kumar | Bo Pang | Andrew Tomkins
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

For a few dollars less: Identifying review pages sans human labels
Luciano Barbosa | Ravi Kumar | Bo Pang | Andrew Tomkins
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics