2023
CoLT5: Faster Long-Range Transformers with Conditional Computation
Joshua Ainslie | Tao Lei | Michiel de Jong | Santiago Ontanon | Siddhartha Brahma | Yury Zemlyanskiy | David Uthus | Mandy Guo | James Lee-Thorp | Yi Tay | Yun-Hsuan Sung | Sumit Sanghai
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive, not only because of quadratic attention complexity but also because feedforward and projection layers are applied to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.
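As a rough illustration of the conditional-computation idea described above, the sketch below routes every token through a light feedforward branch and only a router-selected top-k of tokens through a heavy branch. The module, layer sizes, and routing details are illustrative assumptions, not the CoLT5 implementation.

```python
# Minimal sketch of conditional feedforward computation: all tokens take a cheap
# path, only the top-k router-scored tokens additionally take an expensive path.
# Hypothetical sizes and module names; not the authors' code.
import torch
import torch.nn as nn

class ConditionalFFN(nn.Module):
    def __init__(self, d_model=512, d_light=1024, d_heavy=4096, k=64):
        super().__init__()
        self.router = nn.Linear(d_model, 1)    # scores token importance
        self.light = nn.Sequential(nn.Linear(d_model, d_light), nn.ReLU(),
                                   nn.Linear(d_light, d_model))
        self.heavy = nn.Sequential(nn.Linear(d_model, d_heavy), nn.ReLU(),
                                   nn.Linear(d_heavy, d_model))
        self.k = k

    def forward(self, x):                       # x: (batch, seq, d_model)
        scores = self.router(x).squeeze(-1)     # (batch, seq)
        out = self.light(x)                     # cheap path for every token
        k = min(self.k, x.size(1))
        top = scores.topk(k, dim=-1).indices    # indices of "important" tokens
        idx = top.unsqueeze(-1).expand(-1, -1, x.size(-1))
        routed = torch.gather(x, 1, idx)        # gather the selected tokens
        gate = torch.sigmoid(torch.gather(scores, 1, top)).unsqueeze(-1)
        # Add the gated heavy-branch output back at the selected positions.
        return out.scatter_add(1, idx, gate * self.heavy(routed))
```

With, say, a 16k-token input and k = 512, the heavy branch touches only about 3% of positions, which is where the saving over applying the full feedforward layer to every token would come from.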
2020
CLAR: A Cross-Lingual Argument Regularizer for Semantic Role Labeling
Ishan Jindal | Yunyao Li | Siddhartha Brahma | Huaiyu Zhu
Findings of the Association for Computational Linguistics: EMNLP 2020
Semantic role labeling (SRL) identifies predicate-argument structure(s) in a given sentence. Although different languages have different argument annotations, polyglot training, the idea of training one model on multiple languages, has previously been shown to outperform monolingual baselines, especially for low-resource languages. In fact, even a simple combination of data has been shown to be effective with polyglot training by representing the distant vocabularies in a shared representation space. Meanwhile, despite the dissimilarity in argument annotations between languages, certain argument labels do share common semantic meaning across languages (e.g., adjuncts carry broadly similar meaning in most languages). To leverage such similarity in annotation space across languages, we propose a method called Cross-Lingual Argument Regularizer (CLAR). CLAR identifies such linguistic annotation similarity across languages and exploits this information to map the target-language arguments using a transformation of the space on which source-language arguments lie. Our experimental results show that CLAR consistently improves SRL performance on multiple languages over monolingual and polyglot baselines, particularly for low-resource languages.
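A rough sketch of how an argument regularizer of this kind could be wired in is given below: target-language argument label embeddings are mapped by a learned linear transform into the source-language label space, and a penalty pulls semantically similar labels together. All names (label_emb_src, label_emb_tgt, similar_pairs, lambda_clar) and the squared-distance penalty are assumptions for illustration, not the paper's formulation.

```python
# Hypothetical cross-lingual argument regularizer: a linear map takes
# target-language label embeddings into the source-language label space, and
# labels judged semantically similar across languages are pulled together.
import torch
import torch.nn as nn

d = 128
label_emb_src = nn.Embedding(30, d)      # source-language argument labels
label_emb_tgt = nn.Embedding(40, d)      # target-language argument labels
transform = nn.Linear(d, d, bias=False)  # maps target space onto source space

def clar_penalty(similar_pairs, lambda_clar=0.1):
    """similar_pairs: list of (src_label_id, tgt_label_id) with shared semantics."""
    src_ids = torch.tensor([s for s, _ in similar_pairs])
    tgt_ids = torch.tensor([t for _, t in similar_pairs])
    mapped = transform(label_emb_tgt(tgt_ids))
    return lambda_clar * (mapped - label_emb_src(src_ids)).pow(2).sum(-1).mean()

# total_loss = srl_loss + clar_penalty(similar_pairs)   # added to the usual SRL objective
```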
Small but Mighty: New Benchmarks for Split and Rephrase
Li Zhang | Huaiyu Zhu | Siddhartha Brahma | Yunyao Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones. As a relatively new task, it is paramount to ensure the soundness of its evaluation benchmark and metric. We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues caused by its automatic generation process. Taking advantage of such cues, we show that even a simple rule-based model can perform on par with the state-of-the-art model. To remedy such limitations, we collect and release two crowdsourced benchmark datasets. We not only make sure that they contain significantly more diverse syntax, but also carefully control for their quality according to a well-defined set of criteria. While no satisfactory automatic metric exists, we apply fine-grained manual evaluation based on these criteria using crowdsourcing, showing that our datasets better represent the task and are significantly more challenging for the models.
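For a sense of what a "simple rule-based model" for this task might look like, the toy splitter below cuts a sentence at coordinating conjunctions and relative pronouns, the kind of shallow syntactic cue the paper shows automatically generated benchmarks can leak. It is purely illustrative and not the baseline used in the paper.

```python
# Purely illustrative toy splitter: cut at coordinating conjunctions and relative
# pronouns. Not the rule-based baseline from the paper.
import re

SPLIT_MARKERS = r"\b(and|but|which|who|that)\b"

def naive_split(sentence: str) -> list[str]:
    parts = re.split(SPLIT_MARKERS, sentence)
    clauses = [p.strip(" ,.") for p in parts[::2] if p.strip(" ,.")]
    # The rephrase step is skipped entirely: a real system must restore dropped
    # subjects and verbs so that each output is a grammatical sentence.
    return [c[0].upper() + c[1:] + "." for c in clauses]

print(naive_split("Alice wrote the paper and Bob, who leads the team, reviewed it."))
# -> ['Alice wrote the paper.', 'Bob.', 'Leads the team, reviewed it.']  (crude!)
```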
Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification
Prithviraj Sen | Marina Danilevsky | Yunyao Li | Siddhartha Brahma | Matthias Boehm | Laura Chiticariu | Rajasekar Krishnamurthy
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Interpretability of predictive models is becoming increasingly important with growing adoption in the real world. We present RuleNN, a neural network architecture for learning transparent models for sentence classification. The models are in the form of rules expressed in first-order logic, a dialect with well-defined, human-understandable semantics. More precisely, RuleNN learns linguistic expressions (LE) built on top of predicates extracted using shallow natural language understanding. Our experimental results show that RuleNN outperforms statistical relational learning and other neuro-symbolic methods, and performs comparably with black-box recurrent neural networks. Our user studies confirm that the learned LEs are explainable and capture domain semantics. Moreover, allowing domain experts to modify LEs and instill more domain knowledge leads to human-machine co-creation of models with better performance.
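The snippet below sketches the generic neuro-symbolic recipe such systems build on: a clause softly selects shallow-NLU predicates and combines them with a differentiable AND, so the trained selection weights can be thresholded and read back as an explicit rule. It follows this general idea only and is not RuleNN's actual architecture.

```python
# Hedged sketch of a differentiable clause over predicate activations
# (e.g. "contains a negation word", "mentions a price"). The soft AND is a
# product; unselected predicates contribute a factor close to 1.
# Generic illustration, not RuleNN's architecture.
import torch
import torch.nn as nn

class SoftClause(nn.Module):
    def __init__(self, n_predicates: int):
        super().__init__()
        # One learnable selection score per predicate.
        self.select = nn.Parameter(torch.zeros(n_predicates))

    def forward(self, pred_truth):             # pred_truth: (batch, n_predicates) in [0, 1]
        w = torch.sigmoid(self.select)         # soft membership of each predicate in the clause
        factors = 1.0 - w * (1.0 - pred_truth) # unselected (w ~ 0) -> factor ~ 1
        return factors.prod(dim=-1)            # (batch,) clause truth value

clause = SoftClause(n_predicates=5)
truth = clause(torch.rand(8, 5))               # batch of 8 sentences, 5 predicate activations
# After training, predicates with high selection weight form the readable rule:
rule = (torch.sigmoid(clause.select) > 0.5).nonzero().flatten()
```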
2019
Improved Language Modeling by Decoding the Past
Siddhartha Brahma
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Highly regularized LSTMs achieve impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next token. This biases the model towards retaining more contextual information, in turn improving its ability to predict the next token. With negligible overhead in the number of parameters and training time, our Past Decode Regularization (PDR) method improves perplexity on the Penn Treebank dataset by up to 1.8 points and by up to 2.3 points on the WikiText-2 dataset, over strong regularized baselines using a single softmax. With a mixture-of-softmax model, we show gains of up to 1.0 perplexity points on these datasets. In addition, our method achieves 1.169 bits-per-character on the Penn Treebank Character dataset for character-level language modeling.
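One way to read the regularizer described in the abstract is as an auxiliary loss that tries to recover the last context token from the distribution predicted for the next token; a minimal sketch under that assumption is below. The decoder, coefficient, and variable names are illustrative, not the paper's implementation.

```python
# Minimal sketch of a past-decoding auxiliary loss, under the stated assumption:
# form an expected embedding from the predicted next-token distribution, decode
# the previous token from it, and add the scaled cross-entropy to the LM loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d = 10000, 400
embed = nn.Embedding(vocab, d)
past_decoder = nn.Linear(d, vocab)   # decodes the previous context token

def pdr_loss(next_token_logits, prev_tokens, coeff=0.01):
    """next_token_logits: (batch, vocab) at position t; prev_tokens: (batch,) token at t."""
    p_next = F.softmax(next_token_logits, dim=-1)
    ctx = p_next @ embed.weight       # expected embedding under the predicted distribution
    return coeff * F.cross_entropy(past_decoder(ctx), prev_tokens)

# total_loss = lm_loss + pdr_loss(logits_t, tokens_t)   # added on top of the standard objective
```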