Sheng-Fu Wang


2020

BLiMP: The Benchmark of Linguistic Minimal Pairs for English
Alex Warstadt | Alicia Parrish | Haokun Liu | Anhad Mohananey | Wei Peng | Sheng-Fu Wang | Samuel R. Bowman
Transactions of the Association for Computational Linguistics, Volume 8

We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. BLiMP consists of 67 individual datasets, each containing 1,000 minimal pairs, that is, pairs of minimally different sentences that contrast in grammatical acceptability and isolate a specific phenomenon in syntax, morphology, or semantics. We generate the data according to linguist-crafted grammar templates, and human aggregate agreement with the labels is 96.4%. We evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs by observing whether they assign a higher probability to the acceptable sentence in each minimal pair. We find that state-of-the-art models identify morphological contrasts related to agreement reliably, but they struggle with some subtle semantic and syntactic phenomena, such as negative polarity items and extraction islands.
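
The evaluation criterion described in this abstract (a model is counted correct on a pair when it assigns a higher probability to the acceptable sentence) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the add-one-smoothed bigram scorer, the toy corpus, the toy pairs, and the function names are hypothetical and are not taken from the BLiMP data or code.

```python
import math
from collections import Counter

def train_bigram_scorer(corpus):
    """Build a sentence log-probability function from an add-one-smoothed
    bigram model. Hypothetical stand-in for any LM that scores sentences."""
    bigrams = Counter()
    contexts = Counter()
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens[1:])
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    vocab_size = len(vocab)

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.lower().split()
        return sum(
            math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
            for prev, cur in zip(tokens, tokens[1:])
        )

    return log_prob

def minimal_pair_accuracy(log_prob, pairs):
    """Fraction of (acceptable, unacceptable) pairs in which the acceptable
    sentence receives a strictly higher score. Ties count as errors."""
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)

# Toy data invented for this sketch (not from the benchmark).
TOY_CORPUS = ["the cats sleep", "the cat sleeps", "the dogs bark", "the dog barks"]
TOY_PAIRS = [
    ("the cats sleep", "the cats sleeps"),
    ("the dog barks", "the dog bark"),
]

scorer = train_bigram_scorer(TOY_CORPUS)
accuracy = minimal_pair_accuracy(scorer, TOY_PAIRS)
print(f"minimal-pair accuracy: {accuracy:.2f}")
```

Because the metric only compares two scores per pair, any model that assigns a probability to a full sentence could be passed in place of the toy scorer.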

BLiMP: A Benchmark of Linguistic Minimal Pairs for English
Alex Warstadt | Alicia Parrish | Haokun Liu | Anhad Mohananey | Wei Peng | Sheng-Fu Wang | Samuel R. Bowman
Proceedings of the Society for Computation in Linguistics 2020

2019

Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs
Alex Warstadt | Yu Cao | Ioana Grosu | Wei Peng | Hagen Blix | Yining Nie | Anna Alsop | Shikha Bordia | Haokun Liu | Alicia Parrish | Sheng-Fu Wang | Jason Phang | Anhad Mohananey | Phu Mon Htut | Paloma Jeretic | Samuel R. Bowman
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Though state-of-the-art sentence representation models can perform tasks requiring significant knowledge of grammar, it is an open question how best to evaluate their grammatical knowledge. We explore five experimental methods inspired by prior work evaluating pretrained sentence representation models. We use a single linguistic phenomenon, negative polarity item (NPI) licensing, as a case study for our experiments. NPIs like "any" are grammatical only if they appear in a licensing environment like negation ("Sue doesn't have any cats" vs. *"Sue has any cats"). This phenomenon is challenging because of the variety of NPI licensing environments that exist. We introduce an artificially generated dataset that manipulates key features of NPI licensing for the experiments. We find that BERT has significant knowledge of these features, but its success varies widely across different experimental methods. We conclude that a variety of methods is necessary to reveal all relevant aspects of a model's grammatical knowledge in a given domain.

The organization of sound inventories: A study on obstruent gaps
Sheng-Fu Wang
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

2018

The Lifted Matrix-Space Model for Semantic Composition
WooJin Chung | Sheng-Fu Wang | Samuel Bowman
Proceedings of the 22nd Conference on Computational Natural Language Learning

Tree-structured neural network architectures for sentence encoding draw inspiration from the approach to semantic composition generally seen in formal linguistics, and have shown empirical improvements over comparable sequence models by doing so. Moreover, adding multiplicative interaction terms to the composition functions in these models can yield significant further improvements. However, existing compositional approaches that adopt such a powerful composition function scale poorly, with parameter counts exploding as model dimension or vocabulary size grows. We introduce the Lifted Matrix-Space model, which uses a global transformation to map vector word embeddings to matrices, which can then be composed via an operation based on matrix-matrix multiplication. Its composition function effectively transmits a larger number of activations across layers with relatively few model parameters. We evaluate our model on the Stanford NLI corpus, the Multi-Genre NLI corpus, and the Stanford Sentiment Treebank and find that it consistently outperforms TreeLSTM (Tai et al., 2015), the previous best known composition function for tree-structured models.
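
The core idea stated in this abstract (one shared transformation lifts each word vector to a matrix, and matrices are then composed by multiplication) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's model: the purely linear lift, the plain matrix product with no nonlinearity or bias, the dimensions, and all names and embedding values are hypothetical.

```python
import random

def make_lift_weights(embed_dim, k, seed=0):
    """One shared (k*k) x embed_dim weight table used for every word.
    Random values are placeholders; a real model would learn them."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for _ in range(k * k)]

def lift(vector, weights, k):
    """Map a word vector to a k x k matrix: apply the shared linear map,
    then reshape the k*k outputs row by row."""
    flat = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return [flat[i * k:(i + 1) * k] for i in range(k)]

def matmul(a, b):
    """Plain matrix-matrix product for lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def compose(left, right):
    """Compose two constituents. Assumption: a bare matrix product stands in
    for the 'operation based on matrix-matrix multiplication'."""
    return matmul(left, right)

EMBED_DIM, K = 4, 3
weights = make_lift_weights(EMBED_DIM, K)

# Hypothetical word embeddings, invented for this sketch.
embeddings = {
    "black": [1.0, 0.0, 0.5, -0.5],
    "cat": [0.0, 1.0, -1.0, 0.25],
}

phrase = compose(lift(embeddings["black"], weights, K),
                 lift(embeddings["cat"], weights, K))
print(len(phrase), "x", len(phrase[0]))
```

In this sketch the lifting table has K * K * EMBED_DIM entries regardless of vocabulary size, whereas storing a separate K x K matrix per word would grow with the vocabulary, which is the scaling concern the abstract raises.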

2012

Frequency, Collocation, and Statistical Modeling of Lexical Items: A Case Study of Temporal Expressions in Two Conversational Corpora
Sheng-Fu Wang | Jing-Chen Yang | Yu-Yun Chang | Yu-Wen Liu | Shu-Kai Hsieh
International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 2, June 2012—Special Issue on Selected Papers from ROCLING XXIII

2011

Frequency, Collocation, and Statistical Modeling of Lexical Items: A Case Study of Temporal Expressions in an Elderly Speaker Corpus
Sheng-Fu Wang | Jing-Chen Yang | Yu-Yun Chang | Yu-Wen Liu | Shu-Kai Hsieh
Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing (ROCLING 2011)