Yiyi Wang


2022

pdf
CML: A Contrastive Meta Learning Method to Estimate Human Label Confidence Scores and Reduce Data Collection Cost
Bo Dong | Yiyi Wang | Hanbo Sun | Yunji Wang | Alireza Hashemi | Zheng Du
Proceedings of the Fifth Workshop on e-Commerce and NLP (ECNLP 5)

Deep neural network models are especially susceptible to noise in annotated labels. In the real world, annotated data typically contains noise caused by a variety of factors such as task difficulty, annotator experience, and annotator bias. Label quality is critical for label validation tasks; however, correcting for noise by collecting more data is often costly. In this paper, we propose a contrastive meta-learning framework (CML) to address the challenges introduced by noisy annotated data, specifically in the context of natural language processing. CML combines contrastive and meta learning to improve the quality of text feature representations. Meta-learning is also used to generate confidence scores to assess label quality. We demonstrate that a model built on CML-filtered data outperforms a model built on clean data. Furthermore, we perform experiments on deidentified commercial voice assistant datasets and demonstrate that our model outperforms several SOTA approaches.

2020

pdf
Automated Scoring of Clinical Expressive Language Evaluation Tasks
Yiyi Wang | Emily Prud’hommeaux | Meysam Asgari | Jill Dolata
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

Many clinical assessment instruments used to diagnose language impairments in children include a task in which the subject must formulate a sentence to describe an image using a specific target word. Because producing sentences in this way requires the speaker to integrate syntactic and semantic knowledge in a complex manner, responses are typically evaluated on several different dimensions of appropriateness yielding a single composite score for each response. In this paper, we present a dataset consisting of non-clinically elicited responses for three related sentence formulation tasks, and we propose an approach for automatically evaluating their appropriateness. We use neural machine translation to generate correct-incorrect sentence pairs in order to create synthetic data to increase the amount and diversity of training data for our scoring model. Our scoring model uses transfer learning to facilitate automatic sentence appropriateness evaluation. We further compare custom word embeddings with pre-trained contextualized embeddings serving as features for our scoring model. We find that transfer learning improves scoring accuracy, particularly when using pretrained contextualized embeddings.

2018

pdf
A Hybrid Approach Combining Statistical Knowledge with Conditional Random Fields for Chinese Grammatical Error Detection
Yiyi Wang | Chilin Shih
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

This paper presents a method of combining Conditional Random Fields (CRFs) model with a post-processing layer using Google n-grams statistical information tailored to detect word selection and word order errors made by learners of Chinese as Foreign Language (CFL). We describe the architecture of the model and its performance in the shared task of the ACL 2018 Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA). This hybrid approach yields comparably high false positive rate (FPR = 0.1274) and precision (Pd= 0.7519; Pi= 0.6311), but low recall (Rd = 0.3035; Ri = 0.1696 ) in grammatical error detection and identification tasks. Additional statistical information and linguistic rules can be added to enhance the model performance in the future.