Xiangyu Yang
2024
Multi-Error Modeling and Fluency-Targeted Pre-training for Chinese Essay Evaluation
Jingshen Zhang | Xiangyu Yang | Xinkai Su | Xinglu Chen | Tianyou Huang | Xinying Qiu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
This system report presents our approaches and results for the Chinese Essay Fluency Evaluation (CEFE) task at CCL-2024. For Track 1, we optimized predictions for challenging fine-grained error types using binary classification models and trained coarse-grained models on the Chinese Learner 4W corpus. In Track 2, we enhanced performance by constructing a pseudo-dataset with multiple error types per sentence. For Track 3, where we achieved first place, we generated fluency-rated pseudo-data via back-translation for pre-training and used an NSP-based strategy with a Symmetric Cross Entropy loss to capture context and mitigate long-range dependency issues. Our methods effectively address key challenges in Chinese Essay Fluency Evaluation.
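The abstract names a Symmetric Cross Entropy loss for Track 3 without spelling it out; the sketch below follows the standard formulation SCE = alpha * CE + beta * RCE from Wang et al. (2019). The `alpha`, `beta`, and `eps` defaults are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn.functional as F

def symmetric_cross_entropy(logits: torch.Tensor,
                            targets: torch.Tensor,
                            alpha: float = 1.0,
                            beta: float = 1.0,
                            eps: float = 1e-7) -> torch.Tensor:
    """SCE = alpha * CE + beta * RCE (Wang et al., 2019).

    logits:  (batch, num_classes) raw model scores
    targets: (batch,) integer class labels (e.g., fluency levels)
    """
    # Forward term: ordinary cross entropy of predictions against labels.
    ce = F.cross_entropy(logits, targets)
    # Reverse term: swap the roles of predictions and labels; log(0) on the
    # one-hot labels is clamped to log(eps), as in common SCE implementations.
    pred = F.softmax(logits, dim=1).clamp(min=eps)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float().clamp(min=eps)
    rce = -(pred * one_hot.log()).sum(dim=1).mean()
    return alpha * ce + beta * rce
```

A quick smoke test: `symmetric_cross_entropy(torch.randn(8, 4), torch.randint(0, 4, (8,)))` returns a scalar loss. The reverse term makes training more robust to noisy labels, which plausibly suits pseudo-data generated by back-translation.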
2020
imec-ETRO-VUB at W-NUT 2020 Shared Task-3: A multilabel BERT-based system for predicting COVID-19 events
Xiangyu Yang | Giannis Bekoulis | Nikos Deligiannis
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
In this paper, we present our system designed to address the W-NUT 2020 shared task on COVID-19 Event Extraction from Twitter. To mitigate the noisy nature of the Twitter stream, our system makes use of COVID-Twitter-BERT (CT-BERT), a language model pre-trained on a large corpus of COVID-19-related Twitter messages. Our system is trained on the COVID-19 Twitter Event Corpus and identifies relevant text spans that answer pre-defined questions (i.e., slot types) for five COVID-19-related events (i.e., TESTED POSITIVE, TESTED NEGATIVE, CAN-NOT-TEST, DEATH, and CURE & PREVENTION). We experimented with different architectures; our best-performing model relies on a multilabel classifier on top of the CT-BERT model that jointly trains all the slot types for a single event. Our experimental results indicate that our Multilabel-CT-BERT system outperforms the baseline methods by 7 percentage points in terms of micro-averaged F1 score. Our model ranked 4th on the shared task leaderboard.
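To make the "multilabel classifier on top of CT-BERT" concrete, here is a minimal PyTorch/Transformers sketch. The Hub checkpoint name `digitalepidemiologylab/covid-twitter-bert`, the use of the `[CLS]` vector, and `BCEWithLogitsLoss` are typical choices assumed for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Assumed CT-BERT checkpoint on the Hugging Face Hub.
MODEL_NAME = "digitalepidemiologylab/covid-twitter-bert"

class MultilabelCTBert(nn.Module):
    """CT-BERT encoder with one sigmoid head per slot type, trained jointly."""

    def __init__(self, num_slot_types: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MODEL_NAME)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_slot_types)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # The [CLS] vector summarizes the tweet (plus candidate span, if any).
        return self.head(out.last_hidden_state[:, 0])  # (batch, num_slot_types)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = MultilabelCTBert(num_slot_types=5)
batch = tokenizer(["I tested positive for covid today"],
                  return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
# Multilabel training: one independent binary target per slot type.
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))
```

Scoring all slot types of an event with a single shared encoder lets the head exploit correlations between slots, which is the joint-training idea the abstract credits for the best-performing model.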