Xiangyu Yang


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2024

pdf bib
Multi-Error Modeling and Fluency-Targeted Pre-training for Chinese Essay Evaluation
Jingshen Zhang | Xiangyu Yang | Xinkai Su | Xinglu Chen | Tianyou Huang | Xinying Qiu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“This system report presents our approaches and results for the Chinese Essay Fluency Evaluation (CEFE) task at CCL-2024. For Track 1, we optimized predictions for challenging fine-grained error types using binary classification models and trained coarse-grained models on the Chinese Learner 4W corpus. In Track 2, we enhanced performance by constructing a pseudo-dataset with multiple error types per sentence. For Track 3, where we achieved first place, we generated fluency-rated pseudo-data via back-translation for pretraining and used an NSP-based strategy with Symmetric Cross Entropy loss to capture context and mitigate long dependencies. Our methods effectively address key challenges in Chinese Essay Fluency Evaluation.”

2020

pdf bib
imec-ETRO-VUB at W-NUT 2020 Shared Task-3: A multilabel BERT-based system for predicting COVID-19 events
Xiangyu Yang | Giannis Bekoulis | Nikos Deligiannis
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

In this paper, we present our system designed to address the W-NUT 2020 shared task for COVID-19 Event Extraction from Twitter. To mitigate the noisy nature of the Twitter stream, our system makes use of the COVID-Twitter-BERT (CT-BERT), which is a language model pre-trained on a large corpus of COVID-19 related Twitter messages. Our system is trained on the COVID-19 Twitter Event Corpus and is able to identify relevant text spans that answer pre-defined questions (i.e., slot types) for five COVID-19 related events (i.e., TESTED POSITIVE, TESTED NEGATIVE, CAN-NOT-TEST, DEATH and CURE & PREVENTION). We have experimented with different architectures; our best performing model relies on a multilabel classifier on top of the CT-BERT model that jointly trains all the slot types for a single event. Our experimental results indicate that our Multilabel-CT-BERT system outperforms the baseline methods by 7 percentage points in terms of micro average F1 score. Our model ranked as 4th in the shared task leaderboard.