Long Ma


2023

PAI at SemEval-2023 Task 4: A General Multi-label Classification System with Class-balanced Loss Function and Ensemble Module
Long Ma | Zeye Sun | Jiawei Jiang | Xuan Li
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The Human Value Detection shared task (Kiesel et al., 2023) asks systems to decide, given a textual argument, whether the argument draws on each of a set of 20 value categories. The task is difficult because the human values behind an argument are often implicit. Moreover, there are up to 20 label categories, and the data distribution is highly imbalanced. To address these issues, we employ a multi-label classification model with a class-balanced loss function. Our system takes 5 first places, 2 second places, and 6 third places across the 20 categories of the shared task, and its overall average score of 0.54 ranks third. The code is publicly available at https://www.github.com/diqiuzhuanzhuan/semeval2023.
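The abstract does not say which class-balanced loss the system uses. As a minimal sketch of the general idea, the example below assumes one common formulation, in which each label is weighted by the inverse of its "effective number" of samples, and applies the weights to a per-label binary cross-entropy. The function names, the `beta` value, and the label counts are illustrative assumptions, not taken from the paper or its repository.

```python
import math

def class_balanced_weights(label_counts, beta=0.999):
    """Weight each label by (1 - beta) / (1 - beta**n), where n is the
    number of positive examples for that label (assumed to be > 0).
    Weights are rescaled so that they sum to the number of labels."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in label_counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def weighted_bce(probs, targets, weights, eps=1e-12):
    """Weighted binary cross-entropy for one example, averaged over labels."""
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Hypothetical positive counts for three labels: frequent, medium, rare.
counts = [5000, 500, 50]
weights = class_balanced_weights(counts)

# Predicted probabilities and gold labels for a single argument.
loss = weighted_bce([0.9, 0.2, 0.1], [1, 0, 1], weights)
```

With these weights, a mistake on the rare label contributes more to the loss than the same mistake on the frequent label, which is the effect a class-balanced loss is meant to have on imbalanced data.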

PAI at SemEval-2023 Task 2: A Universal System for Named Entity Recognition with External Entity Information
Long Ma | Kai Lu | Tianbo Che | Hailong Huang | Weiguo Gao | Xuan Li
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The MultiCoNER II task aims to detect complex, ambiguous, and fine-grained named entities in low-context and noisy settings, such as text with spelling mistakes and typos, across multiple languages. The task poses significant challenges due to the scarcity of contextual information, the high granularity of the entities (up to 33 classes), and the interference of noisy data. To address these issues, our team PAI proposes a universal Named Entity Recognition (NER) system that integrates external entity information to improve performance. Specifically, our system retrieves entities with properties from the knowledge base (i.e., Wikipedia) for a given text, then concatenates the entity information with the input sentence and feeds the result into Transformer-based models. Finally, our system wins 2 first places, 4 second places, and 1 third place out of 13 tracks. The code is publicly available at https://github.com/diqiuzhuanzhuan/semeval-2023.
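The retrieve-then-concatenate step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy in-memory knowledge base, the substring-match retrieval, the `[SEP]` separator, and the `name (property)` format are all hypothetical stand-ins, since the abstract does not specify how retrieval or concatenation is implemented.

```python
def retrieve_entities(text, knowledge_base):
    """Return (name, property) pairs whose name occurs in the text.
    Case-insensitive substring matching is an assumed, simplistic
    stand-in for retrieval from a real knowledge base."""
    lowered = text.lower()
    return [(name, prop) for name, prop in knowledge_base.items()
            if name.lower() in lowered]

def build_augmented_input(text, knowledge_base, sep=" [SEP] "):
    """Append retrieved entity information to the input sentence."""
    hits = retrieve_entities(text, knowledge_base)
    context = " ; ".join(f"{name} ({prop})" for name, prop in hits)
    # Leave the sentence unchanged when nothing is retrieved.
    return text + sep + context if context else text

# Hypothetical miniature knowledge base: entity name -> property.
toy_kb = {"Paris": "city", "Amazon": "company"}

augmented = build_augmented_input("flights to paris", toy_kb)
```

The augmented string is what would then be tokenized and passed to a Transformer-based tagger; only the tokens of the original sentence would receive entity labels.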

2022

PAI at SemEval-2022 Task 11: Name Entity Recognition with Contextualized Entity Representations and Robust Loss Functions
Long Ma | Xiaorong Jian | Xuan Li
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes our system for SemEval-2022 Task 11, Multilingual Complex Named Entity Recognition, which ranked 3rd on track 1 of the leaderboard. We propose Dictionary-fused BERT, a flexible approach for integrating entity dictionaries. The main ideas of our system are: 1) integrating external knowledge (an entity dictionary) into pre-trained models to obtain contextualized word and entity representations; 2) designing a robust loss function that leverages a logit matrix; and 3) adding an auxiliary task, a binary classifier on top of the model that decides whether each token is a mention word, which makes the main task easier to learn. Notably, after we update the entity dictionary to the one of (CITATION), our system achieves an F1 of 0.914 in the post-evaluation stage, higher than the first-place score on the evaluation-stage leaderboard.
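To illustrate the auxiliary task, the sketch below derives binary mention targets from the main-task tags and combines the two losses. It assumes BIO-style tags in which `O` marks non-mention tokens, and a weighted sum with a hypothetical `aux_weight`; the abstract states neither the tagging scheme nor how the losses are combined.

```python
def mention_labels(tags):
    """Binary targets for the auxiliary task: 1 if the token is part of
    an entity mention, 0 otherwise (assumes 'O' marks non-mentions)."""
    return [0 if tag == "O" else 1 for tag in tags]

def joint_loss(main_loss, aux_loss, aux_weight=0.5):
    """Combine the main tagging loss with the auxiliary mention loss.
    The weighted sum and the default weight are assumptions."""
    return main_loss + aux_weight * aux_loss

# Hypothetical gold tags for a four-token sentence.
tags = ["B-PER", "I-PER", "O", "B-LOC"]
aux_targets = mention_labels(tags)

# Hypothetical loss values standing in for model outputs.
total = joint_loss(main_loss=1.2, aux_loss=0.4)
```

Because the auxiliary targets are computed from the existing tags, this kind of auxiliary task needs no extra annotation.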