Yaohan He


2022

CMB AI Lab at SemEval-2022 Task 11: A Two-Stage Approach for Complex Named Entity Recognition via Span Boundary Detection and Span Classification
Keyu Pu | Hongyi Liu | Yixiao Yang | Jiangzhou Ji | Wenyi Lv | Yaohan He
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents a solution for SemEval-2022 Task 11, Multilingual Complex Named Entity Recognition. The challenge in this task is detecting semantically ambiguous and complex entities in short, low-context settings. Our team (CMB AI Lab) proposes a two-stage method to recognize the named entities: first, a model based on a biaffine layer is built to predict span boundaries, and then a span classification model based on a pooling layer is built to predict the semantic tags of the spans. The base pre-trained models we choose are XLM-RoBERTa and mT5. Our approach achieves an F1 score of 84.62 on sub-task 13, which ranks third on the leaderboard.

Leveraging Explicit Lexico-logical Alignments in Text-to-SQL Parsing
Runxin Sun | Shizhu He | Chong Zhu | Yaohan He | Jinlong Li | Jun Zhao | Kang Liu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Text-to-SQL aims to parse natural language questions into SQL queries, which is valuable for providing an easy interface to large databases. Previous work has observed that leveraging lexico-logical alignments is very helpful for improving parsing performance. However, current attention-based approaches can only model such alignments at the token level and have unsatisfactory generalization capability. In this paper, we propose a new approach to leveraging explicit lexico-logical alignments. It first identifies possible phrase-level alignments and then injects them as additional contexts to guide the parsing procedure. Experimental results on Squall show that our approach makes better use of such alignments and obtains an absolute improvement of 3.4% over the current state of the art.