Abstract
In this paper, we describe our systems for Task 1 (Tasks 1A and 1B) of the CL-SciSumm Shared Task 2020. We treat Task 1A as binary classification over sentence pairs and propose strategies built on pre-trained language models: domain-specific embeddings and special tokens. We further explore fusing contextualized embeddings with extra information, leverage SemBERT to capture structured semantic information, and combine BERT-based models with non-neural classifiers. For Task 1B, we fine-tune a language model with per-class weights to perform multi-label classification. The results show that extra information improves the identification of cited text spans, that end-to-end trained models outperform models trained in two stages, and that averaging the predictions of multiple models is more accurate than any individual model.
- Anthology ID: 2020.sdp-1.26
- Volume: Proceedings of the First Workshop on Scholarly Document Processing
- Month: November
- Year: 2020
- Address: Online
- Editors: Muthu Kumar Chandrasekaran, Anita de Waard, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Eduard Hovy, Petr Knoth, David Konopnicki, Philipp Mayr, Robert M. Patton, Michal Shmueli-Scheuer
- Venue: sdp
- Publisher: Association for Computational Linguistics
- Pages: 235–241
- URL: https://aclanthology.org/2020.sdp-1.26
- DOI: 10.18653/v1/2020.sdp-1.26
- Cite (ACL): Ling Chai, Guizhen Fu, and Yuan Ni. 2020. NLP-PINGAN-TECH @ CL-SciSumm 2020. In Proceedings of the First Workshop on Scholarly Document Processing, pages 235–241, Online. Association for Computational Linguistics.
- Cite (Informal): NLP-PINGAN-TECH @ CL-SciSumm 2020 (Chai et al., sdp 2020)
- PDF: https://preview.aclanthology.org/nschneid-patch-1/2020.sdp-1.26.pdf
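To make the abstract's modeling choices concrete, here is a minimal sketch of Task 1A as sentence-pair binary classification and Task 1B as class-weighted multi-label facet classification, assuming the HuggingFace transformers/PyTorch stack. This is not the authors' code: the model name, example sentences, facet weights, and ensemble contents are illustrative assumptions, not the paper's settings.

```python
# Sketch of the abstract's two modeling ideas; all concrete values are
# placeholders, not the authors' configuration.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# --- Task 1A: binary classification of (citance, candidate) sentence pairs.
# The pair is packed into one sequence: [CLS] citance [SEP] candidate [SEP].
model_1a = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
enc = tokenizer(
    "We adopt the alignment model of Brown et al.",      # citance (made up)
    "Our alignment model is trained on parallel text.",  # candidate (made up)
    return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    pair_logits = model_1a(**enc).logits  # shape (1, 2): [not-cited, cited]

# --- Task 1B: multi-label facet classification with per-class loss weights.
# The five facets follow the CL-SciSumm task definition; the weight values
# below are placeholders.
FACETS = ["Aim", "Hypothesis", "Implication", "Method", "Results"]
model_1b = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(FACETS),
    problem_type="multi_label_classification")
pos_weight = torch.tensor([4.0, 6.0, 5.0, 1.0, 2.0])  # up-weight rare facets
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

span = tokenizer("We extend their parser to new domains.",  # made-up span
                 return_tensors="pt", truncation=True, max_length=128)
gold = torch.tensor([[0., 0., 0., 1., 0.]])  # gold facet: Method
loss = loss_fn(model_1b(**span).logits, gold)  # backprop this when training

# --- Ensembling: average per-model probabilities, as the abstract reports
# this beats any single model. `models` would hold several fine-tuned copies.
models = [model_1b]  # placeholder; in practice, differently trained models
with torch.no_grad():
    avg_probs = torch.stack(
        [m(**span).logits.sigmoid() for m in models]).mean(dim=0)
```

Weighting the binary cross-entropy per facet is one common way to realize the abstract's "different weights for classes" when facet frequencies are skewed; the paper may implement the weighting differently.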