Li Ma


2021

Scientific knowledge is evolving at an unprecedented rate of speed, with new concepts constantly being introduced from millions of academic articles published every month. In this paper, we introduce a self-supervised end-to-end system, SciConceptMiner, for the automatic capture of emerging scientific concepts from both independent knowledge sources (semi-structured data) and academic publications (unstructured documents). First, we adopt a BERT-based sequence labeling model to predict candidate concept phrases with self-supervision data. Then, we incorporate rich Web content for synonym detection and concept selection via a web search API. This two-stage approach achieves highly accurate (94.7%) concept identification with more than 740K scientific concepts. These concepts are deployed in the Microsoft Academic production system and are the backbone for its semantic search capability.

2019

This paper describes the Kingsoft AI Lab’s submission to the WMT2019 news translation shared task. We participated in two language directions: English-Chinese and Chinese-English. For both language directions, we trained several variants of Transformer models using the provided parallel data enlarged with a large quantity of back-translated monolingual data. The best translation result was obtained with ensemble and reranking techniques. According to automatic metrics (BLEU) our Chinese-English system reached the second highest score, and our English-Chinese system reached the second highest score for this subtask.