Miao Li


2023

THiFLY Research at SemEval-2023 Task 7: A Multi-granularity System for CTR-based Textual Entailment and Evidence Retrieval
Yuxuan Zhou | Ziyu Jin | Meiwei Li | Miao Li | Xien Liu | Xinxin You | Ji Wu
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The NLI4CT task aims to determine whether hypotheses are entailed by Clinical Trial Reports (CTRs) and to retrieve the evidence supporting each judgment. The task is challenging because verifying a hypothesis requires integrating multiple pieces of evidence from one or two CTRs and applying several kinds of reasoning, including textual and numerical. To address these problems, we present a multi-granularity system for CTR-based textual entailment and evidence retrieval. Specifically, we construct a Multi-granularity Inference Network (MGNet) that exploits sentence-level and token-level encoding to handle both the textual entailment and evidence retrieval tasks. Moreover, we enhance the numerical inference capability of the system by leveraging SciFive, a T5-based model pre-trained on a medical corpus. Model ensembling and a joint inference method are further used to increase the stability and consistency of inference. The system achieves F1 scores of 0.856 and 0.853 on the textual entailment and evidence retrieval tasks, respectively, the best performance on both subtasks. The experimental results corroborate the effectiveness of the proposed method.

2019

A Topic Augmented Text Generation Model: Joint Learning of Semantics and Structural Features
Hongyin Tang | Miao Li | Beihong Jin
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Text generation is among the most fundamental tasks in natural language processing. In this paper, we propose a text generation model that learns semantics and structural features simultaneously. The model captures structural features with a sequential variational autoencoder component and leverages a topic modeling component based on a Gaussian distribution to enhance the recognition of text semantics. To make the reconstructed text more coherent with the topics, the model further adapts the encoder of the topic modeling component as a discriminator. Experiments on several datasets demonstrate that our model outperforms several state-of-the-art models in terms of text perplexity and topic coherence. Moreover, the latent representations learned by our model are superior to others in a text classification task. Finally, given input texts, our model can generate meaningful texts that have similar structures but different topics.

2015

An combined sentiment classification system for SIGHAN-8
Qiuchi Li | Qiyu Zhi | Miao Li
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

2014

Improving Bilingual Lexicon Extraction Performance from Comparable Corpora via Optimizing Translation Candidate Lists
Shaoqi Wang | Miao Li | Zede Zhu | Zhenxin Yang | Shizhuang Weng
Proceedings of the Third CIPS-SIGHAN Joint Conference on Chinese Language Processing

2013

Building Comparable Corpora Based on Bilingual LDA Model
Zede Zhu | Miao Li | Lei Chen | Zhenxin Yang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)