Jianyu Liu


2024

PRIMO: Progressive Induction for Multi-hop Open Rule Generation
Jianyu Liu | Sheng Bi | Guilin Qi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Open rules are implications from premise atoms to hypothesis atoms that capture various relationships between instances in the real world. Injecting open rule knowledge into a machine helps improve the performance of downstream tasks such as dialogue and relation extraction. Existing approaches focus on single-hop open rule generation and ignore scenarios involving multiple hops, leading to logical inconsistencies between premise and hypothesis atoms, as well as semantic duplication of generated rule atoms. To address these issues, we propose a progressive multi-stage open rule generation method called PRIMO. We introduce ontology information during the rule generation stage to reduce ambiguity and improve rule accuracy. PRIMO constructs a multi-stage structure consisting of generation, extraction, and rank modules to fully leverage the latent knowledge within the language model across multiple dimensions. Furthermore, we employ reinforcement learning from human feedback to optimize the model, enhancing its understanding of commonsense knowledge. Experimental results demonstrate that, compared to baseline models, PRIMO significantly enhances rule quality and diversity while reducing the repetition rate of rule atoms.

2022

SPDB Innovation Lab at SemEval-2022 Task 3: Recognize Appropriate Taxonomic Relations Between Two Nominal Arguments with ERNIE-M Model
Yue Zhou | Bowei Wei | Jianyu Liu | Yang Yang
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Synonym and antonym exercises are among the most common practices in early childhood. They anchor the words we know deeper in our intuition. Treating a machine at the beginning of its life like a baby, we build a similar training for it so that it can deliver a qualified performance. In this paper, we present an ensemble model for sentence logistics classification, which outperforms the state-of-the-art methods. Our approach builds on two models, ERNIE-M and DeBERTaV3. With cross-validation and random seed tuning, we select the top-performing models for the final soft ensemble and make them vote for the final answer, achieving a top-6 performance.
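
The abstract does not spell out how the soft ensemble combines its members. As a rough illustration only, the sketch below shows generic soft voting: per-class probabilities from several models are averaged and the highest-scoring class is chosen. The function name `soft_vote`, the three model outputs, and the equal weighting are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models and return the argmax class.

    prob_sets[m][i][c] is model m's probability for class c on example i.
    Equal weighting of models is an assumption of this sketch.
    """
    n_models = len(prob_sets)
    n_examples = len(prob_sets[0])
    predictions = []
    for i in range(n_examples):
        n_classes = len(prob_sets[0][i])
        # Mean probability of each class over all models for example i.
        avg = [
            sum(prob_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Hypothetical probabilities from three fine-tuned models on two examples.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.6, 0.4], [0.7, 0.3]]
model_c = [[0.2, 0.8], [0.45, 0.55]]

print(soft_vote([model_a, model_b, model_c]))  # prints [0, 0]
```

On the second example, two of the three hypothetical models individually prefer class 1, yet the averaged probabilities favor class 0. This is the practical difference between soft voting and a hard majority vote: a confident model can outweigh two marginal ones.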

2021

A Web Scale Entity Extraction System
Xuanting Cai | Quanbin Ma | Jianyu Liu | Pan Li | Qi Zeng | Zhengkan Yang | Pushkar Tripathi
Findings of the Association for Computational Linguistics: EMNLP 2021

Understanding the semantic meaning of content on the web through the lens of entities and concepts has many practical advantages. However, when building large-scale entity extraction systems, practitioners face unique challenges in finding the best ways to leverage the scale and variety of data available on internet platforms. We present lessons from our efforts to build an entity extraction system for multiple document types at large scale using multi-modal Transformers. We empirically demonstrate the effectiveness of multi-lingual, multi-task, and cross-document-type learning. We also discuss the label collection schemes that help minimize the amount of noise in the collected data.