Peng Wang


End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding
Mengze Li | Tianbao Wang | Haoyu Zhang | Shengyu Zhang | Zhou Zhao | Jiaxu Miao | Wenqiao Zhang | Wenming Tan | Jin Wang | Peng Wang | Shiliang Pu | Fei Wu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Natural language spatial video grounding aims to detect the relevant objects in video frames with descriptive sentences as the query. In spite of the great advances, most existing methods rely on dense video frame annotations, which require a tremendous amount of human effort. To achieve effective grounding under a limited annotation budget, we investigate one-shot video grounding and learn to ground natural language in all video frames with solely one frame labeled, in an end-to-end manner. One major challenge of end-to-end one-shot video grounding is the existence of video frames that are irrelevant to either the language query or the labeled frame. Another challenge relates to the limited supervision, which might result in ineffective representation learning. To address these challenges, we design an end-to-end model via Information Tree for One-Shot video grounding (IT-OS). Its key module, the information tree, can eliminate the interference of irrelevant frames based on branch search and branch cropping techniques. In addition, several self-supervised tasks are proposed based on the information tree to improve the representation learning under insufficient labeling. Experiments on the benchmark dataset demonstrate the effectiveness of our model.

PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting
Zhen Zhang | Wei Zhu | Jinfan Zhang | Peng Wang | Rize Jin | Tae-Sun Chung
Findings of the Association for Computational Linguistics: NAACL 2022

BERT and other pretrained language models (PLMs) are ubiquitous in modern NLP. Even though PLMs are the state-of-the-art (SOTA) models for almost every NLP task (CITATION), their significant inference latency prohibits wider industrial usage. In this work, we propose Patient and Confident Early Exiting BERT (PCEE-BERT), an off-the-shelf sample-dependent early exiting method that can work with different PLMs and can also work along with popular model compression methods. With a multi-exit BERT as the backbone model, PCEE-BERT makes the early exiting decision once a sufficient number (the patience parameter) of consecutive intermediate layers are confident about their predictions. The entropy value measures the confidence level of an intermediate layer’s prediction. Experiments on the GLUE benchmark demonstrate that our method outperforms previous SOTA early exiting methods. Ablation studies show that: (a) our method performs consistently well on other PLMs, such as ALBERT and TinyBERT; (b) PCEE-BERT can achieve different speed-up ratios by adjusting the patience parameter and the confidence threshold. The code for PCEE-BERT can be found at
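The exit rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `entropy` and `early_exit`, the use of natural-log entropy, and the strict "below threshold" comparison are all assumptions made for illustration. The input is assumed to be one class-probability vector per intermediate exit.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit(layer_probs, threshold, patience):
    """Return (exit_layer_index, predicted_class).

    Hypothetical helper: a layer counts as "confident" when the entropy of
    its prediction is below `threshold`. The model exits at the first layer
    that completes `patience` consecutive confident layers; otherwise the
    last layer's prediction is used.
    """
    if not layer_probs:
        raise ValueError("layer_probs must contain at least one layer")
    streak = 0
    for i, probs in enumerate(layer_probs):
        # A non-confident layer resets the run of consecutive confident layers.
        streak = streak + 1 if entropy(probs) < threshold else 0
        if streak >= patience:
            return i, max(range(len(probs)), key=probs.__getitem__)
    last = layer_probs[-1]
    return len(layer_probs) - 1, max(range(len(last)), key=last.__getitem__)


# Toy per-layer outputs for a binary task (made-up numbers).
outputs = [[0.5, 0.5], [0.9, 0.1], [0.95, 0.05], [0.99, 0.01]]
print(early_exit(outputs, threshold=0.4, patience=2))
```

In this sketch, lowering `patience` or raising `threshold` makes the model exit earlier, which corresponds to the speed-up trade-off mentioned in the ablation.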

CapOnImage: Context-driven Dense-Captioning on Image
Yiqi Gao | Xinglin Hou | Yuanmeng Zhang | Tiezheng Ge | Yuning Jiang | Peng Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Existing image captioning systems are dedicated to generating narrative captions for images, which are spatially detached from the image in presentation. However, texts can also be used as decorations on the image to highlight the key points and increase the attractiveness of images. In this work, we introduce a new task called captioning on image (CapOnImage), which aims to generate dense captions at different locations of the image based on contextual information. To fully exploit the surrounding visual context to generate the most suitable caption for each location, we propose a multi-modal pre-training model with multi-level pre-training tasks that progressively learn the correspondence between texts and image locations from easy to difficult. Since the model may generate redundant captions for nearby locations, we further enhance the location embedding with neighbor locations as context. For this new task, we also introduce a large-scale benchmark called CapOnImage2M, which contains 2.1 million product images, each with an average of 4.8 spatially localized captions. Compared with other image captioning model variants, our model achieves the best results in both captioning accuracy and diversity aspects.


WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
Hiroaki Hayashi | Prashant Budania | Peng Wang | Chris Ackerson | Raj Neervannan | Graham Neubig
Transactions of the Association for Computational Linguistics, Volume 9

Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
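The idea of using section titles and boundaries as a proxy for aspects can be sketched roughly as follows. This is only an illustration, not the dataset's actual construction pipeline: the `== Title ==` heading markup, the `split_by_section` helper, and the toy article are assumptions.

```python
import re

# Assumed markup: a level-2 heading written as "== Title ==" on its own line.
# Deeper headings (e.g. "=== Sub ===") are not treated specially here.
HEADING = re.compile(r"^==\s*([^=].*?)\s*==\s*$")


def split_by_section(article):
    """Map each section title (the aspect proxy) to the text under it.

    Text before the first heading is ignored in this sketch.
    """
    sections = {}
    current = None
    for line in article.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {title: " ".join(lines) for title, lines in sections.items()}


toy_article = (
    "Intro text\n"
    "== History ==\n"
    "Founded in 1900.\n"
    "Grew fast.\n"
    "== Geography ==\n"
    "Lies on a river.\n"
)
print(split_by_section(toy_article))
```

Each (title, text) pair could then serve as an (aspect, reference summary) pair, under the assumption that the section body summarizes that aspect of the subject.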

Sketch and Refine: Towards Faithful and Informative Table-to-Text Generation
Peng Wang | Junyang Lin | An Yang | Chang Zhou | Yichang Zhang | Jingren Zhou | Hongxia Yang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Hyperbolic Hierarchy-Aware Knowledge Graph Embedding for Link Prediction
Zhe Pan | Peng Wang
Findings of the Association for Computational Linguistics: EMNLP 2021

Knowledge graph embedding (KGE) using low-dimensional representations to predict missing information is widely applied in knowledge completion. Existing embedding methods are mostly built on Euclidean space, which makes it difficult for them to handle hierarchical structures. Hyperbolic embedding methods have shown the promise of high fidelity and concise representation for hierarchical data. However, the logical patterns in knowledge graphs are not considered well in these methods. To address this problem, we propose a novel KGE model with an extended Poincaré Ball and a polar coordinate system to capture hierarchical structures. We use the tangent space and exponential transformation to initialize and map the corresponding vectors to the Poincaré Ball in hyperbolic space. To solve the boundary conditions, the boundary is stretched and zoomed by expanding the modulus length in the Poincaré Ball. We optimize our model using polar coordinates and changing operators in the extended Poincaré Ball. Experiments achieve new state-of-the-art results on part of the link prediction tasks, which demonstrates the effectiveness of our method.
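For readers unfamiliar with mapping tangent-space vectors into the Poincaré Ball, the standard exponential map at the origin gives a rough picture. The sketch below uses the textbook formula for a ball of curvature -c; it does not reproduce the paper's extended Poincaré Ball or its polar-coordinate operators, and the name `expmap0` is chosen here for illustration.

```python
import math


def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincaré ball with curvature -c.

    Maps a tangent (Euclidean) vector v to a point strictly inside the ball
    of radius 1/sqrt(c):  exp_0(v) = tanh(sqrt(c)*||v||) * v / (sqrt(c)*||v||).
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]  # the origin maps to itself
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


point = expmap0([3.0, 4.0])
# The direction is preserved while the norm is squashed below 1.
print(point, math.sqrt(sum(x * x for x in point)))
```

Because tanh saturates, vectors with large tangent norm land very close to the boundary, which is one reason boundary handling matters for hyperbolic embeddings.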


Ferryman as SemEval-2020 Task 5: Optimized BERT for Detecting Counterfactuals
Weilong Chen | Yan Zhuang | Peng Wang | Feng Hong | Yan Wang | Yanru Zhang
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The main purpose of this article is to state the effect of using different methods and models for counterfactual determination and detection of causal knowledge. Nowadays, counterfactual reasoning has been widely used in various fields. In the realm of natural language processing (NLP), counterfactual reasoning has huge potential to improve the correctness of a sentence. In the shared Task 5 of detecting counterfactuals in SemEval 2020, we pre-process the officially given dataset by case conversion, stemming, and abbreviation replacement. We use last-5 bidirectional encoder representation from transformer (BERT) and term frequency–inverse document frequency (TF-IDF) vectorizer for counterfactual detection. Meanwhile, multi-sample dropout and cross validation are used to improve versatility and prevent problems such as poor generalization caused by overfitting. Finally, our team Ferryman ranked 8th in sub-task 1 of this competition.

MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets
Qi Wu | Peng Wang | Chenghao Huang
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Natural language processing (NLP) has been applied to various fields including text classification and sentiment analysis. In the shared task of sentiment analysis of code-mixed tweets, which is a part of the SemEval-2020 competition, we preprocess the datasets by, among other steps, replacing emoji and deleting uncommon characters, and then fine-tune Bidirectional Encoder Representation from Transformers (BERT) for the best performance. After exhausting the top-3 submissions, our team MeisterMorxrc achieves an averaged F1 score of 0.730 in this task, and our CodaLab username is MeisterMorxrc.

Ferryman at SemEval-2020 Task 12: BERT-Based Model with Advanced Improvement Methods for Multilingual Offensive Language Identification
Weilong Chen | Peng Wang | Jipeng Li | Yuanshuai Zheng | Yan Wang | Yanru Zhang
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Indiscriminately posting offensive remarks on social media may promote the occurrence of negative events such as violence, crime, and hatred. This paper examines different approaches and models for solving offensive tweet classification, which is a part of the OffensEval 2020 competition. The dataset is the Offensive Language Identification Dataset (OLID), which contains 14,200 annotated English tweets. The main challenges in data preprocessing are the unbalanced class distribution, abbreviations, and emoji. To overcome these issues, we adopt hashtag segmentation, abbreviation replacement, and emoji replacement for data preprocessing. The main task is divided into three sub-tasks, which are solved by Term Frequency–Inverse Document Frequency (TF-IDF), Bidirectional Encoder Representation from Transformer (BERT), and Multi-dropout, respectively. Meanwhile, we apply different learning rates for different languages and tasks based on BERT and non-BERT models in order to obtain better results. Our team Ferryman ranked 18th, 8th, and 21st with an F1-score of 0.91152 on the English Sub-task A, Sub-task B, and Sub-task C, respectively. Furthermore, our team also ranked in the top 20 on Sub-task A of other languages.

AprilE: Attention with Pseudo Residual Connection for Knowledge Graph Embedding
Yuzhang Liu | Peng Wang | Yingtai Li | Yizhan Shao | Zhongkai Xu
Proceedings of the 28th International Conference on Computational Linguistics

Knowledge graph embedding maps entities and relations into low-dimensional vector space. However, it is still challenging for many existing methods to model diverse relational patterns, especially symmetric and antisymmetric relations. To address this issue, we propose a novel model, AprilE, which employs triple-level self-attention and pseudo residual connection to model relational patterns. The triple-level self-attention treats head entity, relation, and tail entity as a sequence and captures the dependency within a triple. At the same time the pseudo residual connection retains primitive semantic features. Furthermore, to deal with symmetric and antisymmetric relations, two schemas of score function are designed via a position-adaptive mechanism. Experimental results on public datasets demonstrate that our model can produce expressive knowledge embedding and significantly outperforms most of the state-of-the-art works.


CVTE at IJCNLP-2017 Task 1: Character Checking System for Chinese Grammatical Error Diagnosis Task
Xian Li | Peng Wang | Suixue Wang | Guanyu Jiang | Tianyuan You
Proceedings of the IJCNLP 2017, Shared Tasks

Grammatical error diagnosis is an important task in natural language processing. This paper introduces the CVTE Character Checking System in the NLP-TEA-4 shared task for CGED 2017. We use a Bi-LSTM to generate the probability of every character, then apply two kinds of strategies to decide whether a character is correct or not. This system is probably more suitable for the error type of bad word selection, which is one of four types of errors; the rest are word redundancy, missing words, and word disorder. Finally, the second strategy achieves a better F1 score than the first one at all of the detection, identification, and position levels.


Semantic Clustering and Convolutional Neural Network for Short Text Categorization
Peng Wang | Jiaming Xu | Bo Xu | Chenglin Liu | Heng Zhang | Fangyuan Wang | Hongwei Hao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Short Text Clustering via Convolutional Neural Networks
Jiaming Xu | Peng Wang | Guanhua Tian | Bo Xu | Jun Zhao | Fangyuan Wang | Hongwei Hao
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing