Jiarong Xu
2023
Understanding Translationese in Cross-Lingual Summarization
Jiaan Wang | Fandong Meng | Yunlong Liang | Tingyi Zhang | Jiarong Xu | Zhixu Li | Jie Zhou
Findings of the Association for Computational Linguistics: EMNLP 2023
Given a document in a source language, cross-lingual summarization (CLS) aims to generate a concise summary in a different target language. Unlike in monolingual summarization (MS), naturally occurring source-language documents paired with target-language summaries are rare. To collect large-scale CLS data, existing datasets typically involve translation in their creation. However, translated text differs from text originally written in that language, a phenomenon known as translationese. In this paper, we first confirm that different approaches to constructing CLS datasets lead to different degrees of translationese. Then we systematically investigate how translationese affects CLS model evaluation and performance when it appears in source documents or target summaries. In detail, we find that (1) translationese in the documents or summaries of test sets might lead to a discrepancy between human judgment and automatic evaluation; (2) translationese in training sets would harm model performance in real-world applications; (3) though machine-translated documents involve translationese, they are very useful for building CLS systems on low-resource languages under specific training strategies. Lastly, we give suggestions for future CLS research, including dataset and model development. We hope that our work will draw researchers' attention to translationese in CLS so that they take it into account in the future.
Unleashing the Power of Language Models in Text-Attributed Graph
Haoyu Kuang | Jiarong Xu | Haozhe Zhang | Zuyu Zhao | Qi Zhang | Xuanjing Huang | Zhongyu Wei
Findings of the Association for Computational Linguistics: EMNLP 2023
Representation learning on graphs has been demonstrated to be a powerful tool for solving real-world problems. Among different types of graphs, text-attributed graphs carry both semantic and structural information. Existing works have paved the way for knowledge extraction from this type of data by leveraging language models, graph neural networks, or a combination of the two. However, these works suffer from issues such as underutilization of the relationships between nodes or words, or unaffordable memory cost. In this paper, we propose a Node Representation Update Pre-training Architecture based on Co-modeling Text and Graph (NRUP). In NRUP, we construct a hierarchical text-attributed graph that incorporates both original nodes and word nodes. Meanwhile, we apply four self-supervised tasks to different levels of the constructed graph. We further design the pre-training framework to update the features of nodes during training epochs. We conduct experiments on the benchmark dataset ogbn-arxiv. Our method outperforms the baselines, demonstrating its validity and generalization.
One-Model-Connects-All: A Unified Graph Pre-Training Model for Online Community Modeling
Ruoxue Ma | Jiarong Xu | Xinnong Zhang | Haozhe Zhang | Zuyu Zhao | Qi Zhang | Xuanjing Huang | Zhongyu Wei
Findings of the Association for Computational Linguistics: EMNLP 2023
An online community is composed of communities, users, and user-generated textual content, with rich information that can help us solve social problems. Previous research has not fully utilized these three components and the relationships among them, and it cannot adapt to a wide range of downstream tasks. To solve these problems, we focus on a framework that simultaneously considers communities, users, and texts, and that can easily connect with a variety of downstream tasks related to social media. Specifically, we use a ternary heterogeneous graph to model online communities. Text reconstruction and edge generation are used to learn structural and semantic knowledge among communities, users, and texts. By leveraging this pre-trained model, we achieve promising results across multiple downstream tasks, such as violation detection, sentiment analysis, and community recommendation. Our exploration will improve online community modeling.