Abstract
Though remarkable progress has been made in non-parallel text style transfer, the evaluation methodology remains unsatisfactory: models are typically compared using samples from only one checkpoint on three metrics, i.e., transfer accuracy, BLEU score, and PPL score. In this paper, we argue that both the existing evaluation metrics and the evaluation method are inappropriate. Specifically, for the evaluation metrics, we give a detailed analysis and comparison along three dimensions: style transfer, content preservation, and naturalness; for the evaluation method, we reiterate the fallacy of picking a single checkpoint for model comparison. As a result, we establish a robust evaluation method by examining the trade-off between style transfer and naturalness, and between content preservation and naturalness. Notably, we conduct an elaborate human evaluation and automatically identify cases where the BLEU score measures content preservation inaccurately. To overcome this issue, we propose a graph-based method to extract attribute-dependent content and attribute-independent content from input sentences in the YELP and IMDB datasets. With the modified datasets, we design a new evaluation metric called “attribute hit” and propose an efficient regularization that leverages the attribute-dependent and attribute-independent content as guiding signals. Experimental results demonstrate the effectiveness of the proposed strategy.
- Anthology ID:
- 2021.findings-emnlp.135
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2021
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
- Venue:
- Findings
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1569–1582
- URL:
- https://aclanthology.org/2021.findings-emnlp.135
- DOI:
- 10.18653/v1/2021.findings-emnlp.135
- Cite (ACL):
- Ping Yu, Yang Zhao, Chunyuan Li, and Changyou Chen. 2021. Rethinking Sentiment Style Transfer. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1569–1582, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Rethinking Sentiment Style Transfer (Yu et al., Findings 2021)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2021.findings-emnlp.135.pdf
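To make the abstract's “attribute hit” idea concrete, here is a minimal toy sketch of such a metric. This is a hypothetical illustration, not the authors' implementation: the function names, the word lists, and the token-overlap notion of content preservation are all assumptions for the example.

```python
# Hypothetical sketch of an "attribute hit"-style check for sentiment
# style transfer evaluation. Idea: attribute-independent content words
# from the input should be preserved, while attribute (sentiment) words
# should be replaced by target-style attribute words.

def attribute_hit(output_tokens, target_attr_words):
    """Return True if the transferred sentence contains at least one
    target-style attribute word (a 'hit')."""
    return any(tok in target_attr_words for tok in output_tokens)

def content_preservation(output_tokens, attr_independent_tokens):
    """Fraction of attribute-independent input tokens kept in the output
    (a toy stand-in for BLEU over attribute-independent content)."""
    if not attr_independent_tokens:
        return 1.0
    kept = sum(1 for tok in attr_independent_tokens if tok in output_tokens)
    return kept / len(attr_independent_tokens)

# Toy example: negative -> positive transfer of a YELP-style review.
src = "the service was terrible and slow".split()
out = "the service was excellent and quick".split()
positive_words = {"excellent", "great", "quick", "friendly"}  # assumed lexicon
attr_independent = ["the", "service", "was", "and"]           # assumed extraction

print(attribute_hit(out, positive_words))           # -> True
print(content_preservation(out, attr_independent))  # -> 1.0
```

The point of separating the two checks is the trade-off the paper emphasizes: a model can score well on raw BLEU against the input simply by copying it, which preserves the attribute words it was supposed to change; scoring attribute words and attribute-independent content separately exposes that failure mode.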