Representation Learning with Conditional Information Flow Maximization

Dou Hu, Lingwei Wei, Wei Zhou, Songlin Hu


Abstract
This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. It promotes the learned representations have good feature uniformity and sufficient predictive ability, which can enhance the generalization of pre-trained language models (PLMs) for the target task. Firstly, an information flow maximization principle is proposed to learn more sufficient representations for the input and target by simultaneously maximizing both input-representation and representation-label mutual information. Unlike the information bottleneck, we handle the input-representation information in an opposite way to avoid the over-compression issue of latent representations. Besides, to mitigate the negative effect of potential redundant features from the input, we design a conditional information minimization principle to eliminate negative redundant features while preserve noise-invariant features. Experiments on 13 language understanding benchmarks demonstrate that our method effectively improves the performance of PLMs for classification and regression. Extensive experiments show that the learned representations are more sufficient, robust and transferable.
Anthology ID:
2024.acl-long.759
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
14088–14103
Language:
URL:
https://aclanthology.org/2024.acl-long.759
DOI:
Bibkey:
Cite (ACL):
Dou Hu, Lingwei Wei, Wei Zhou, and Songlin Hu. 2024. Representation Learning with Conditional Information Flow Maximization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14088–14103, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Representation Learning with Conditional Information Flow Maximization (Hu et al., ACL 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.acl-long.759.pdf