Decomposing Mutual Information for Representation Learning

user-5d8054e8530c708f9920ccce (2021)

Abstract
Many self-supervised representation learning methods maximize mutual information (MI) across views. In this paper, we transform each view into a set of subviews and then decompose the original MI bound into a sum of bounds involving conditional MI between the subviews. E.g., given two views x and y of the same input example, we can split x into two subviews, x′ and x′′, which depend only on x but are otherwise unconstrained. Since (x′, x′′) is a function of x, the data processing inequality and the chain rule together give I(x;y) ≥ I(x′′;y) + I(x′;y|x′′). By maximizing both terms in the decomposition, our approach explicitly rewards the encoder for any information about y which it extracts from x′′, and for information about y extracted from x′ in excess of the information from x′′. We provide a novel contrastive lower bound on conditional MI that relies on sampling contrast sets from p(y|x′′). By decomposing the original MI into a sum of increasingly challenging MI bounds between sets of increasingly informed views, our representations can capture more of the total information shared between the original views. We empirically test the method in a vision domain and for dialogue generation.
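As a concrete illustration of the two-term objective this decomposition suggests, here is a minimal PyTorch sketch, not the paper's implementation: `infonce` is the standard CPC-style contrastive bound, used here for the unconditional term I(x′′;y), and `conditional_infonce` mimics the proposed conditional bound by scoring each positive pair against M contrast samples that are assumed to come from p(y|x′′). The dot-product critic, the batch shapes, and the random tensors standing in for encoders and conditional samples are all illustrative assumptions.

```python
import math

import torch
import torch.nn.functional as F


def infonce(scores: torch.Tensor) -> torch.Tensor:
    """CPC-style InfoNCE lower bound on MI.

    scores[i, j] is a critic value for pairing example i with candidate j;
    positives sit on the diagonal, off-diagonal entries act as negatives.
    The bound is log K minus the softmax cross-entropy of the positives.
    """
    k = scores.size(0)
    labels = torch.arange(k, device=scores.device)
    return math.log(k) - F.cross_entropy(scores, labels)


def conditional_infonce(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Conditional InfoNCE-style bound, following the paper's recipe of
    building the contrast set from samples of p(y | x'').

    pos_scores: (K,) critic values for the true (x', y) pairs.
    neg_scores: (K, M) critic values against M conditional samples per pair.
    """
    k, m = neg_scores.shape
    logits = torch.cat([pos_scores.unsqueeze(1), neg_scores], dim=1)  # (K, M+1)
    labels = torch.zeros(k, dtype=torch.long, device=logits.device)   # positive at index 0
    return math.log(m + 1) - F.cross_entropy(logits, labels)


if __name__ == "__main__":
    K, M, D = 8, 4, 16
    # Random stand-ins for encoder outputs (hypothetical; a real setup
    # would encode actual subviews x', x'' and views y).
    z_dprime = torch.randn(K, D)     # encodings of x''
    z_prime = torch.randn(K, D)      # encodings of x'
    z_y = torch.randn(K, D)          # encodings of y
    z_y_cond = torch.randn(K, M, D)  # stand-ins for samples from p(y | x'')

    # Unconditional term I(x''; y) with a dot-product critic.
    uncond = infonce(z_dprime @ z_y.t())

    # Conditional term I(x'; y | x''): each positive is contrasted against
    # candidates drawn given the same x''.
    pos = (z_prime * z_y).sum(dim=1)                     # (K,)
    neg = torch.einsum("kd,kmd->km", z_prime, z_y_cond)  # (K, M)
    cond = conditional_infonce(pos, neg)

    # By the decomposition, uncond + cond lower-bounds I(x; y).
    print(float(uncond + cond))
```

Maximizing the summed bound with respect to the encoders and critics rewards information captured from x′′ and, separately, any extra information about y carried by x′ beyond x′′; the hard part in practice, which this sketch sidesteps with random tensors, is obtaining the conditional contrast samples from p(y|x′′).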
Keywords
Mutual information, Information processing, Feature learning, Chain rule, Encoder, Theoretical computer science, Sampling (statistics), Computer science, Self supervised learning