Decomposing Mutual Information for Representation Learning

user-5d8054e8530c708f9920ccce (2021)

Abstract
Many self-supervised representation learning methods maximize mutual information (MI) across views. In this paper, we transform each view into a set of subviews and then decompose the original MI bound into a sum of bounds involving conditional MI between the subviews. E.g., given two views x and y of the same input example, we can split x into two subviews, x′ and x′′, which depend only on x but are otherwise unconstrained. Since (x′, x′′) is a function of x, the data processing inequality and the chain rule together give I(x;y) ≥ I(x′′;y) + I(x′;y|x′′). By maximizing both terms in the decomposition, our approach explicitly rewards the encoder for any information about y which it extracts from x′′, and for information about y extracted from x′ in excess of the information from x′′. We provide a novel contrastive lower bound on conditional MI that relies on sampling contrast sets from p(y|x′′). By decomposing the original MI into a sum of increasingly challenging MI bounds between sets of increasingly informed views, our representations can capture more of the total information shared between the original views. We empirically test the method in a vision domain and for dialogue generation.
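As a concrete illustration of the two-term objective this decomposition suggests, here is a minimal PyTorch sketch, not the paper's implementation: `infonce` is the standard CPC-style contrastive bound, used here for the unconditional term I(x′′;y), and `conditional_infonce` mimics the proposed conditional bound by scoring each positive pair against M contrast samples that are assumed to come from p(y|x′′). The dot-product critic, the batch shapes, and the random tensors standing in for encoders and conditional samples are all illustrative assumptions.

```python
import math

import torch
import torch.nn.functional as F


def infonce(scores: torch.Tensor) -> torch.Tensor:
    """CPC-style InfoNCE lower bound on MI.

    scores[i, j] is a critic value for pairing example i with candidate j;
    positives sit on the diagonal, off-diagonal entries act as negatives.
    The bound is log K minus the softmax cross-entropy of the positives.
    """
    k = scores.size(0)
    labels = torch.arange(k, device=scores.device)
    return math.log(k) - F.cross_entropy(scores, labels)


def conditional_infonce(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Conditional InfoNCE-style bound, following the paper's recipe of
    building the contrast set from samples of p(y | x'').

    pos_scores: (K,) critic values for the true (x', y) pairs.
    neg_scores: (K, M) critic values against M conditional samples per pair.
    """
    k, m = neg_scores.shape
    logits = torch.cat([pos_scores.unsqueeze(1), neg_scores], dim=1)  # (K, M+1)
    labels = torch.zeros(k, dtype=torch.long, device=logits.device)   # positive at index 0
    return math.log(m + 1) - F.cross_entropy(logits, labels)


if __name__ == "__main__":
    K, M, D = 8, 4, 16
    # Random stand-ins for encoder outputs (hypothetical; a real setup
    # would encode actual subviews x', x'' and views y).
    z_dprime = torch.randn(K, D)     # encodings of x''
    z_prime = torch.randn(K, D)      # encodings of x'
    z_y = torch.randn(K, D)          # encodings of y
    z_y_cond = torch.randn(K, M, D)  # stand-ins for samples from p(y | x'')

    # Unconditional term I(x''; y) with a dot-product critic.
    uncond = infonce(z_dprime @ z_y.t())

    # Conditional term I(x'; y | x''): each positive is contrasted against
    # candidates drawn given the same x''.
    pos = (z_prime * z_y).sum(dim=1)                     # (K,)
    neg = torch.einsum("kd,kmd->km", z_prime, z_y_cond)  # (K, M)
    cond = conditional_infonce(pos, neg)

    # By the decomposition, uncond + cond lower-bounds I(x; y).
    print(float(uncond + cond))
```

Maximizing the summed bound with respect to the encoders and critics rewards information captured from x′′ and, separately, any extra information about y carried by x′ beyond x′′; the hard part in practice, which this sketch sidesteps with random tensors, is obtaining the conditional contrast samples from p(y|x′′).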
Keywords
Mutual information, Information processing, Feature learning, Chain rule, Encoder, Theoretical computer science, Sampling (statistics), Computer science, Self supervised learning