We have convincing evidence that sentence order prediction is a more consistently useful learning task that leads to better language representations. We hypothesize that there could be more dimensions not yet captured by the current self-supervised training losses that could crea...
Similar approaches are common in NLP; we demonstrate that this approach can be a surprisingly strong baseline for semi-supervised learning in computer vision, outperforming the state of the art by a large margin
We presented wav2vec 2.0, a framework for self-supervised learning of speech representations which masks latent representations of the raw waveform and solves a contrastive task over quantized speech representations
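The masked contrastive objective described above can be illustrated with a minimal sketch. This is not the actual wav2vec 2.0 implementation (which uses a Transformer context network, Gumbel-softmax quantization, and in-batch negative sampling); all names and the toy data here are illustrative assumptions, showing only the core InfoNCE-style idea: the context vector at a masked position should score the true quantized target higher than distractors.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    # cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def contrastive_loss(context, true_target, distractors, temperature=0.1):
    """InfoNCE-style loss: softmax cross-entropy over similarities,
    with the true quantized target at index 0."""
    logits = [cosine(context, true_target) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    logits = np.array(logits)
    return float(-logits[0] + np.log(np.exp(logits).sum()))

# toy latent sequence standing in for encoder outputs: 10 frames, 8 dims
latents = rng.normal(size=(10, 8))
masked_idx = 4
negatives = [latents[i] for i in (1, 7, 9)]  # distractors from other steps

# the "context" vector is a noisy view of the masked frame (illustrative)
context = latents[masked_idx] + 0.05 * rng.normal(size=8)
loss = contrastive_loss(context, latents[masked_idx], negatives)
```

In the real model the targets are quantized codebook entries rather than raw latents, and the loss is minimized jointly with a codebook diversity penalty; the sketch keeps only the contrastive term.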
The proposed problem-agnostic speech encoder+ (PASE+) architecture is based on an online speech distortion module, a convolutional encoder coupled with a quasi-recurrent neural network layer, and a set of workers solving self-supervised problems
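The three-stage pipeline above (distort, encode, multi-worker losses) can be sketched in a few lines. This is a toy sketch under loud assumptions, not the PASE+ implementation: the encoder is a fixed random projection standing in for the convolutional encoder + QRNN, the distortion is plain additive noise (PASE+ also applies reverberation, band-drop, and other distortions), and the two "workers" regress trivial handcrafted targets.

```python
import numpy as np

rng = np.random.default_rng(0)

def distort(wave):
    # online distortion module (sketch): additive noise only
    return wave + 0.05 * rng.normal(size=wave.shape)

def encode(wave, dim=16, hop=10):
    # stand-in for the conv encoder + QRNN: frame the waveform,
    # then apply a fixed random projection per frame
    frames = wave[: len(wave) // hop * hop].reshape(-1, hop)
    return frames @ rng.normal(size=(hop, dim)), frames

def run_workers(feats, frames):
    # each worker predicts a different self-supervised target
    # from the shared features (toy targets, toy linear heads)
    targets = {
        "energy": (frames ** 2).mean(axis=1),  # per-frame energy
        "mean": frames.mean(axis=1),           # per-frame mean
    }
    heads = {name: rng.normal(size=feats.shape[1]) for name in targets}
    return {name: float(((feats @ heads[name] - t) ** 2).mean())
            for name, t in targets.items()}

wave = rng.normal(size=200)            # toy "raw waveform"
feats, frames = encode(distort(wave))  # distort, then encode
losses = run_workers(feats, frames)    # one loss per worker
```

The design point the sketch preserves is that all workers share one encoder, so minimizing the sum of worker losses forces the shared representation to carry information useful for every task at once.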
Experimental results on the commonly used English Switchboard test set show that our approach achieves performance competitive with previous systems while using less than 1% of the training data
We evaluated each key design choice of augmented multi-scale deep information maximization and contrastive predictive coding, which are two representative contrastive self-supervised learning algorithms, and used our framework to construct a new approach, which we refer to as Yet ...
We believe that our work is a meaningful step toward realistic Visual Question Answering and addressing the language-bias issue, and that this self-supervision can be generalized to other tasks that are subject to inherent data biases