You can read a deep dive from Yann LeCun himself, but the TL;DR is this: self-supervised learning (SSL) is a technique for leveraging vast amounts of unlabeled data by using the data’s own structure as the supervision signal. Let’s unpack that.
The most famous self-supervised models are probably modern Transformers: BERT, GPT, and company. They are trained on lots and lots of text, but here is the key part: we don’t need labels. Instead, we take a sentence, hide some words, and train the model to predict the hidden words from the ones that remain. Pretty clever, right?
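To see how the data labels itself, here is a toy sketch in plain Python (my own illustration, not BERT’s actual preprocessing, which masks subword tokens rather than whole words): hide one word per sentence, and the hidden word becomes the training label, with no human annotator involved.

```python
import random

# Toy illustration of self-labeling: unlabeled sentences become
# (input, label) pairs by hiding a random word in each one.
# A hypothetical sketch, not BERT's real masking pipeline.
def make_training_pair(sentence: str) -> tuple[str, str]:
    words = sentence.split()
    i = random.randrange(len(words))
    label = words[i]          # the hidden word is the "free" label
    words[i] = "[MASK]"
    return " ".join(words), label

corpus = [
    "the cat sat on the mat",
    "self supervised learning needs no labels",
]
for masked, label in map(make_training_pair, corpus):
    print(f"input: {masked!r} -> label: {label!r}")
```

Every sentence in the corpus yields a supervised example for free; that is the whole trick.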
By forcing the model to predict the missing bits, we are implicitly making it learn the underlying structure of the data. With text, it is pretty straightforward (well, after someone tells you about it…), and a pre-trained model makes it look easy, as the demo below shows.
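Here is a minimal fill-in-the-blank demo using the Hugging Face transformers library (my choice of tooling for illustration, nothing specific to the SEER work):

```python
# Minimal fill-in-the-blank demo with a pre-trained BERT.
# Assumes `pip install transformers torch`; the model name is
# just one common publicly available checkpoint.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model was never given labels; during pre-training, the
# masked word itself served as the target. The top guess here
# is typically "capital".
for prediction in unmasker("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```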
With images, though, it is a whole different story. This is where Facebook’s new SEER model enters the picture. It is a major breakthrough, akin to what BERT and similar models meant for NLP a few years ago. And the best part: they open-sourced it.