This week
a new paper from OpenAI came out, showing hilarious adversarial attacks on
CLIP, their latest vision model, which achieves state-of-the-art results on many benchmarks.
What are these adversarial attacks, and why are they important?
Adversarial attacks on neural networks are modifications to an input (for example, an image) that force the network to give an utterly ridiculous response, even though any human can still give the correct answer. For example, if we are very careful, we can craft a barely noticeable noise pattern that, when added to a picture of a panda, makes a specific neural network classify it as a baboon.
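The noise-crafting idea can be sketched in a few lines. This is a toy, fast-gradient-sign-style example against a hypothetical logistic-regression "classifier" (so everything stays self-contained); real attacks apply the same recipe to deep networks, using backpropagation to get the gradient:

```python
import numpy as np

# Toy white-box adversarial perturbation, FGSM-style.
# The "model" here is a made-up logistic regression so the example runs
# on its own; for a real network the gradient comes from backprop.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=16)      # model weights, known to the attacker
x = rng.normal(size=16)      # the input we want to misclassify

def predict(x):
    return sigmoid(w @ x)    # model's confidence in class 1

# For this linear model, the gradient of the class-1 score w.r.t. the
# input is just w. The FGSM step nudges every coordinate by a tiny,
# fixed amount in the direction that lowers the score.
eps = 0.1
x_adv = x - eps * np.sign(w)

# The perturbation is bounded by eps per coordinate (barely noticeable),
# yet the model's confidence in the original class drops.
print(predict(x), predict(x_adv))
```

The key point is that the perturbation is tiny and structured: it is not random noise, but noise aligned with the model's own gradient, which is why it is so effective and so invisible to humans.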
These attacks are extremely important because they highlight biases in the learning and representation mechanisms of these neural networks that we were previously unaware of. They help us understand the limitations of these models, and we can often use them to improve generalization.
Now, adversarial attacks are not new, and most of them require access to the neural network's weights to find the exact modification that makes it tick. However, these attacks are not that interesting, because in real life we will probably not have access to those weights. Attacks in the wild are those we can mount against the input without having to backpropagate through the network, and they are much more interesting, and much more dangerous.
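To make the contrast concrete, here is a minimal black-box sketch under the same toy-model assumption as before: the attacker can only query the model's output, never see its weights or gradients, and greedily keeps any small random perturbation that lowers the model's confidence:

```python
import numpy as np

# Hypothetical black-box attack: query-only access to the model.
# We propose small random perturbations and keep the ones that work.

rng = np.random.default_rng(1)
w = rng.normal(size=16)                  # hidden from the attacker

def query(x):
    # The only interface the attacker has: input in, confidence out.
    return 1.0 / (1.0 + np.exp(-(w @ x)))

x = rng.normal(size=16)                  # original input
x_adv = x.copy()
best = query(x_adv)
for _ in range(500):
    candidate = x_adv + rng.normal(scale=0.01, size=16)
    score = query(candidate)
    if score < best:                     # keep changes that hurt the model
        x_adv, best = candidate, score

print(query(x), query(x_adv))
```

Query-based search like this is far less efficient than following the gradient, but it needs nothing the real world doesn't already give an attacker, which is exactly why in-the-wild attacks matter more.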
However, the new attacks on CLIP are at another level, because they hinge on an amazing feature of the model:
it contains multimodal neurons. Multimodal neurons in humans are neurons that respond to high-level concepts, such as the infamous "Halle Berry" neuron, which fires when you're shown either a picture of Halle Berry or her name written as text. So it's a neuron that has learned a concept, whether it is encoded in text or as an image. CLIP has these neurons as well, which hints that this kind of neuron is probably a necessary (or at least a very common) feature of high-performing vision systems.
The existence of these neurons lets us, for example, trick the network into classifying an apple as an iPod, simply by sticking a note with the word "iPod" on top of the apple. It is hilarious, yes, but it is also kind of awesome, because this is one of the most advanced vision systems we have, and its own power is the source of this brittleness.
We are also very powerful pattern-matching machines. Who knows which bizarre human behaviours emerge precisely from our own computational power. Maybe this kind of trade-off is a necessary (or at least a very common) feature of any sufficiently advanced intelligence.