
Mostly Harmless AI
By Alejandro Piad Morffis • Issue #5
🖖 Welcome to another issue of the Mostly Harmless AI newsletter. In this issue, I want to try something slightly different. Instead of a mashup of different things, I’ll try to make most of the content loosely fit an overarching theme. Let’s see how that goes.
Today’s theme is adversarial attacks.

🗞 What's new
This week a new paper from OpenAI came out, showing hilarious adversarial attacks on CLIP, their latest vision model that achieves SOTA results on a lot of benchmarks.
What are these adversarial attacks, and why are they important?
Adversarial attacks in neural networks are modifications to an input (for example, an image) that force the neural net to give an utterly ridiculous response, even though any human can still give the correct answer. For example, if we are very careful, we can craft a barely noticeable noise pattern that, when added to a picture of a panda, makes a specific neural network classify it as a baboon.
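To make this concrete, here is a minimal sketch of the classic fast gradient sign method (FGSM); the `model`, `image`, and `label` names are placeholders for whatever pretrained PyTorch classifier and correctly labeled input you have at hand:

```python
# Minimal FGSM sketch: nudge the input in the direction that increases the loss.
# `model`, `image` (batched, values in [0, 1]) and `label` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """Return a barely perturbed copy of `image` that tries to flip the prediction."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step along the sign of the gradient, bounded by epsilon, and keep valid pixel values.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()
```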
These attacks are extremely important because they highlight biases in the learning and representation mechanisms of these neural networks that we were previously unaware of. They help us understand the limitations of these models, and we can often use them to improve generalization.
Now, adversarial attacks are not new, and most of them involve having access to the neural network's weights to find the exact modification that makes it tick. However, these attacks are not that interesting, because in real life we will probably not have access to those weights. Attacks in the wild, the ones we can make to the input without backpropagating through the network, are much more interesting, and much more dangerous.
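For a sense of what an attack without gradient access looks like, here is a toy sketch that only queries the model's predictions; `predict` is a hypothetical function standing in for, say, a remote model behind an API:

```python
# Toy black-box attack: random search over small perturbations, using only predictions.
import numpy as np

def random_search_attack(predict, image, true_label, epsilon=0.05, steps=1000):
    rng = np.random.default_rng(0)
    for _ in range(steps):
        noise = rng.uniform(-epsilon, epsilon, size=image.shape)
        candidate = np.clip(image + noise, 0.0, 1.0)
        if predict(candidate) != true_label:
            return candidate  # adversarial example found without any gradient access
    return None  # no successful perturbation within the budget
```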
However, the new attacks on CLIP are at another level, because they hinge on an amazing feature of the model: it contains multimodal neurons. Multimodal neurons in humans are neurons that respond to high-level concepts, such as the famous “Halle Berry” neuron, which fires whether you’re shown a picture of Halle Berry or her name written as text. So it’s a neuron that has learned a concept, whether it is encoded as text or as an image. CLIP has these neurons as well, which hints that this kind of neuron is probably a necessary (or at least a very common) feature of high-performing vision systems.
The existence of these neurons lets us, for example, trick the network into classifying an apple as an iPod, simply by adding a sticky note with the word “iPod” on top of the apple. It is hilarious, yes, but it is also kind of awesome, because this is one of the most advanced vision systems we have, and its own power is the source of this brittleness.
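If you want to poke at this yourself, here is a hedged sketch of how one could probe the typographic attack with the Hugging Face port of CLIP; the image file names are placeholders for your own photos of an apple with and without the sticky note:

```python
# Zero-shot CLIP classification of two placeholder images against two text labels.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of an apple", "a photo of an iPod"]
for path in ["apple.jpg", "apple_with_ipod_sticker.jpg"]:  # hypothetical images
    image = Image.open(path)
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(path, dict(zip(labels, probs[0].tolist())))
```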
We are also very powerful pattern-matching machines. Who knows which bizarre human behaviours emerge precisely from our own computational power. Maybe this kind of trade-off is a necessary (or at least a very common) feature of any sufficiently advanced intelligence.
OpenAI
We've found that our latest vision model, CLIP, contains neurons that connect images, drawings and text about related concepts. https://t.co/zV6N4yvYcG
📚 For learners
To stay on topic, here’s a Coursera lesson on adversarial attacks, part of the AI For Everyone course, which also covers other forms of abuse and misuse of AI.
AI For Everyone | Coursera
🔨 Tools of the trade
If you’re interested in adversarial examples and testing your models against a large range of attacks, this Python library is your best friend. It contains a lot of common attacks and it’s highly performant. It works with models in PyTorch, TensorFlow, and JAX.
🍿 Recommendations
Since we’re talking about adversarial attacks, let me recommend Zero History, a novel by William Gibson, author of Neuromancer (probably the best-known novel in the cyberpunk genre). If you haven’t read Gibson, he’s one of the big names in the sci-fi genre, so there you go: lots of reading to catch up on.
William Gibson
👥 Community
Today I want you to follow Christian Szegedy (@ChrSzegedy), who’s a top AI researcher and, incidentally, one of the fathers of modern adversarial attack theory for deep learning. He continues to work at the frontier of deep learning, so your timeline will be filled with amazing new research.
🎤 Word of mouth
Now we’ll take a detour from this week’s topic, to talk about more personal stuff. This week’s AMA was an interesting mix of very abstract and very concrete questions. We discussed topics ranging from early math education, to what PhDs mean, to how embeddings work, to an abstract discussion of data augmentation.
As usual, I’ll leave here the root tweet where you can check all the questions and my humble answers.
Alejandro Piad Morffis
Hey folks 🖖!

🎙️It's Saturday again. Let's have another AMA? Ask me anything about machine learning, AI, computer science, or anything else...

Not that I can answer everything (or even most of it) but at least I'll try to point you to someone who can.

Let's do it! 👇
☕ Homebrew
Before signing off, I want to leave you (finally!) the first episode of my still very tentative podcast. I have no idea how often I’ll do these, or even exactly what topics and format I’ll use. It’s just an experiment for now, a very fun one, so I hope you enjoy this first attempt at podcasting.
An origin story for Artificial Intelligence by Mostly Harmless AI
👋 That’s it for now. Please let me know what you think of this issue, what you’d like to see more or less of, and any other feedback you want to share. If you liked this newsletter, consider subscribing (in case you haven’t already) and forwarding it to those you love. It’s 💯 free!