This was completely different from what we had seen previously. And it makes future work difficult because it’s hard to disentangle what is blocked and what is obscure.
Angwin: What did you learn about the nature of the blocklist while researching it?
Yin: We noticed that there’s a hierarchy, where certain terms get blocked but a slight variant won’t be. One example we cite is that “White nationalist” is blocked, but “White nationalists” and “White nationalism” are not.
We also noticed that certain terms are blocked by association. Any phrase mentioning “Muslim” is blocked, so innocuous things like “Muslim fashion” or “Muslim parenting” are blocked, even though they’re commercially viable. The same was not true for other religion pairings like “Christian fashion” or “Jewish fashion.”
We also noticed the strictest kind of block was a substring block. For example, “sex” will be blocked automatically, even if it’s part of a longer word. For instance, both “sex work” and “sexwork” were blocked.
This makes us think that Google is probably using regex, or regular expressions, which match text against predefined patterns. But it’s a very poor way to do this kind of matching because you have to think of every expression in advance. So what we found is that they have a very primitive technology that’s manually updated, and there’s probably a more sophisticated way of doing it.
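The behavior described above can be sketched with a few lines of Python. This is purely illustrative, not YouTube’s actual code: the pattern list, the use of Python’s `re` module, and the `is_blocked` helper are all assumptions, chosen only to reproduce the observed behavior (substring blocks like “sex,” association blocks like “Muslim,” and exact-ish blocks like “White nationalist” that miss close variants).

```python
import re

# Hypothetical blocklist pattern (illustration only, not YouTube's).
# "sex" and "muslim" match anywhere as substrings; "white nationalist\b"
# requires a word boundary, so "nationalists"/"nationalism" slip through.
BLOCKED = re.compile(r"sex|muslim|white nationalist\b", re.IGNORECASE)

def is_blocked(phrase: str) -> bool:
    """Return True if any blocked pattern appears anywhere in the phrase."""
    return BLOCKED.search(phrase) is not None

print(is_blocked("sex work"))            # substring match
print(is_blocked("sexwork"))             # still matches inside the word
print(is_blocked("Muslim fashion"))      # blocked by association
print(is_blocked("White nationalist"))   # exact term blocked
print(is_blocked("White nationalists"))  # variant escapes the pattern
```

A pattern list like this has to be curated by hand, which is why slight variants escape it — exactly the brittleness described above.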
And then there’s another question of, well, why do it at all? This is the most downstream form of content moderation, in my opinion, because it’s assuming that the content already exists, and it’s just adding a blindfold for advertisers.
It’s like telling your guests to walk through the back entrance instead of cleaning your yard. The yard is still dirty and hazardous, but your guests are now going through the back.
Angwin: You’ve studied YouTube and hate on platforms for a long time. What was most surprising to you?
Yin: There were two things that were surprising. The first was that there are a ton of re-uploaded deleted videos on YouTube. A lot of personalities that are de-platformed for hate speech pop up again when fans upload their videos.
It’s interesting because I know for a fact that YouTube uses fingerprinting technology for copyright violations, so they can automatically take down content for copyright owners, but they don’t apply the same technology to hate-based content.
The second surprising thing was their reaction to the social justice keywords that we found they were blocking, such as Black Lives Matter. Instead of removing words from their blocklist, they instead doubled down and blocked even more terms, like “Black excellence,” and continued to block anything paired with “Muslim.”
Last year, YouTube CEO Susan Wojcicki said, “We believe Black lives matter and we all need to do more to dismantle systemic racism.” But it seems they are not really dismantling anything.