Angwin: I first met you in 2017, when I went to Harvard to talk about my investigation, where we found that a criminal risk score algorithm was biased against Black defendants. You came in early and showed me some of your work on algorithmic bias, which I was blown away by. You’ve told the story many times, but can you briefly summarize how you got interested in algorithmic bias?
Buolamwini: I was at MIT getting my master’s degree, trying to make my parents proud, and my first semester I took a course called Science Fabrication, where you read science fiction and try to create something you wouldn’t otherwise make. I created a project called the Aspire Mirror: when you looked into the mirror, a pane of half-silvered glass would have the effect of overlaying a digital filter on your actual reflection. I got it working, and I thought, wouldn’t it be cool if it actually tracked my face in the mirror? To do this, I set up some face-tracking technology, but it wasn’t working on my face consistently. That is, until I put on a white mask and it immediately detected a face.
So it was that experience that got me thinking: Is this just my face? Or is there something else going on? I ended up doing a TED talk about this, where I show a demo of the technology failing to detect my face until I put on a white mask, while detecting the face of my lighter-skinned friend. And then I thought, you know what, folks might want to check my claim. So let me check myself.
So I took my image on my TED profile and ran that through commercial AI systems from a bunch of different companies, and many of them didn’t detect my face, and the ones that did labeled me male, so I was misgendered.
This is what led to my MIT master’s thesis work that came to be known as Gender Shades, where I ran the faces of Parliament members from six countries, three in Africa and three in Europe, through AI systems from a variety of companies, including IBM and Microsoft. I found that it wasn’t just my face. Overall, the programs tended to work better at classifying lighter-skinned faces than darker-skinned faces. They also tended to work better at classifying the faces of people labeled male than the faces of people labeled female. This became my MIT master’s thesis in 2017. Then, in 2018, I co-authored a peer-reviewed paper with Dr. Timnit Gebru based on that thesis.
Angwin: After you discovered bias in these commercial AI tools focused on analyzing faces, how many of them actually improved their software?
Buolamwini: I think one thing to keep in mind is, what does it mean to improve the software? Researcher Deb Raji, who was an intern for the Algorithmic Justice League at the time, and I decided to do a follow-up study on the Gender Shades findings, and this time we also included Amazon.
So if we’re just talking about technical accuracy, all of the companies (with the exception of Amazon, which was not originally tested) reduced their gaps, but they still performed worst on dark-skinned women. Amazon had the same gap as its worst competitors the year before. So what does this tell us? That we have to continuously check. We can’t just assume that because one company has performed better in a specific area, it translates to other areas as well.
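For readers curious what "reducing the gap" means in concrete terms, here is a minimal sketch of the kind of intersectional accuracy audit described above. This is an illustration with hypothetical data, not the actual Gender Shades code: it computes accuracy per subgroup (e.g., skin type crossed with gender label) and reports the spread between the best- and worst-served groups.

```python
# Hedged illustration: a simple subgroup-accuracy audit.
# All subgroup names and numbers below are hypothetical.
from collections import defaultdict

def subgroup_accuracy(records):
    """records: iterable of (subgroup, predicted_label, true_label) tuples.
    Returns a dict mapping each subgroup to its classification accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

def accuracy_gap(accuracies):
    """Spread between the best- and worst-served subgroups.
    A shrinking gap is necessary, but not sufficient, for 'improvement'."""
    return max(accuracies.values()) - min(accuracies.values())

if __name__ == "__main__":
    # Hypothetical audit of a binary gender classifier
    records = (
        [("lighter-male", "male", "male")] * 99
        + [("lighter-male", "female", "male")] * 1
        + [("darker-female", "female", "female")] * 65
        + [("darker-female", "male", "female")] * 35
    )
    acc = subgroup_accuracy(records)
    print(acc)                # accuracy per subgroup
    print(accuracy_gap(acc))  # gap between best and worst subgroup
```

The point of the interview stands in the code: a single aggregate accuracy number would hide the fact that one subgroup is served far worse than another, which is why the audit has to be broken out by group and repeated over time.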
There have been follow-up studies in which people look at other categories, like Indian American or South Asian faces. So there has been much more work beyond what I initially did, which was a very simple test.
The National Institute of Standards and Technology did a much more comprehensive study in 2019. They found the technology exhibited what they call demographic effects, but what I call bias, on race, gender, and also age, which is really important when we’re thinking about the tools we want to use for unemployment benefits.
And so it’s not to say some systems don’t perform well in a very constrained way. It’s to say: Do we have a full understanding of how they perform in real-world conditions, on actual people and their lived experiences?
Buolamwini: What we’ve seen is that resistance works. San Francisco was the first city to ban facial recognition technology (we call it Ban Francisco), and we’ve seen a lot of great work in Massachusetts as well. There is another great example in the film “Coded Bias,” where you see Brooklyn tenants successfully resisting the installation of a facial recognition system in their apartment complex.
We are seeing wins in different ways throughout the world, but we are also seeing pushback. In some ways, the pandemic has been used as a way of marketing and supporting biometric technology. Companies say, “Look, it’s convenient, it’s remote, you don’t have to touch people.” They want to sell you on efficiency.
Do we want the face to be the final frontier of privacy? I don’t believe we have to submit to the harvesting of our faces or the plunder of our data in order to access basic services. We need to have a voice, we need to have a choice.
Angwin: Would you support a total ban on surveillance tools like facial recognition?
Buolamwini: I always ask people, what do you mean by facial recognition? The reason being, for any set of technologies, if you define it by the technology and not the application space, there can be technical evasion. So I think it’s really important that we focus on how systems are being used.
I support a ban on face surveillance, and that’s mass surveillance using facial recognition of any kind. When you start saying, “We’re going to ban a specific technology,” the definition of that technology can be evolving, and people can evade it in different ways.
I also think it’s really important that there’s public debate about which systems we think will have harms and which ones will have benefits. I think we really need the precautionary principle here, which is to say, it’s understandable that people want efficiency, it’s understandable that if companies are marketing a technology and saying it’s going to solve your problems that government officials are going to listen. But it’s also up to them to have higher scrutiny. Have the claims been checked? Have there been third-party verifications of what they’re claiming to do? Have you truly considered the harms and the consequences for your constituents in the first place?
It is really important for people to think of their face like their fingerprint, their face print. It’s unique and it’s something that should be protected.
Angwin: Some of your work is focused on algorithmic audits, something we also do here at The Markup. As part of your work, you’ve actually written about the limitations of audits. Can you talk about how you think the auditing space should evolve?
Buolamwini: We need to focus on definitions and terminology: What do you mean by audit? Is an audit just checking for statistical parity and not even considering the harms of the people who are involved?
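To make the point above concrete, here is a minimal sketch of the narrow "statistical parity" check being criticized: it compares a system's favorable-outcome rate across two groups and nothing more. The data is hypothetical; passing (or quantifying) this check says nothing by itself about who is harmed or how.

```python
# Hedged illustration of a bare statistical-parity (demographic-parity) check.
# Hypothetical decisions; True means the system granted a favorable outcome.

def positive_rate(decisions):
    """Fraction of favorable outcomes in a list of booleans."""
    return sum(decisions) / len(decisions)

def statistical_parity_difference(group_a, group_b):
    """Difference in favorable-outcome rates between two groups.
    Zero means parity on this one metric; it does not mean the
    system is accurate, accountable, or free of harm."""
    return positive_rate(group_a) - positive_rate(group_b)

if __name__ == "__main__":
    group_a = [True] * 80 + [False] * 20  # 80% favorable rate
    group_b = [True] * 55 + [False] * 45  # 55% favorable rate
    print(statistical_parity_difference(group_a, group_b))  # ~0.25
```

That a few lines of arithmetic can "pass" as an audit is exactly why the interview insists on asking what an audit is for, and on grounding it in the harms to the people involved.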
The other thing to consider is positionality. Is it a first-, second-, or independent third-party audit? A first-party audit is, “Hey, we checked ourselves, we’re great.” A second-party audit is, you hired somebody; they want to say one thing, but you don’t write the check until they say another. And then you have third-party audits, like what you do at The Markup.
I think that for government services that are going to have such an impact, like using facial recognition for the IRS, you need third-party verification.
It also has to be ongoing and continuous. I think about it like algorithmic hygiene; you don’t shower once and say that you’re clean for the rest of the year. Similarly, we don’t want a situation where you feel like you’ve checked the system once in a particular context and then you’re done.
Angwin: Your Ph.D. thesis focuses on the evocative audit. Can you give us a description of it?
Buolamwini: Yes, so my thesis is called Facing the Coded Gaze with Evocative Audits and Algorithmic Audits. The coded gaze is my term for describing how the people who hold power shape the technology that we see, and it can be through their preferences, through their prejudices, or through their priorities. So the coded gaze is like the male gaze, or the White gaze, or the post-colonial gaze.
The evocative audit is really about humanizing algorithmic harms. When you think about algorithmic audits, particularly the kinds that I’ve done, they tend to be focused on algorithmic bias. But the algorithmic audit only goes so far because we need to know why it matters. The evocative audit says, “All right, let’s look at this in context, let’s concretize the harm.”
An example of an evocative audit is my visual poem (which is also an algorithmic audit) called AI, Ain’t I A Woman?, where I pose Sojourner Truth’s 19th-century question to 21st-century algorithms. And let me tell you, it did not go well. In the project, I show systems from companies like IBM, Microsoft, and Amazon misgendering iconic women of color like Michelle Obama, Serena Williams, and Oprah Winfrey.
The whole point of doing this is to actually invite you to empathize. What does it mean to face algorithmic harm or to experience machine erasure or denigration in some way? That’s what the evocative audit does. It draws you in to show why these systems matter and how they can impact people in a negative way.
Angwin: In addition to being a scholar, you are a poet. Do you have a poem you’d like to leave our readers with?
Buolamwini: I think maybe the poem I wrote to the Brooklyn tenants [who resisted facial recognition] is so perfect for this moment: