Angwin: You left Google after a controversy about a paper you wrote about a type of AI called large language models. What are they, and what risks do they carry?
Gebru: Large language models (LLMs) are a type of language technology that, given a sequence of words, tries to predict the next word based on the previous text. These models are trained on vast amounts of internet data; they essentially crawl the entire internet. They’re used as intermediary steps for many different underlying tasks. For example, Google Search uses LLMs to rank queries and to do automated machine translation.
Currently, there is a race to create larger and larger language models for no reason. This means using more data and more computing power to find additional correlations in the data. This discourse is truly a “mine is bigger than yours” kind of thing. These larger and larger models require more compute power, which means more energy. The population paying these energy costs, the cost of the climate catastrophe, barely overlaps with the population benefiting from these large language models. That is one concern.
There’s also this underlying belief that the internet reflects society, and that if we crawl the entire internet and compress it into a model, it will have diverse views. In a paper I wrote with the linguist Emily Bender and others, we make the point that size doesn’t guarantee diversity. Because of who has access to and uses the internet, you’re going to encode dominant and hegemonic views. Additionally, even when society progresses and there are social movements like Black Lives Matter, large language models will lag behind because they haven’t been trained on this new data.
We also run into an issue because these models can produce such coherent text. There was an example in the paper where a Palestinian man woke up and posted “good morning,” Facebook translated it to “attack them,” and then the man was arrested. In this case, because the translation was so coherent, there were no cues that it was translated incorrectly. The creation of coherent text at scale is alarming, and it can embed racist and sexist language it has been trained on. You can imagine how this could intensify misinformation, disinformation, and radicalization.
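The next-word prediction Gebru describes can be sketched as a toy bigram model: count which word follows each word in a corpus, then predict the most frequent follower. The miniature corpus and function names here are invented for illustration; real LLMs use neural networks over far longer contexts, but the underlying task is the same.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "crawling the internet" at miniature scale.
corpus = "the cat sat on the mat the cat ate the fish".split()

# For each word, count which words follow it and how often.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat": it follows "the" twice, vs. once each for "mat" and "fish"
```

The model only reproduces statistical patterns in its training text, which is the point of the parrot analogy discussed below: fluent output, no understanding.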
Angwin: The paper you mentioned, which was about the limitations of large language models, is titled “On the Dangers of Stochastic Parrots.” Can you explain this title?
Gebru: Emily Bender came up with this name, but I can explain it. Basically, if you hear a parrot saying coherent sentences, you will probably be very impressed, but the parrot is really just repeating things. This is an analogy for what these models are trained to do, which is to find patterns in text and predict each word based on the likelihood of the words that came before it.
This is also why I take issue with people saying that these models are intelligent. If there are certain capabilities that these models possess, let’s describe them. However, this specific language is used to anthropomorphize this technology in a way that allows these companies to relinquish their responsibility to build safe systems.
Angwin: There was an article this past April in the New York Times Magazine on large language models that you criticized. Can you talk about what the press gets wrong here?
Gebru: Researchers do a lot of work to uncover harms, and we’re extremely under-resourced. Then, here comes this article from a popular outlet, and the journalist talks to all these high-up executives at AI startups, and the framing of the piece becomes “look at this magical thing.” This article treats large language models as if they are some magical intelligent being rather than an artifact created by humans. In my view, the press is supposed to be a check on power, but when the press is adding to the power that Silicon Valley already has, then we don’t have any checks and balances. I also want to add that my colleague Emily Bender spent her entire weekend writing a blog post with a point-by-point response to this article that everyone should read.
Angwin: You have advocated for more oversight of AI models like LLMs and others. Can you talk about what you would like to see?
Gebru: I wrote a paper called “Datasheets for Datasets” that was based on my experience as an engineer designing systems. As a circuit designer, you design certain components into your system, and these components are presented as idealized tools in school, ones that are always supposed to work perfectly. Of course, that’s not how they work in real life.
To account for this, there are standards that say, “You can use this component for railroads, because of x, y, and z,” and “You cannot use this component for life support systems, because it has all these qualities we’ve tested.” Before you design something into your system, you look at what’s called a datasheet for the component to inform your decision. In the world of AI, there is no information on what testing or auditing you did. You build the model and you just send it out into the world. This paper proposed that datasheets be published alongside datasets. The sheets are intended to help people make an informed decision about whether that dataset would work for a specific use case. There was also a follow-up paper called “Model Cards for Model Reporting” that I wrote with Meg Mitchell, my former co-lead at Google, which proposed that when you design a model, you need to specify the different tests you’ve conducted and the characteristics it has.
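To make the proposal concrete, a model card can be thought of as structured documentation shipped with a model. The sketch below is a hypothetical, simplified card; the field names, numbers, and the `check_fit` helper are invented for illustration and are not the schema from the paper.

```python
# A minimal, hypothetical model card as a plain data structure.
# All fields and figures are illustrative, not from any real model.
model_card = {
    "model_name": "toy-sentiment-classifier",
    "intended_use": "Ranking customer-support tickets by urgency.",
    "out_of_scope_uses": ["Medical triage", "Hiring decisions"],
    "training_data": "English-language support tickets, 2018-2020.",
    "evaluation": {
        "overall_accuracy": 0.91,
        # Disaggregated results surface gaps that a single overall number hides.
        "accuracy_by_dialect": {"standard_us": 0.93, "aave": 0.78},
    },
}

def check_fit(card, proposed_use):
    """Return False if the card explicitly rules out the proposed use."""
    return proposed_use not in card["out_of_scope_uses"]

print(check_fit(model_card, "Medical triage"))  # False: documented as out of scope
```

The value is in the discipline, not the format: like a hardware datasheet, the card forces the builder to state tested conditions up front so downstream users can judge fitness before deploying.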
What I’ve realized is that when you’re in an institution, and you’re recommending that instead of hiring one person, you need five people to create the model card and the datasheet, and instead of putting out a product in a month, you should actually do it in three years, it’s not going to happen. I can write all the papers I want, but it’s just not going to happen. I’m constantly grappling with the incentive structure of this industry. We can write all the papers we want, but if we don’t change the incentives of the tech industry, nothing is going to change. That is why we need regulation.
Angwin: What kind of regulation do you think we need?
Gebru: Currently, there are high-level managers and VPs at tech companies who don’t encourage employees to look into issues around misinformation or bias, because they know that if issues are discovered, they will have to fix them and could be punished for them. If they don’t find the issues, there is plausible deniability. Imagine if a drug company said, “We aren’t going to test this drug thoroughly for side effects, because if we don’t find them, we don’t have to fix them.” That would be insane.
I want a law that makes companies prove to us that they have done impact assessment tests. I want to slow them down, and I believe the burden of proof should be on them. Additionally, I want more investment in alternative research so the communities harmed by these tech companies can envision an alternative future.
Angwin: Can you tell me a bit about your new research institute?
Gebru: When we look at the history of AI research funding, it is often funded to support military efforts—interest in building autonomous weapons—or it’s funded by large tech companies who want to make as much money as possible for their corporation. When we think about investment in the “AI for good” space, it’s always an attempt to retrofit an existing technology. Say you’ve created this autonomous military tank and then the question becomes how do we use the tank for good?
I wanted this to change. Last December, I created a small research institute called DAIR, the Distributed AI Research Institute. It’s meant to be an independent institute for AI research, or as independent as one can be. If we’re going to do AI research, we need to be thoughtful about how we do it, what we work on, and how we continue to uncover harms without being persecuted. I want DAIR to be a place where researchers have the space to imagine alternative futures.