Web Scraping Is Not a Crime

Dispatches from Editor-in-Chief Julia Angwin
This Week
Hello, friends,
This week the Supreme Court heard oral arguments in a case that is very close to our hearts here at The Markup. 
The case—Van Buren v. United States—is the first time that the court has substantially examined the Computer Fraud and Abuse Act, a controversial anti-hacking law enacted in 1986 that prohibits unauthorized access to a computer network.
The CFAA has been interpreted by some courts to mean that anyone who violates the terms of service of a website could be prosecuted. That could criminalize web scraping, work that journalists do every day.
Web scraping is the technique of using automated tools to collect data from public websites. As we chronicled in an article this week, web scraping is crucial to holding corporations and governments accountable—from our investigation of how Google preferences itself in search results to USA Today’s investigation Copy, Paste, Legislate, which found a pattern of cookie-cutter laws, pushed by special interest groups, circulating in legislatures around the country.
Because scraping is so important to us, The Markup filed an amicus brief in the Van Buren case arguing that any ruling that criminalizes newsgathering would violate the First Amendment. We also distributed T-shirts emblazoned with “Scraping Is Not a Crime,” and scraping defenders took pictures of themselves wearing the shirts on the day of the arguments.
To understand more about the nuances of the law and why The Markup is taking up this fight, I interviewed my partner in running this place, Nabiha Syed—president of The Markup, renowned First Amendment attorney, and author of our amicus brief.
The interview is below, edited for brevity.
Angwin and Syed sport their "Scraping Is Not a Crime" T-shirts.
Angwin: Let’s start at the beginning. What got you interested in pursuing media law?
Syed: Media shapes reality. That can be positive, or as I first experienced it, deeply negative. On Sept. 10, 2001, I was a first-generation Pakistani-American teenager wondering if my parents would let me go to a dance. (Spoiler: No.) Two days later, and for months to come, all I saw in the media was suspicion that I, my family, and my community might be terrorists. Overnight, my background [as a Muslim] was criminalized. 
But it felt different on the internet, where you could see opinions that were not reflected in the mainstream. I didn’t see myself in mainstream media. I saw myself on the internet, in message boards, and in Listservs. Of course I wanted to protect that precious speech. That is what drew me to First Amendment and media law. 
Media law also gives us ways to argue for the public’s right to know. Surveillance wasn’t a thought experiment for me—for my community, it was lived experience—and I thought the public deserved to know more. So I co-founded a media law clinic at Yale Law, the first in the country, in part to help pursue that transparency.
Slowly, it dawned on me that the most powerful actor in the room wasn’t only the federal government. It was the private surveillance state. In a stroke of luck, I was at the law firm advising The Guardian on the Snowden documents. And I realized the internet—and the hidden surveillance it brought us—wasn’t the utopian place that I grew up in. 
Angwin: When did you first encounter web scraping? 
Syed: It was such a commonplace technique in newsrooms that, at first blush, I thought nothing of it. Of course you can go to a website and extract information: If you can read it, why can’t you read it with a bot? 
And then the Weev issue happened: [A hacker] automated a script to collect email addresses accidentally published by AT&T and then shared that with Gawker. The Department of Justice went after him for that, using the CFAA. I took a look at the text of the law and was horrified. I thought, Jail for scraping publicly available information, when companies and reporters do that routinely? That’s dangerous. 
Angwin: Was the challenge of defending web scraping one of the reasons you were attracted to The Markup? (Other than my just begging you to join for two years!)
Syed: The journalism you have pioneered is so aggressive in its quest for facts at scale, not mere anecdotes. And that kind of approach challenges power because you can’t dismiss it as easily. If scraping is a tool in your arsenal, then count me in for defending it. How could I say no to that—or you!
I get out of bed in the morning because I like challenging power—the bigger, the better. Robust, data-driven journalism does that. And it is clear to me as a lawyer that the scope of this statute was never meant to chill journalism. We have to fix that. 
Angwin: Can you summarize the arguments defending web scraping that you make in our amicus brief (which was written by you and our outside attorneys at Davis Wright Tremaine)?
Syed: It’s straightforward: In the United States, the law allows us to publish truthful, publicly available information that is in the public interest—even where there are laws requiring the information to stay a secret. The Supreme Court has consistently protected that right to publish. And a necessary part of publishing accurate information is the right to gather that information in the first place. 
Reading the CFAA too broadly puts newsgathering at risk and gives private websites the power to weaponize their terms of service. No one intended this outcome—but that is the problem with an unconstitutionally vague law. For those who want to nerd out on the details, please do read the brief and give me your feedback! 
Angwin: Defenders of the CFAA say that allowing scraping would allow bad actors like Cambridge Analytica to violate user privacy. What do you say to that argument?
Syed: How convenient that privacy exists as a sword and a shield for these companies! No one worried about privacy when these companies were swashbuckling around, grabbing data and building free-but-costly systems. And now privacy is a shield to scrutiny, when researchers and reporters want to use data to interrogate these systems? No. 
I share privacy concerns—I mean, The Markup makes a privacy promise to our readers, so it’s at the core of our business. But the correct answer for our privacy concern is a federal privacy law—not retrofitting an outdated anti-hacker law inspired by the movie “WarGames.”
Angwin: What was your takeaway from the Supreme Court oral arguments in the case? Commentators seemed to think the justices were skeptical of the broadness of the statute.
Syed: Even the Department of Justice wanted to walk away from their previous, overbroad interpretations of the CFAA. With good reason, because they were legal stink bombs. But those interpretations could be that broad because the statute is that broad. 
No matter how much the DOJ wants to, it can’t conjure limiting principles that aren’t in the statute. On Monday, the DOJ tried to inject a creative limitation, arguing that the CFAA is about “specific, individualized permission.” That is not in the statute. And I don’t think we should just trust the government on the scope of a criminal law, given how aggressively it has used it before.  
It was promising to hear the justices puzzling through whether there were limiting principles for the CFAA. That lack of limiting principle is what terrifies journalists and researchers. So we’ll see what the justices do about it. 
Angwin: Everyone is asking how they can get a hold of our “Scraping Is Not a Crime” T-shirts. Can you give them some hope of being able to buy some for the holidays?
Syed: Stay tuned! If you’re interested in a shirt, please do email us at prototype@themarkup.org, and we will be in touch very soon.
If you support The Markup’s aggressive newsgathering, and our efforts to change the law to protect us all, we hope you will consider giving before the end of the year at themarkup.org/donate.
As always, thanks for reading. 
Best,
Julia Angwin
Editor-in-Chief
The Markup
 