
Scraping Isn't Hacking

Dispatches from Editor-in-Chief Julia Angwin
This Week
Hello, friends, 
In 1895, journalist Ida B. Wells published “The Red Record: Tabulated Statistics and Alleged Causes of Lynching in the United States, 1892–1894,” which cataloged all the known records of “colored victims of lynching law” during those years. 
She listed not only all of the victims’ names but also their alleged crimes and where each lynching occurred. She let the data inform readers of the scale of lawlessness that might otherwise have been hidden in various local newspaper reports around the nation.
“During the year 1894, there were 132 persons executed in the United States by due form of law, while in the same year, 197 persons were put to death by mobs who gave the victims no opportunity to make a lawful defense,” she wrote.
Frederick Douglass, the famed abolitionist and social reformer, thanked her for her powerful accounting. “There has been no word equal to it in convincing power,” he wrote. “I have spoken, but my word is feeble in comparison. You give us what you know and testify from actual knowledge. You have dealt with the facts with cool, painstaking fidelity, and left those naked and uncontradicted facts to speak for themselves.”
Compiling facts and letting them speak for themselves—with painstaking fidelity—is also what we aspire to do at The Markup. Ida B. Wells’s work is an inspiration, and we are lucky to have more tools at our disposal for data collection than she had. 
Not only can we chase slippery facts by interviewing experts and using our legal right to access documents, but we also can apply computer automation to collect data at scale. 
One way we use automation is web scraping, where we send a computer bot to visit tons of webpages and automatically collect the data on those pages.
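For readers curious what web scraping looks like in code, here is a minimal sketch in Python. It is an illustration only, not The Markup's actual tooling. It assumes a hypothetical site whose pages mark each item with `<h2 class="title">`; the class name, the `example-research-bot` user agent, and the function names `extract_titles`, `fetch_page`, and `scrape` are all invented for this example. The bot visits each page in turn, pauses between requests, and collects what it finds into a list of records.

```python
import time
import urllib.request
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collects the text inside every <h2 class="title"> element.

    The tag and class name are assumptions about a hypothetical page layout.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        # Text can arrive in several chunks, so append to the current title.
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False


def extract_titles(html):
    """Return the cleaned-up item titles found in one page of HTML."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


def fetch_page(url):
    """Download one page and return its HTML as text."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-research-bot"}  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape(urls, fetch=fetch_page, delay_seconds=1.0):
    """Visit each URL in order and catalog every title found there."""
    records = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # pause between requests
        for title in extract_titles(fetch(url)):
            records.append({"url": url, "title": title})
    return records


if __name__ == "__main__":
    # Stand-in pages so the sketch runs without touching a real website.
    sample_pages = {
        "https://example.com/page1": '<h2 class="title">Item A</h2>',
        "https://example.com/page2": '<h2 class="title">Item B</h2>',
    }
    for record in scrape(list(sample_pages), fetch=sample_pages.get, delay_seconds=0):
        print(record)
```

The `fetch` parameter is there so the collection logic can be tried against saved or made-up pages before it is ever pointed at a live site. A real project would also need error handling and a way to store the results.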
At The Markup, we invest heavily in technical data collection skills because we believe they are important tools in journalism’s arsenal. That’s why our newsroom includes as many experts in automated data collection as it does experts in human and legal sourcing.
In our recent investigation of Amazon’s capability to prevent the sale of banned items on its website, investigative data journalist Jon Keegan and reporter Annie Gilbertson accessed Amazon and ran searches for illegal products. Jon used computer automation tools to systematically capture and catalog product listings.
Jon also set up an Amazon seller’s account—under his real name—and was able to list two items for sale that Amazon.com’s rules prohibit (though both were legal to sell under New York State and federal law): an armorer’s wrench for use on an AR-15 and a 10-round AR-15 magazine.
These techniques allowed Jon and Annie to monitor a fast-changing dynamic marketplace and make newsworthy observations about Amazon’s ability to prevent the sale of banned items. 
But their techniques are analyzed within the framework of an outdated law designed to criminalize hacking. The Computer Fraud and Abuse Act, enacted in 1986, makes it a crime to intentionally access “a computer without authorization.” Because visiting a website is considered accessing a computer under the statute, “without authorization” has been interpreted by some courts as allowing sites to dictate authorized access through their terms of service. 
That means that if Amazon decides to include in its terms of service a ban on automated data collection or setting up test accounts, it could try to bring a legal claim against us for our reporting under one (flawed) theory of the Computer Fraud and Abuse Act. Because our newsroom is invested in automated data collection techniques, the risk is so high for us that when we announced the founding of The Markup in 2018, Fortune published an article declaring we faced “legal peril.”
For better or worse, the Supreme Court has agreed to consider a case, Van Buren v. United States, examining how courts interpret the Computer Fraud and Abuse Act. There is a lot at stake. 
That’s why The Markup filed an amicus brief in the Supreme Court this week arguing that an overly broad reading of the Computer Fraud and Abuse Act could endanger our ability to gather news, and in doing so, fly in the face of First Amendment protections for newsgathering. In our brief, The Markup president Nabiha Syed, along with our attorneys at Davis Wright Tremaine, argues that the court should rely on its history of rulings that say the government should not criminalize routine news reporting techniques.
“Data journalism serves an important role in helping Americans observe and understand the forces at play in that public square that might otherwise remain hidden,” they write. “A statute that allows powerful forces like the government or wealthy corporate actors to unilaterally criminalize newsgathering activities by blocking these efforts through the terms of service for their websites would violate the First Amendment.”
We join with many other journalist and nonprofit groups who have filed amicus briefs in the case, including the Reporters Committee for Freedom of the Press, the Knight First Amendment Institute at Columbia University, and the Electronic Frontier Foundation.
We hope that the pursuit of facts wins the day.
As always, thanks for reading.

Best,
Julia Angwin
Editor-in-Chief
The Markup


 
From The Markup
 
Quiz: Can You Spot the Banned Amazon Item?