Auditing algorithms is hard work, and as The Markup reporter Alfred Ng wrote this week, there are no agreed-upon standards
for how auditors should independently assess algorithmic performance. It’s even harder for journalists, who are not being invited in to do an audit but are attempting to understand algorithms from the outside.
So I thought it would be interesting to review the different approaches we have taken to auditing algorithms so far.
The Secret Document. The Allstate investigation was powered by a document that investigative data journalist Maddy Varner obtained through a public records request to Maryland insurance regulators. The document listed the prices Allstate was proposing to charge every one of its customers in Maryland, which allowed us to do some fancy math to analyze the company's algorithm and show that it was a suckers list that aimed to squeeze big spenders.
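At its core, that kind of analysis compares each customer's current premium with the proposed one and asks who bears the biggest increases. A toy sketch of the idea, with invented field names and made-up numbers (the real methodology was far more involved):

```python
# Toy sketch: flag customers whose proposed premium jumps sharply,
# then check whether the big hikes fall on customers who already
# pay the most. All data here is hypothetical.

def pct_change(current, proposed):
    return (proposed - current) / current * 100

customers = [
    {"id": 1, "current": 600.0,  "proposed": 612.0},
    {"id": 2, "current": 2400.0, "proposed": 2880.0},
    {"id": 3, "current": 500.0,  "proposed": 505.0},
    {"id": 4, "current": 3100.0, "proposed": 3720.0},
]

# Customers facing increases above 10 percent
big_hikes = [c for c in customers if pct_change(c["current"], c["proposed"]) > 10]

# Are the steep increases concentrated among the biggest spenders?
avg_premium_hiked = sum(c["current"] for c in big_hikes) / len(big_hikes)
avg_premium_all = sum(c["current"] for c in customers) / len(customers)
print(avg_premium_hiked > avg_premium_all)  # in this toy data: True
```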
The Experiment. To analyze how Gmail’s algorithm sorts email into different inboxes, we designed an experiment. Investigative reporter Adrianne Jeffries set up a new Gmail account and subscribed to hundreds of political email newsletters to see how Gmail would sort them when they arrived. She and investigative data journalist Leon Yin found that there was wide and seemingly arbitrary variation between the political newsletters that were sorted into the primary inbox and those sorted into the less-visible “promotions” inbox.
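In spirit, the analysis reduces to tallying where each sender's mail lands. A minimal sketch, assuming message metadata like the category labels the Gmail API reports (e.g., CATEGORY_PERSONAL for the primary inbox, CATEGORY_PROMOTIONS for the promotions tab); the senders here are invented:

```python
from collections import Counter

# Hypothetical message metadata: (sender, Gmail category label).
messages = [
    ("newsletter-a@example.org", "CATEGORY_PERSONAL"),
    ("newsletter-b@example.org", "CATEGORY_PROMOTIONS"),
    ("newsletter-c@example.org", "CATEGORY_PROMOTIONS"),
    ("newsletter-d@example.org", "CATEGORY_PERSONAL"),
    ("newsletter-e@example.org", "CATEGORY_PROMOTIONS"),
]

# Count how many newsletters landed in each inbox category.
tally = Counter(label for _, label in messages)
promo_share = tally["CATEGORY_PROMOTIONS"] / len(messages)
print(f"{promo_share:.0%} of newsletters were routed to Promotions")
```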
Scrape and Classify. Investigative data journalist Leon Yin measured how often Google boosted its own products to the top of the search results page by scraping more than 15,000 search results and then building classifiers to help us sort out which results were Google products and which were not. (We accounted for different interpretations of each category in the full methodology.) The result? Google gives itself 41 percent of the first page of results and 63 percent of the first screen on a mobile device.
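The "scrape and classify" pattern itself is simple, even though the classification rules are where the real work lives. A stripped-down, hypothetical version that classifies results by domain and computes Google's share of a page (the domain list and results are invented; The Markup's actual classifiers were far richer):

```python
from urllib.parse import urlparse

# Hypothetical rule: treat a result as a Google product if its
# host matches a hand-built domain list.
GOOGLE_DOMAINS = {"google.com", "youtube.com"}

def is_google_product(url: str) -> bool:
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in GOOGLE_DOMAINS)

# Invented first-page results for one scraped query.
first_page = [
    "https://www.google.com/flights",
    "https://www.kayak.com/flights",
    "https://www.youtube.com/watch?v=abc",
    "https://www.expedia.com/",
    "https://maps.google.com/",
]

google_share = sum(is_google_product(u) for u in first_page) / len(first_page)
print(f"Google products: {google_share:.0%} of the first page")
```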
Investigative data journalist Surya Mattu took the "scrape and classify" approach to another level when he built Blacklight, a privacy forensics tool that lets readers analyze how third parties are exploiting user data on any website. When a user enters a URL into Blacklight, it instantly visits the page and runs several tests to see whether the site is using any invasive techniques, such as recording keystrokes and mouse movements or "fingerprinting" the visitor's computer.
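One of those tests can be sketched in a few lines: after loading a page in an instrumented browser, compare the third-party hosts it contacted against a known-tracker list. The hosts and tracker domains below are illustrative only, not Blacklight's actual lists:

```python
# Illustrative tracker list; Blacklight's real tests are much broader.
KNOWN_TRACKERS = {"doubleclick.net", "facebook.net", "scorecardresearch.com"}

def flag_trackers(request_hosts):
    """Return the known trackers a page contacted while loading."""
    flagged = set()
    for host in request_hosts:
        for tracker in KNOWN_TRACKERS:
            if host == tracker or host.endswith("." + tracker):
                flagged.add(tracker)
    return sorted(flagged)

# Hypothetical hosts observed while a headless browser loaded a page.
observed = ["cdn.example.com", "static.doubleclick.net", "connect.facebook.net"]
print(flag_trackers(observed))  # ['doubleclick.net', 'facebook.net']
```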
The Honey Pot. Investigative reporter Aaron Sankin has covered internet hate groups for a long time, and a side effect of searching out so much unsavory content is that he is targeted with it, too. This is what hackers call turning yourself into a "honey pot"—meaning you make yourself look like an attractive target for whatever it is you are searching for. That is likely why Facebook saw him as someone interested in "pseudoscience" and targeted him with ads for a radiation-blocking hat, and how he discovered that Facebook had created a linked network of Holocaust denial pages and groups. Both examples allowed us to show how the platform's algorithm wasn't doing what the company promised.
Court Records. When we heard that landlords were using software to assign risk scores to potential renters, we struggled to figure out how to tackle the story. Investigative reporter Lauren Kirchner sought and obtained public records showing what kinds of software public housing authorities were using, but we couldn't get data about how many people were incorrectly denied housing by mistaken risk scores, and the private housing market was still a mystery. So we ended up using the old-school technique of scouring court records to find out how many people had filed claims against the software makers for harmful errors. It took months, but we ended up with a striking database of cases that showed a clear pattern of sloppy and life-altering mistakes by the algorithms.
The Panel. To understand how Facebook's algorithms target information to different groups of people, we set up a national panel of users who let us monitor their Facebook feeds. The Citizen Browser project is probably our most ambitious—and expensive!—attempt to audit an algorithm, but it has already yielded insights we could not have gotten any other way, including the fact that Facebook never labeled any of Trump's posts "false" and that Facebook broke its promise to Congress to turn off political group recommendations.
As you can see from this list, each story demands its own technique. There is no one-size-fits-all solution to auditing the algorithms that determine the architecture of our lives.
And truthfully, it's kind of insane that we have to do this amount of engineering and data analysis. If we were reporters covering another industry—say, airlines—we would likely spend much more of our time on traditional reporting techniques, such as filing public records requests to the Federal Aviation Administration, which is responsible for ensuring the safety of airplanes. We likely wouldn't be trying to inspect the planes ourselves (although perhaps the 737 Max disaster makes the case that someone should be).
But there is no regulatory body that oversees Big Tech, and so we do these insanely heavy lifts because we believe the public needs to know how we are being sorted and judged by automated systems.
In the year since we published our first story, Big Tech has only gotten much bigger, which means The Markup is called on to shoulder even more weight. It’s in that spirit that we’re launching our first-ever annual giving campaign. We couldn’t do what we do without reader support. If you would entrust us with a tax-deductible contribution
in any amount, it would be our honor to use it to hold Big Tech to account.