Angwin: Tell me about your history and how you ended up becoming an expert in data leakage across the internet?
Acar: When I was a Ph.D. student in electrical engineering, a mentor pointed out this project from EFF called Panopticlick that showed that you can track users by their unique browser characteristics, a practice called browser fingerprinting. I found that spooky and interesting at the same time. My first study focused on developing tools to detect and measure browser fingerprinting at scale. And then I continued this line of research on unconventional tracking methods.
Angwin: In 2015, you published a report that provided a very comprehensive look at how Facebook was tracking users across the web. Can you talk about your findings?
Acar: This study was part of a larger investigation into Facebook’s privacy practices by the Belgian Data Protection Authority (DPA), who requested the report. My colleagues, who were law faculty at the University of Leuven, and I were tasked with investigating how Facebook tracks users across the web through social plug-ins, and we focused specifically on nonusers—people without a Facebook account.
We examined how Facebook cookies and plug-ins such as the Like button can be used to track users across the web, even if they don’t have a Facebook account. One of the surprising findings was that on the European Digital Advertising Alliance website, where users could supposedly opt out of targeted advertising, Facebook would actually place a uniquely identifying cookie that can be used to track you around the web. Facebook claimed this was a bug, and it did end up fixing the issue. Regardless, the Belgian DPA’s investigation, which our report was part of, led to years-long litigation between the DPA and Facebook.
Angwin: Let’s talk about the Facebook pixel. Can you tell us what it is?
Acar: The Facebook pixel is a piece of code you embed on your website in order to (re)target website visitors with ads on Facebook. It can also be used to measure conversions; for example, when someone sees your advertisement on Facebook and visits your website to sign up for a subscription or to buy a product, the pixel records this information.
If a website uses the Facebook pixel, then visits to the website get linked to the visitor’s Facebook account. With the pixel, the website can say to Facebook, “Hey, this user signed up for something” or “This user just added a product to their cart.” You can combine all these activities and apply them as labels to a visitor, and then you can target visitors (“audiences”) with ads on Facebook based on these labels.
Angwin: In our recent story, we found that websites using the pixel don’t necessarily know what data they’re sending back to Facebook. Is this normal, and can you put this finding in context for us?
Acar: The Facebook pixel is present on almost 30 percent of the top 100,000 websites, so it is quite common. It’s possible that some of these websites are not fully aware of what data the pixel collects about their visitors. Unlike other trackers, the pixel doesn’t just record that you visited a website; it collects more granular information about your interaction with the site. For example, the pixel can capture a user who searches for a product, types something into a form, adds something to their cart, and much more.
Given that the pixel, unlike the Like button, has no visible user interface, I don’t think users expect that Facebook is actually collecting this granular data about their online activities. When this information is merged with a Facebook account, it becomes a more complete profile, allowing Facebook to make more intrusive inferences about you. This is much worse from a privacy perspective, since data from the pixel can be tied to your real name, offline activities, and your social graph.
Acar: Automatic advanced matching is a feature of the Facebook pixel that more accurately matches online visitors and their activities to Facebook users. When this feature is enabled, the pixel extracts and hashes personal data that’s entered into forms, such as an email address, phone number, name, date of birth, etc. Facebook then uses those (hashed) identifiers to link your Facebook profile to your website visits and activities.
The advantage of using this over cookies is that as a user you can remove or block cookies. Many browsers like Safari and Firefox now automatically block tracking-related cookies, and there are ongoing efforts to phase out third-party cookies. This means that identifiers based on email address or phone number, which are global, unique, and persistent, will likely become more important.
Facebook claims to collect this personal information from the website forms when the user clicks the submit button, but we found that it instead collects it when you click virtually any button or link on the page. For example, even if you just type in your email address and maybe you change your mind and decide to go back to another page or read the privacy policy before opening an account, once you click any link or any button, Facebook will extract your personal information, including email address, from the form, hash it, and send it to its servers.
We also found that TikTok was using a similar method to collect personal data typed into forms. TikTok has a product called TikTok Pixel, which also has a feature to automatically harvest form data. When you type in your email address or phone number on a form, clicking almost any button triggers data collection by TikTok. [TikTok and Meta did not respond to Wired’s request for comment when it reported these findings.]
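[To illustrate why a hashed email address still works as a persistent identifier, here is a minimal Python sketch of the "extract and hash" step described above. It is not Facebook's or TikTok's code. The choice of SHA-256 and the trim-and-lowercase normalization are assumptions made for illustration, and the function names `normalize_email` and `hash_identifier` are hypothetical.]

```python
import hashlib


def normalize_email(email: str) -> str:
    # Assumed normalization: trim surrounding whitespace and lowercase.
    return email.strip().lower()


def hash_identifier(value: str) -> str:
    # Assumed hash function: SHA-256, returned as a hex string.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# The same address typed on two different sites, with different
# capitalization and stray spaces, yields the same hash.
first = hash_identifier(normalize_email("  Jane.Doe@Example.com "))
second = hash_identifier(normalize_email("jane.doe@example.com"))

print(first == second)
```

[Because the hash is deterministic, any party that already knows the address can compute the same value and match on it. Hashing hides the raw address in transit but does not stop it from acting as a stable, cross-site identifier.]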
Angwin: Facebook and TikTok are not the only services collecting information before you press submit. You also discovered that other websites extract email and password information from users. Can you describe what you found?
Acar: We investigated what personal data is exfiltrated on websites before a user submits a form or consents to sharing information. For example, when you are on a website that has third-party trackers, and say you fill out a log-in form, we wanted to know whether this personal information is exfiltrated to trackers.
We found that on almost 3,000 out of the top 100,000 websites, a user’s email address or its hash was sent to tracker domains before any information was submitted by the user. In addition, we found that on more than 50 websites, when users typed in their password, the password was incidentally collected by third-party trackers. One other surprising finding is that when we looked into different website categories, the one where we observed the most email leaks was fashion and beauty, followed by online shopping, followed by news websites. The category with no leak at all was pornography!
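[One simple way to check whether "a user's email address or its hash" appears in outgoing traffic is to search each captured request for the address in a few common encodings. The Python sketch below assumes that approach; it is not the study's actual tooling. The encodings checked, the `(url, body)` request format, the names `candidate_tokens` and `find_leaks`, and the example domains are all hypothetical.]

```python
import base64
import hashlib
from typing import Dict, List, Tuple
from urllib.parse import quote


def candidate_tokens(email: str) -> Dict[str, str]:
    # Forms in which the typed address might appear in a request.
    raw = email.encode("utf-8")
    return {
        "plain": email,
        "urlencoded": quote(email, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha1": hashlib.sha1(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(email: str, requests: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    # Each request is a (url, body) pair captured before the form is submitted.
    tokens = candidate_tokens(email)
    leaks = []
    for url, body in requests:
        haystack = url + "\n" + body
        for name, token in tokens.items():
            if token in haystack:
                leaks.append((url, name))
    return leaks


captured = [
    ("https://tracker.example/p", "email=test@example.com"),
    ("https://cdn.example/app.js", ""),
]
print(find_leaks("test@example.com", captured))
```

[A sketch like this would miss addresses that are encrypted, compressed, salted before hashing, or encoded in ways not listed, so it gives a lower bound on leakage rather than a complete picture.]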
Angwin: What is being done to prevent this invasive type of tracking?
Acar: We are seeing a shift, and I think this could be a reason to be hopeful. Over the last three to four years, major browsers such as Firefox and Safari have introduced new privacy-preserving features that make tracking more difficult, especially across sites. We are seeing more efforts to make tracking harder. We are also seeing privacy products like DuckDuckGo [which is a donor to The Markup] and Brave, where the distinguishing feature is privacy.
However, one of the reasons we wanted to study how email addresses and other identifiers can be used for tracking is that they don’t require cookies, so they could actually bypass all these efforts to make the web more private. What worries me is that more ways around these protections will be developed to track, profile, and target people.