Proposal: Tax the Data

The Soren Review
By The Soren Review • Issue #2
Improving Competition and Protecting Consumers in the Information Ecosystem

Part of increasing competition, from a policy standpoint, is better understanding how a given market operates. One of the markets called out in the Biden EO on competition is “information platforms,” by which I believe the administration means businesses like Facebook and Amazon. The FTC just attempted to make an antitrust case against Facebook as a monopoly in social media, only to have it thrown back at them because they hadn’t adequately defined the nature of the industry. That is a major setback, but it might also be cause for some innovation in the regulatory space. My proposal is this: let’s tax the data that companies like Facebook and Amazon collect on their users.
Why a Data Tax?
Let’s use the social media industry as an example, since this is where the FTC just lost their case. Remember, the customers here are not actually the Facebook users (consumers in the traditional sense); they are the advertisers trying to reach Facebook’s users. Most social media works this way: users get the “product” for free at the cost of having ads targeted to them based on the data the platform collects about them. This creates certain market dynamics. The more users a platform has, the more advertisers it can attract. The more advertisers, the more revenue. The more revenue, the more the company itself can market and advertise, attracting more users. If the company is successful, this eventually creates a “virtuous cycle” that keeps growth moving inexorably up and to the right. But remember that the growth that really matters is advertising growth: how many ads the company can sell. User base is a proxy for this, and so all the differentiation that companies engage in is analogous to how TV networks compete for viewers: they either zero in on a particular demographic or try to produce the most broadly appealing content in order to convince advertisers that their network is worth spending money on. Social media networks are in a similar game. The big ones have broad appeal that lets advertisers target almost any demographic (like a major TV network), but others have more niche appeal that attracts particular types of advertisers (like HGTV or the History Channel). And a big part of their appeal, in each case, is the data they collect and the way it allows advertisers to target particular subsets of their user base.
So what would a tax do? Well, in theory it would do four things:
  1. It would encourage companies to limit the data they collect (and thereby reduce their tax burden), improving user privacy. The incentive would become to only hold onto the data that was (a) absolutely necessary and (b) provided the most valuable signals for the platform’s algorithms. Extraneous data would be shed to minimize the tax bill. Though I’m not a data scientist at a large data-driven company like Facebook or Google, my sense is that a large percentage of the data they hold on individual users provides at most a very small improvement to their algorithms. Cambridge Analytica used to boast that they could profile a user with only about 4-6 “likes” from their Facebook feed. If you can do so much with so little data, and data suddenly becomes an expensive commodity to hold onto, my guess is that most of these companies will shift from a model of hoovering up as much as possible to being very selective in the data they hold based on what they model to be the most valuable.
  2. It would open up a new avenue for competition in the advertiser-driven social media ecosystem by creating “data blind-spots” due to tax-driven self-limitations. Those blind spots would become a new way for rivals to differentiate in their appeals to advertisers (remember: they are the “real” customer in this industry). This would make it easier for competitors to break through and harder for incumbents with large user bases to defend their positions (because augmenting their data collection would be a much more expensive prospect for a platform with an already large user base).
  3. Another benefit is likely to be making the algorithms driving these platforms more transparent. If platforms start shedding all but the most important (from an algorithmic perspective) data, we’ll start to get a much clearer sense of what the most important factors in the algorithm are, which could provide valuable insights into how these companies impact the public discourse in our society. This will feed back into point 2 (above) by helping nascent startups understand what data is “valuable” and what data is being overlooked.
  4. Finally, taxing data may drive companies to explore other business models. If most of the data you hold on users is actually generated by those users (i.e., social media posts, likes, etc.), you’re going to at least strongly consider passing on the cost of this tax to them in the form of a subscription model. But users won’t subscribe to something they’ve been getting for free unless there is some new benefit, like an ad-free version. Once we go down this road of subscription-based social media, search, and so on, the marketplace will be opened up to a whole new set of competitive pressures. If users are paying for a service, they are much more likely to be interested in switching to a service that offers a better product or a better value.
How Would It Work?
First, we have to define data. I propose the following: information collected or stored about a user that does not fit one of a small number of exceptions, including: (a) health information that is only used for treatment purposes, (b) financial information that is only used for processing transactions and record keeping (i.e., not for recommending new products), and (c) data stored by the user but not readable or accessible by the service provider (i.e., private files stored in a drive or email service). To illustrate the difference: the ledger your bank maintains of your credit card transactions isn’t “taxable data” if the bank only uses it to show you a record of your transactions. If the bank is also using that ledger to suggest new products and services to you (whether those be their own or a third party’s), it’s now taxable. Likewise, Amazon’s order history, which is used to recommend new items you might like, would be taxable data, but if, say, they segregate out your billing and shipping address data and don’t use it for any part of their recommendation algorithms or advertiser content, it can be excluded from the taxable total.
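The definition above can be read as a simple rule: everything is taxable unless it falls under one of the three exceptions. Here is a rough sketch of that rule in Python. The names (`Kind`, `Use`, `Holding`, `is_taxable`) and the way uses are labeled are my own illustrative assumptions, not part of any statute or existing system, and a real rule would need far more careful definitions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    HEALTH = auto()
    FINANCIAL = auto()
    USER_STORED = auto()  # files or mail the user keeps on the service
    OTHER = auto()


class Use(Enum):
    TREATMENT = auto()
    TRANSACTIONS = auto()
    RECORD_KEEPING = auto()
    RECOMMENDATIONS = auto()
    ADVERTISING = auto()


@dataclass(frozen=True)
class Holding:
    """One category of data a company holds about a user (hypothetical model)."""
    label: str
    kind: Kind
    uses: frozenset
    provider_can_read: bool = True


def is_taxable(h: Holding) -> bool:
    """Taxable by default; exempt only under exceptions (a), (b), or (c)."""
    if h.kind is Kind.HEALTH:
        # (a) exempt only if used for nothing but treatment
        return not h.uses <= {Use.TREATMENT}
    if h.kind is Kind.FINANCIAL:
        # (b) exempt only if used for nothing but transactions and record keeping
        return not h.uses <= {Use.TRANSACTIONS, Use.RECORD_KEEPING}
    if h.kind is Kind.USER_STORED:
        # (c) exempt only if the provider cannot read or access it
        return h.provider_can_read
    return True


# The examples from the paragraph above.
ledger_display_only = Holding(
    "card ledger", Kind.FINANCIAL, frozenset({Use.RECORD_KEEPING}))
ledger_with_offers = Holding(
    "card ledger", Kind.FINANCIAL,
    frozenset({Use.RECORD_KEEPING, Use.RECOMMENDATIONS}))
order_history = Holding(
    "order history", Kind.OTHER, frozenset({Use.RECOMMENDATIONS}))
segregated_address = Holding(
    "billing/shipping address", Kind.FINANCIAL, frozenset({Use.TRANSACTIONS}))

for h in (ledger_display_only, ledger_with_offers, order_history, segregated_address):
    print(f"{h.label}: {'taxable' if is_taxable(h) else 'exempt'}")
```

The point of the default-taxable design is that the burden falls on the company to show that a holding fits an exception, not on the regulator to show that it doesn’t.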
Second, how would we measure it? I think the easiest way would be to tax it “per gigabyte” on the total amount held on all US persons. My back-of-the-napkin math (based on the amount of data Facebook holds and the percentage of its user base in the US) suggests that a $0.25 tax per GB of data held on US persons would cost Facebook around $5.5 billion a year. For a company with $29 billion in net income last year, that is a not insignificant tax burden and would prompt some serious change in behavior.
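To make the napkin math easier to check, here is the same arithmetic run backwards in Python. It uses only the three figures quoted above and assumes the $0.25 is an annual rate per GB; the implied data volume is derived from those figures, not an independent estimate of what Facebook actually holds.

```python
# Figures quoted in the text above.
RATE_USD_PER_GB_YEAR = 0.25       # proposed tax, assumed to be per GB per year
ESTIMATED_ANNUAL_TAX_USD = 5.5e9  # estimated yearly cost to Facebook
NET_INCOME_USD = 29e9             # Facebook's net income last year

# Volume of data on US persons that the estimate implies.
implied_gb = ESTIMATED_ANNUAL_TAX_USD / RATE_USD_PER_GB_YEAR
implied_exabytes = implied_gb / 1e9  # decimal units: 1 EB = 1e9 GB

# How large the tax is relative to net income.
share_of_net_income = ESTIMATED_ANNUAL_TAX_USD / NET_INCOME_USD

print(f"Implied data held on US persons: {implied_gb:,.0f} GB (~{implied_exabytes:.0f} EB)")
print(f"Tax as a share of net income: {share_of_net_income:.1%}")
```

In other words, the estimate implies roughly 22 billion GB held on US persons, and a bill equal to about 19% of net income.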
Third, how often would we assess and collect it? This gets into enforcement: we don’t want to create a system where companies can collect data all year, then dump everything except the minimum on Dec 31st to reduce their tax bill. So the taxable amount needs to be based on an average over some period. My suggestion would be a monthly average, with companies required to pay 1/12 of the annual tax rate × the average amount of data held on US persons during the preceding month, where each day’s amount is the highest volume held at any point that day, not the amount at a particular time of day.
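Here is a minimal sketch of that monthly assessment in Python, reading the rule as “average the daily peaks, then apply 1/12 of the annual rate.” The function name, the shape of the input (a list of intra-day readings per day), and the sample volumes are all made up for illustration.

```python
def monthly_tax(daily_readings_gb, annual_rate_per_gb=0.25):
    """Tax owed for one month.

    daily_readings_gb: one list per day, each holding the volumes (in GB)
    observed at various times that day. Each day counts at its peak.
    """
    daily_peaks = [max(day) for day in daily_readings_gb]
    average_peak = sum(daily_peaks) / len(daily_peaks)
    return annual_rate_per_gb * average_peak / 12


# A hypothetical 30-day month: the company holds 1,200 GB throughout,
# then dumps down to 100 GB late on the final day.
month = [[1200.0, 1200.0] for _ in range(29)] + [[1200.0, 100.0]]

owed = monthly_tax(month)
print(f"Owed for the month: ${owed:.2f}")
```

Because each day counts at its peak, the end-of-month dump in this example doesn’t lower the bill at all, and even a dump that lasted a full day would only move one of the thirty values in the average.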
Monetization Limits
One potentially negative externality of a “data tax” that we want to avoid is that the “most valuable” data companies hold onto may also be the most personal data, the data we actually don’t want them to track. If Amazon wants to use my past purchases on Amazon to recommend other things that I might be interested in, that’s actually somewhat useful and not that creepy (I did buy it from them, after all). I’d have no objection if a local shop I frequent said to me “hey, you got this thing last month, so I think you’d really love this new thing we just got in,” so why would we object to Amazon doing essentially the same thing? If, on the other hand, they start popping up ads like “we know you just got a tax refund of $XXX amount, here are some great things that you can buy on that budget,” or “we know your kid just went on their first date, here’s some parenting books you might like,” that’s a lot more creepy, invasive, and unwanted. So coupled with the data tax, there should be some categories of data that simply are forbidden from being monetized (either by directly selling these “profiles” or by using/allowing this data to be used as part of targeting advertisements, recommendations, or personalizations). The categories I’ve chosen are based on the Universal Declaration of Human Rights, which lays out a number of areas in which people have a unique right to privacy, security, and autonomy. Here they are:
  1. Health Information, including not just formal health records but personal posts/statements about health matters and “inferences” about an individual’s health conditions based on other behaviors and statements.
  2. Financial/tax data and/or credit history. It’s one thing to put people into broad categories like “baby boomers who like to travel (and therefore have a large disposable income).” It’s another to say “based on the records we have, your net worth is precisely X and we recommend Y” or “targeting people who paid X in taxes last year” or “targeting individuals known/believed to be behind on X bill.”
  3. Information pertaining to an individual’s legal/criminal history. No ads specifically for ex-felons, individuals acquitted of such-and-such charge, or (likely) “illegal immigrants.”
  4. Information about the details of a person’s family, home, or sexual activities. You can target broad categories like “families with kids at home” or “people who live in Omaha.” You can’t go to specifics like “has a 12-year-old daughter” or “lives on Sullivan Street between 8th and 22nd” or “got laid in the last week.”
  5. Information posted with a reasonable expectation of privacy, like emails, direct messages, or “private” chats. Things that aren’t posted for public consumption can’t be used for targeting someone with ads or recommendations.
  6. Information that in a user’s home country or present location could be used to target them for discrimination or persecution. For example, in places where there are laws allowing LGBTQ+ individuals to be arrested (or worse), you can’t allow people to use LGBTQ+ status as a targeting parameter.
The goal here is to drive the data that is used for targeting, recommendations, etc. towards things like “your interests, hobbies, music likes, movie choices, and recent purchases.” While these things aren’t “public” information, they are more in the realm of things that people “expect” to share with businesses they interact with and that people naturally understand to be part of those platforms “enhancing” or “personalizing” their experience. Things that people expect to be private or semi-private should be off limits to monetization and personalization algorithms. Hold onto (and pay taxes on) that data if you wish, but you can’t use it in your algorithms.
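In practice, a rule like this amounts to a filter that sits between the data a platform holds and the data its targeting systems are allowed to see. The Python below is only a sketch of that idea: the category labels, field names, and the `permitted_targeting_fields` function are hypothetical, and category 6 is modeled as a set that varies with the user’s country or location.

```python
# Categories 1-5 from the list above, as illustrative labels.
ALWAYS_FORBIDDEN = frozenset({
    "health",
    "financial",
    "legal_history",
    "family_home_sexual",
    "private_communications",
})


def permitted_targeting_fields(fields, locally_persecuted=frozenset()):
    """Return the field names that may feed ads or recommendations.

    fields: mapping of field name -> category label.
    locally_persecuted: extra categories barred where the user lives or
    currently is (category 6 in the list above).
    """
    blocked = ALWAYS_FORBIDDEN | locally_persecuted
    return sorted(name for name, category in fields.items()
                  if category not in blocked)


# A made-up user profile. The platform may hold (and pay tax on) all of
# these fields, but only some may be used for targeting.
profile = {
    "music_likes": "interests",
    "recent_purchases": "purchases",
    "net_worth": "financial",
    "dm_text": "private_communications",
    "orientation": "lgbtq_status",
}

allowed = permitted_targeting_fields(
    profile, locally_persecuted=frozenset({"lgbtq_status"}))
print(allowed)
```

Note that the filter only governs use, not storage, which matches the split between the tax (on holding data) and the monetization limits (on using it).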
Overall, the goal of this approach is threefold. First, tame the hunger that large internet platforms have for personal data by making that data an expensive commodity to hold. Second, increase competition by (a) improving the transparency around internet platform algorithms and (b) creating new paths for upstarts to differentiate themselves (whether to advertisers or paying consumers). And third, improve user privacy and trust in the system by limiting what aspects of our lives can be monetized for advertising and “personalization” purposes. The combination should result in a more competitive environment with a healthier relationship between consumers and platforms.