DataScan: Issue #60



January 7 · Issue #60

Curated digest on the world of data.

Let’s make this the year we reclaim control of our data. The simple process of re-identification. Europe’s data rule shake-up. Azeem Azhar’s 18 exponential changes we can expect this year.
Let’s make this the year we reclaim control of our data. Excellent piece by Jon Crowcroft, Marconi professor of communications systems at the University of Cambridge Computer Laboratory, on how differential privacy, homomorphic encryption, and GDPR could “help consumers wrestle back control of their personal information”. 🔮  
Crowcroft points out the conundrum between privacy and progress, as “it has been hard to find a balance between anonymising data so that it protects individuals’ confidentiality and maintaining its usefulness”. 🤔 Differential privacy, however, involves “putting an ‘envelope’ around data, and only allowing access to what is revealed by inputting specific search queries”. 💯  Therefore:
You could, for example, find out how many people in a dataset live in a certain postcode, but without getting access to the identities of the individuals who do so. Differential privacy works by filtering data, fuzzing certain features of it, or analysing and blocking intrusive queries. Given that most market-research-style analytics are concerned with identifying groups in the data, this may not impact on its usefulness.
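To make the postcode example concrete, here is a minimal sketch of one common way to "fuzz" a count: adding Laplace noise to the true answer. This is an illustration, not something taken from Crowcroft's article; the function name `noisy_postcode_count`, the `epsilon` privacy parameter and the toy `records` are all assumptions made for the example.

```python
import random


def noisy_postcode_count(records, postcode, epsilon=1.0, rng=None):
    """Return a noisy count of records in `postcode`.

    Adds Laplace noise with scale 1/epsilon. A count changes by at most 1
    when one person is added or removed, so its sensitivity is 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if r["postcode"] == postcode)
    # The difference of two independent Exp(epsilon) draws is
    # Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical toy dataset; names and postcodes are made up.
records = [
    {"name": "Alice", "postcode": "CB3"},
    {"name": "Bob", "postcode": "CB3"},
    {"name": "Carol", "postcode": "N1"},
]

# The caller sees only an approximate group size, never the names.
print(noisy_postcode_count(records, "CB3", epsilon=0.5))
```

A smaller `epsilon` means more noise and stronger privacy; a larger one gives more accurate counts. On large groups the noise is small relative to the total, which is why group-level analytics can stay useful.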
Furthermore, Crowcroft discusses the idea of protecting data by “just leaving it where it is” 💡 - rather than moving and centralising it in a pool:
Instead of moving it to a single place where it might be leaked, this “edge cloud” approach leaves data in people’s devices and distributes the programmes that do the analytics. This moves the results to businesses that wish to use them, but without ever moving the raw personal data. And, since there would be no central data or cloud storage, there’s no need for the provider to cover its cost.
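The "leave it where it is" idea can be sketched in a few lines: the analytics function is sent to each device, runs against the local records, and only its summary travels back. This is a simplified illustration under assumed names (`local_summary`, `combine`, and the `devices` list are hypothetical), not a description of any real edge-cloud system.

```python
from collections import Counter


def local_summary(device_records):
    """Runs on the device: reduce raw records to postcode counts only."""
    return Counter(r["postcode"] for r in device_records)


def combine(summaries):
    """Runs at the business: merge per-device summaries into one total."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


# Hypothetical data held on three separate devices; it never leaves them.
devices = [
    [{"name": "Alice", "postcode": "CB3"}, {"name": "Dan", "postcode": "CB3"}],
    [{"name": "Bob", "postcode": "N1"}],
    [{"name": "Carol", "postcode": "CB3"}],
]

totals = combine(local_summary(d) for d in devices)
print(totals["CB3"], totals["N1"])  # prints: 3 1
```

Note that a summary from a device holding a single record can still reveal something about that person, so a real deployment would likely pair this with noise or another aggregation safeguard.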
The simple process of re-identifying patients in public health records. 🙄 Dr Vanessa Teague, Dr Chris Culnane and Dr Benjamin Rubinstein of the University of Melbourne explain the problem of personal medical data being re-identified so easily and the “risky balance” between data sharing and privacy:
While the ambition of making more data more easily available to facilitate research, innovation and sound public policy is a good one, there is an important technical and procedural problem to solve: there is no good solution for publishing sensitive complex individual records that protects privacy without substantially degrading the usefulness of the data. 
Nice ZDNet write-up on how this is a symptom of deeper problems.
Europe’s data rule shake-up: How companies are dealing with it. Writing for the Financial Times, Aliya Ram and Hannah Kuchler explore what actions businesses are taking to prepare for the General Data Protection Regulation (GDPR). 🔍 The key changes will be around consent, hacks/data breaches and the “right to be forgotten”. Interestingly, the UK has the highest number of data workers in Europe, according to JP Morgan. – Check out this innovative approach to refreshing consent. ⚽
Personal data of a billion Indians sold online. According to an investigation by Indian newspaper The Tribune, it is possible to purchase access to anyone’s personal information held in India’s biometric database, Aadhaar, for as little as £6. This includes “a person’s name, home and email addresses, photographs and phone numbers”. 🙀 
The Unique Identification Authority of India (UIDAI), which administers the Aadhaar system, claims that only demographic information has been accessed, which “cannot be misused without biometrics”. ⛔  – Critics of the system have repeatedly argued that such an “enormous and potentially lucrative database can never be fully secured.” ⚡
– Chilling insight into China’s upcoming social-credit system, which “blacklisted” journalist Liu Hu after he was arrested for “fabricating and spreading rumours”. As a consequence, he was unable to “buy property, take out a loan or travel on the country’s top-tier trains”. 😳
Let’s use data for good. Writing for Campaign, Marie Stafford discusses how data philanthropy provides an opportunity for businesses to “deliver meaningful impact” for the “greater good, not just for profit”. ✅  For example, businesses can get involved by “donating data, supplying expertise or sharing technology solutions”. What’s more:
JWT research reveals that 50% of people in the UK are more likely to give their business to brands or companies that share data to try to tackle big challenges facing society.
🚀  Azeem Azhar’s 18 exponential changes we can expect this year.
👻  Get up to speed on Meltdown and Spectre
🤖  AI and deep learning in 2017 – a year in review.
🙌  Using AI and open data for innovation and accountability.
💯  Re-recording of Alan Turing’s “Can Computers Think?” broadcasts. 
🗺️  Google Maps’s Moat.
🧀  Average search interest in “cheese” and “kale” in the UK over time.
🌲  Solar panel analysis - Jeroen Boeye visualised the trees around his house:
