Limit data analysis by design

As nearly every human interaction now generates some form of data, systems should be designed to limit the invasiveness of data analysis by all parties to a transaction and to the networks that carry it.

Industry is gaining insights into and intelligence on our lives of a kind previously possessed only by powerful intelligence agencies, and tomorrow its capabilities may exceed theirs.

What's the problem?

As a result of design choices in modern technologies, individual and collective behaviour is increasingly traceable. Metadata, logs, and other forms of observed data are generated by every interaction. The growing stores of data that companies and governments hold about individuals and groups are now automatically generated from human behaviour. This is at odds with how most users understand privacy: as being about what they knowingly and overtly disclose to companies.

Powerful institutions with access to data now have unprecedented population-level knowledge about individuals, groups, communities, and whole nations and markets. With this knowledge they will have insight and intelligence on patterns of behaviour and other trends. They may identify customary behaviours and activities, as well as deviations. Even as these categories become divorced from the individual pieces of personal data, they provide powerful insights into how groups, societies, and markets function. And they will likely be kept secret, or understandable only to a few. While monopolies are traditionally measured in terms of market power, the data economy raises the question of whether new measures of marketplace dominance are needed.

Why this matters

In the future, industry giants will have more insight into the world than the most powerful intelligence agencies. What they know and represent about us will have significant effects on individuals, groups, and whole societies.

What we would like to see

We should be able to know what metadata and other observed and derived data are generated through our interactions, where this data flows, and who has access to it. For example, in WhatsApp messages, SMS, or financial transactions: which data can each provider access, and what does that allow them to infer?

Individuals should be asked for their consent when their data is to be used to generate analytics for purposes beyond their own direct benefit and legitimate interests, even if the data taken from their use is de-identified or anonymised.

Individuals should be able to filter out metadata and other observed data and prevent its processing on platforms, e.g. platforms should strip photo metadata by default unless an individual wishes it to be disclosed.
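To make photo-metadata stripping concrete: a minimal sketch of how a platform could remove EXIF data (which can carry GPS coordinates) from an uploaded JPEG before processing or re-sharing it. This walks the JPEG marker segments and drops APP1 segments; the function name and structure are illustrative, not a reference to any particular platform's implementation.

```python
def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte stream with APP1 (EXIF/XMP) segments removed.

    Walks the marker segments from the Start Of Image marker and copies
    everything except APP1, which carries EXIF metadata such as GPS coordinates.
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    out = bytearray(b"\xff\xd8")
    i = 2
    while i + 1 < len(jpeg):
        if jpeg[i] != 0xFF:
            out += jpeg[i:]          # entropy-coded data: copy the rest verbatim
            break
        marker = jpeg[i + 1]
        if marker in (0xDA, 0xD9):   # SOS or EOI: no metadata segments follow
            out += jpeg[i:]
            break
        length = int.from_bytes(jpeg[i + 2:i + 4], "big")
        if marker != 0xE1:           # keep everything except APP1 (EXIF/XMP)
            out += jpeg[i:i + 2 + length]
        i += 2 + length
    return bytes(out)
```

Doing this by default, on the server or in the client before upload, would realise "disclosure only when the individual wishes it" as a design property rather than a setting buried in menus.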

Where systems are de facto compulsory and it is impossible for individuals to object, they should be able to act pseudonymously, and they must be able to represent themselves as their interests require, which in inflexible systems would mean the ability to lie and fabricate data. The exception would be systems with a legitimate and specific purpose; such systems would minimise the data they collect, which should in any case be the default.

Even as the user interface of devices and services disappears and background processing collects more data than informed consent would allow, we need more transparency into the data processing on devices and the data emerging from devices and services. Just as firewalls are able to identify and interfere with flows of data from computers, we want to see innovations that give individuals control over data emerging from other technologies, whether from scripts running on websites to IoT devices calling home.
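The firewall analogy above can be sketched in a few lines: an egress control point that records every outbound connection attempt and refuses destinations the user has not approved, before any packet leaves the machine. The class name and allowlist contents are hypothetical; a real control would sit at the OS or network layer rather than inside one program.

```python
import socket

# Hypothetical user-approved destinations; the values are illustrative.
ALLOWLIST = {("127.0.0.1", 8080)}

class GuardedSocket(socket.socket):
    """Firewall-style control point: every outbound connection attempt is
    recorded so the user can see which hosts a program 'calls home' to,
    and destinations outside the user's allowlist are refused."""
    observed = []

    def connect(self, address):
        GuardedSocket.observed.append(address)
        if address not in ALLOWLIST:
            raise PermissionError(f"egress to {address} blocked by user policy")
        super().connect(address)
```

The point of the sketch is the design choice: visibility and veto power sit with the individual, not with the application vendor.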

What this will mean

Competition law would have to consider the dominance a company gains through the knowledge it holds about individuals' activities, the intelligence and insight it possesses on individuals, groups, and whole societies, and the choices it makes through the design of its systems.

User-generated content systems will permit users to control data disclosure, including through the restraint and even fabrication of observed data. A location-tracking and communications tool should allow the user to misrepresent their location to others, instead of relying only on system-generated GPS data; exceptions to this rule must be clear, e.g. gaming. By design these systems must allow for the reduction of observed, derived, and inferable data, e.g. in photos and posts.
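A minimal sketch of that design principle: the value a platform may transmit about a user's location is whatever the user chooses to present, which need not be the sensor reading, and is nothing at all unless sharing is switched on. All class and method names here are illustrative, not any existing API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Location:
    lat: float
    lon: float

class LocationDisclosure:
    """User-controlled location sharing: what others see is what the user
    chooses to present, not necessarily the GPS fix."""

    def __init__(self, sensor_reading: Optional[Location] = None):
        self._sensor = sensor_reading        # raw GPS fix, never shared directly
        self._override: Optional[Location] = None
        self._sharing = False                # disclosure is off by default

    def share(self, enabled: bool = True) -> None:
        self._sharing = enabled

    def present_as(self, loc: Optional[Location]) -> None:
        """Let the user substitute any location they wish (None to clear)."""
        self._override = loc

    def disclosed(self) -> Optional[Location]:
        """The only value the platform may transmit to other parties."""
        if not self._sharing:
            return None
        return self._override if self._override is not None else self._sensor
```

The key property is that the sensor reading never reaches other parties directly: disclosure passes through a layer the user controls.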

Platforms should limit the ability of third parties to conduct unlawful surveillance, and these third parties should not be able to collect personal data (e.g. photo location) except when necessary and proportionate to a legitimate aim. Platforms should also inform users what data is accessible to third parties, how, and under what circumstances.

Essential reform actions

Regulators will need to broaden their remits around data, intelligence, and power: competition regulators need to reflect upon data, and data protection regulators need to increase the scope of their work to consider analytics, anonymous data, and group privacy.

Stronger controls are needed on social media to prevent the generation of SOCMINT (social media intelligence), along with stronger rules on access by third parties.