How do data companies get our data?

Open a Russian Matryoshka doll and you will find a smaller doll inside. Ask a large data company such as Acxiom or Oracle where it gets its data from, and the answer will be: from smaller data companies.

Data companies (a catch-all term for data brokers, advertisers, marketers, web trackers, and more) facilitate a hidden data ecosystem that collects, generates and supplies data to a wide variety of beneficiaries. These beneficiaries can include other advertisers, social media sites, credit agencies, insurers, law enforcement, and more. What is rarely talked about is where these data companies obtain the data, how further data is generated, and how the data is swapped, sold, and shared within the ecosystem.

Privacy International recently launched a campaign to shine a light on the hidden data ecosystem, and has begun by looking at a selection of large data companies that publicly list their data sources.

The Russian Matryoshka Doll Effect: data companies get their data from data companies who get their data from data companies

It is no surprise that the data industry appears to be in constant flux and sees frequent acquisitions. Data companies such as Acxiom and Oracle publish lists of their data sources on their websites. However, these tend to be lists of hundreds of smaller data companies, such as AddThis and Ziff Davis. And the deeper you go, the more data companies appear.

For example, one way Acxiom obtains data on British citizens is through a company called Read Group, which prides itself on having a data list that is “the most comprehensive view of active online and offline UK consumers containing over 32 million individuals. It combines transactional history, lifestyle choices, behavioural insights and geo-demographics to help target your campaigns at every level”.

While it appears that the most common way for companies to obtain their data is to buy lists from other companies, the original question still stands: where does the data originally come from?


Electoral register, open registry, census: how the state feeds data companies

A major source of data for data companies is not the commercial sector but the state itself, along with very traditional forms of data. For example, in the UK, the data broker Acxiom lists HM Land Registry, the Office for National Statistics (which conducts population censuses), the Department for Business, Energy and Industrial Strategy, and Ofcom as data sources.

Experian, widely known as a credit scoring agency, is also a data company and follows the same model. When it comes to the data collected for marketing purposes in the US, Experian cites “information from the United States Census” and “the phone book”.


Cookies and web beacons: how data companies track you across the web

Most websites include embedded code and images that collect data about who we are, what we’re reading and what we’re interested in. While techniques vary, many so-called “third-party trackers” rely on both cookies and “web beacons”, also known as pixels. A data company like Quantcast has embedded such trackers in a network of millions of websites, so that every time a user visits one of those websites, they are identified and their browsing history is recorded. This is why, when you look at a pair of shoes on a website, you may find an ad for that very same pair “following you” across different websites. Browsing histories are also used to create profiles, and to derive users’ identities, interests and much more. The activist group Tactical Tech has mapped online tracking across websites in news, governments and politics, finance, health and society.
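
To make the mechanism concrete, here is a minimal sketch in Python of how a third-party tracker can stitch together a browsing history. It is a toy model, not any company's actual system: the class name `ThirdPartyTracker`, the method `handle_beacon` and the example URLs are all hypothetical, and a real tracker would receive the cookie and the page address in an HTTP request rather than a function call.

```python
import uuid
from collections import defaultdict


class ThirdPartyTracker:
    """Toy model of one tracker embedded on many unrelated websites."""

    def __init__(self):
        # Maps a cookie ID to the list of pages where the beacon was loaded.
        self.histories = defaultdict(list)

    def handle_beacon(self, cookie_id, page_url):
        """Record a page view and return the cookie ID the browser should keep."""
        if cookie_id is None:
            # First contact: mint a random identifier to store as a cookie.
            cookie_id = uuid.uuid4().hex
        self.histories[cookie_id].append(page_url)
        return cookie_id


tracker = ThirdPartyTracker()
browser_cookie = None  # the browser holds no cookie for the tracker yet

visited = [
    "https://shoes.example/red-boots",
    "https://news.example/politics",
    "https://health.example/symptoms",
]
for url in visited:
    # Each site embeds the same beacon, so the same cookie is sent back each time.
    browser_cookie = tracker.handle_beacon(browser_cookie, url)

print(tracker.histories[browser_cookie])
```

Because the cookie belongs to the tracker rather than to any one website, three unrelated visits end up in a single history under a single identifier.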


Email tracking

According to a study published in 2016, over 40 percent of all emails sent around the world daily are tracked. Here’s how it works: “[t]racking clients embed a line of code in the body of an email – usually in a 1x1 pixel image, so tiny it’s invisible, but also in elements like hyperlinks and custom fonts”. Through email tracking, companies learn not only that a recipient has opened an email, but also where it was opened and on what device. Newsletter services, marketers and advertisers have used this technique for years; now big tech companies like Twitter and Facebook are following suit.
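
The sketch below illustrates, in Python, roughly how such a tracking pixel could work. Everything specific in it is an assumption made for illustration: the host `tracker.example`, the parameter names `r` and `c`, and the functions `build_tracked_email` and `log_open` are hypothetical and do not describe any particular tracking service.

```python
from urllib.parse import parse_qs, urlencode, urlparse

TRACKER_HOST = "https://tracker.example"  # hypothetical tracking server


def build_tracked_email(body_html, recipient_id, campaign):
    """Append an invisible 1x1 image whose URL is unique to the recipient."""
    query = urlencode({"r": recipient_id, "c": campaign})
    pixel = (
        f'<img src="{TRACKER_HOST}/open.gif?{query}" '
        'width="1" height="1" alt="">'
    )
    return body_html + pixel


def log_open(request_url, ip_address, user_agent):
    """What the server can record when a mail client fetches the image."""
    params = parse_qs(urlparse(request_url).query)
    return {
        "recipient": params["r"][0],
        "campaign": params["c"][0],
        "ip_address": ip_address,  # can be mapped to an approximate location
        "user_agent": user_agent,  # hints at the device and mail client
    }


html = build_tracked_email("<p>Our spring newsletter</p>", "user-4821", "spring")
event = log_open(
    f"{TRACKER_HOST}/open.gif?r=user-4821&c=spring",
    "203.0.113.7",
    "Mozilla/5.0 (iPhone)",
)
print(html)
print(event)
```

The recipient never sees the image, yet simply loading it tells the sender who opened the message, from which network address and with what kind of device.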


Apps and third party trackers

Research has shown that more than three in four Android apps contain at least one third-party tracker. Third-party app analytics companies play a crucial role for advertisers and app developers. Though some trackers are used to better understand how users use apps, the vast majority are used for targeted advertising, behavioural analytics, and location tracking. The problem is that there is no real way to opt out of such third-party tracking.

In addition to third party trackers embedded in apps, apps themselves frequently access users’ entire address books, location data, photos and more, sometimes even if you have explicitly turned off access to such data.

When your favourite shops give away your data

Another important way for companies to obtain data is to get it from the companies you interact with: the places you shop, the services you use, etc. Many of the data companies offer advertising and marketing services, and help consumer-facing companies find new customers or better target their existing customers. However, in order to provide such services, these companies need customers’ data. When agreeing to the privacy policies of websites, you are often told your data may be shared with trusted third parties, and often the trusted third party will be a data company.


Platform registration: registering for more than you intend

Another way data companies obtain your data is through website registrations. More and more websites now expect you to register to access the website’s content. Some websites appear to have been created largely for the purposes of obtaining data.

For example, among its listed data sources, Acxiom mentioned two websites, including Emma’s Diary. Both websites target future parents and offer information and discounts to their members, all the while collecting data on those who interact with the websites.


Personality tests, quizzes, surveys and prizes: the data baits

Another major source of data for data companies is surveys, which were at the heart of the 2018 Cambridge Analytica scandal. These include personality quizzes, online games and tests, and more. When a company asks you to rate a product, your opinion may benefit many other companies. The data company Epsilon, for instance, has created a database called Shopper’s Voice boasting “unique insights you won’t find anywhere else, directly from tens of millions of consumers. Our proprietary survey database spans 1,000 data points, including product preferences and purchase behaviours”.

In order to lure customers into replying to their survey, Shopper’s Voice offers multiple rewards: “instant flash savings, tailor-made free offers, and a chance to win $1,500”. Prizes and competitions are in fact another way for data companies to obtain data. Every time you enter a competition or prize draw, the real winner might be the company collecting the participants’ data. Acxiom for instance mentions as a source of data.


Financial companies: the dual game

We highlighted above the ambiguous role of credit scoring companies, whose role it is to collect data in order to assess people’s creditworthiness but who also engage in marketing activities. In the US, Experian claims that it keeps marketing data and credit scoring data separate: “No individualized data from Experian’s regulated credit reporting database is ever used in any OmniActivation Strategic Services marketing activities”.

In the UK, the situation is different. The marketing arm is allowed to use names and addresses from the credit scoring databases for two purposes: “to validate existing marketing names and addresses and to ensure that, as far as possible, marketing contact names and addresses belong to an individual aged 18+” and “for matching and linking our modelled data to a client’s existing customer base where we only provide this data in a non-readable encrypted format that can only be used for data matching”.

Debit card and credit card companies also play an ambiguous game when it comes to your data. While they do not reveal your personal transactions (the data is aggregated), MasterCard constitutes a goldmine for a data company like Quantcast. The company boasts that their data is “derived from billions of transactions and applied to third-party consumer populations. A proprietary MasterCard methodology identifies audience segments with higher statistical probability to make purchases within the category”. Whether it is offline or online purchases, using MasterCard as a form of payment means you are feeding into “audience profiles informed by activity on more than 2.2 billion payment cards and 43 billion transactions annually”.

Offline data and cross-device identification: the full picture 

Many shops, as well as airports and transport systems, want to be able to track their customers’ movements once inside the terminal, on the transport or in the shop. Companies are making use of WiFi, Bluetooth, and in some instances, ultrasonic sound inaudible to the human ear, in order to track users’ geolocations in real time.

For both offline and online data, one of the obstacles for advertisers is to identify a single individual across several devices, as most people will use at least a smartphone and a computer, if not several. To avoid using ‘personally identifiable information’, as it’s termed in the US, data companies tend to stay away from real names and instead assign people a unique ID. The ability to accurately identify individuals has become a key marketing argument for these companies.
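
One common way to build such an ID without storing a name is to hash a stable identifier, such as the email address used to log in on each device. The Python sketch below shows the idea under that assumption; it does not describe any specific company's method, and the function `pseudonymous_id`, the sample events and the email addresses are all hypothetical.

```python
import hashlib
from collections import defaultdict


def pseudonymous_id(email):
    """Derive a stable ID from an email address without keeping the address."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:16]


# Hypothetical activity observed on different devices: (device, login, activity).
events = [
    ("phone", "Jane.Doe@example.com", "viewed running shoes"),
    ("laptop", "jane.doe@example.com ", "bought running shoes"),
    ("tablet", "someone.else@example.com", "read travel news"),
]

profiles = defaultdict(list)
for device, login_email, activity in events:
    # The same person yields the same ID on every device they log in from.
    profiles[pseudonymous_id(login_email)].append((device, activity))

for person_id, activity_log in profiles.items():
    print(person_id, activity_log)
```

No real name appears anywhere in the resulting profiles, yet the phone and laptop activity is merged into a single record that follows one person across devices.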


Looking forward

Data companies rely on various sources, from government data to tracking cookies and geolocation, making it virtually impossible to escape their net. Yet most of the companies we have mentioned in this piece remain unknown to most people. Marketers and credit bureaus have always collected, analysed and compiled lots of data about people, from loyalty programs to consumer credit reporting. Now, in addition, the drive to create ever more targeted ads has created an entire ecosystem made up of thousands of companies that are all in the business of tracking and profiling people in virtually all aspects of their lives. The problem is that today’s pervasive digital tracking is mostly invisible.

How can we hold companies accountable when their very existence is unknown to us? This is why it is essential for the general public to become aware of this industry that has thrived in the shadows. GDPR provides civil society, individuals, and the media with one tool with which to hold these companies and their ecosystem to account.