(Sort of) Trust but Verify: Palantir Responds to Questions about its work with the NHS

Palantir, the US data giant which works with intelligence and immigration enforcement agencies, has responded to our questions about its work on a highly sensitive National Health Service (NHS) project, providing some assurances, passing the buck to the NHS, and raising additional questions.

Key findings
  • In April 2020, it was reported that Palantir, the US-based data-mining company, would be involved in a Covid-19 data project with the UK National Health Service (NHS).
  • As a result, Privacy International, Big Brother Watch, medConfidential, Foxglove, and Open Rights Group sent Palantir 10 questions about their work on the project to seek clarification and assurances.
  • Their response offers some assurances but fails to clarify the extent of the project and what protections exist.
  • We will continue to monitor the situation closely and make sure that both Palantir and the UK government are held to account.

On 12 April 2020, citing confidential documents, the Guardian reported Palantir would be involved in a Covid-19 data project which "includes large volumes of data pertaining to individuals, including protected health information, Covid-19 test results, the contents of people’s calls to the NHS health advice line 111 and clinical information about those in intensive care".
It cited a Whitehall source "alarmed at the “unprecedented” amounts of confidential health information being swept up in the project, which they said was progressing at alarming speed and with insufficient regard for privacy, ethics or data protection".
As a result, Privacy International, Big Brother Watch, medConfidential, Foxglove, and Open Rights Group sent Palantir 10 questions about their work on the project to seek clarification and assurances.
Their response, attached below in full, offers some assurances but fails to clarify the extent of the project and what protections exist.
Given that Palantir is unable to release further information, it is now up to the Health Secretary to release any impact assessment and agreements in place to enable public trust and verification.
The NHS Covid-19 datastore
Palantir’s role in the project involves integrating NHS datasets with the US company’s data-management platform, Foundry.
Palantir claim that Foundry is software that allows users to "source, connect, and transform data into any shape they desire, then use it to take action".
As Palantir put it in a recent blog, Foundry helps "organizations map, understand, and operationalize their data, so they can use that data to make informed, timely decisions" to deal with the pandemic.
Our questions related to their access to this highly sensitive data set, and what protections there are in place against misuse.
Here are our key takeaways from their response (full response attached below):
First, Palantir notes that their role in this exercise is that of a data processor and that the company "serves as a technical agent to its customers, providing software and services to enable and support them in analysing the data they control".
This means that, under data protection laws, Palantir merely processes the data under the direction and guidance of the data controller, in this case the NHS, which would be the one maintaining control over the data and deciding how it should be processed.
In their response, Palantir do not rule out the possibility that they might still obtain access to confidential NHS patient data:

any access to customer data under any circumstances would be strictly at the direction of customers, in support of legitimate purposes, and in adherence with all applicable rules and regulations.

However, they do not clarify whether the company would obtain access to any sensitive health data held by the NHS such as patient records. Instead, they direct us to the NHS to answer this question.
Second, while the NHS Covid-19 datastore website mentions that 111 and 999 call data is aggregate, the Guardian reports that:

While anonymised, confidential 111 call information in the Covid-19 datastore may include people’s gender, postcode, symptoms, the mechanism through which any prescription was dispatched to them, and the precise time they ended the call.

The project appears to be using a “pseudo NHS number” to cross-match large datasets, including a master patient index, an existing NHS resource that uses “social marketing data” to segment the British population into different “types” at household level.

Even if anonymised or pseudonymised datasets were among the ones used to facilitate the NHS datastore, both Palantir and the UK government need to be very clear about the anonymisation or pseudonymisation techniques they are using.
There is a fine line between pseudonymised and anonymised data: the former can still render an individual identifiable. For example, journalists from the German public broadcaster NDR were able to identify the sexual preference and medical history of judges and politicians, using online identifiers. This is just one example of the insights that can be gleaned from seemingly mundane and pseudonymous data, and of the value such data might have.
Even if a company does not intend to directly identify an individual, identification may still be possible because of the vast amount of data it collects and generates. And even when data appears to have been truly anonymised, and is consequently exempt from the protections guaranteed by, for example, the General Data Protection Regulation, individuals might still be re-identified.
In 2015, researchers at Harvard University found vulnerabilities in the anonymisation procedures used for health care data in South Korea that enabled them to de-anonymise patients with a 100% success rate and to decrypt the Resident Registration Numbers. The unique 13-digit codes enabled full re-identification. In the UK, medical information that is held on the NHS Personal Demographics Service (PDS) is identified by the patient's ten-digit NHS number.
Cambridge University security engineer Ross Anderson has noted that the problem is that 800,000 NHS employees need access to the PDS; Hampshire GP Neil Bhatia agreed that the large number of users means that access can't be audited or controlled and relies on trust.
Similarly, in a more recent study published in Nature, researchers were able to demonstrate that, despite the anonymisation techniques applied, “data can often be reverse engineered using machine learning to re-identify individuals.”
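To make the re-identification risk described above more concrete, here is a minimal sketch of a so-called linkage attack. It is an illustration only, not a description of how the NHS datastore or Foundry works: all records, names, postcodes and the `link_records` helper are hypothetical, and the only assumption taken from the reporting is that pseudonymised call records may still carry attributes such as gender and postcode.

```python
from collections import defaultdict

# Hypothetical pseudonymised records: the direct identifier has been replaced
# by a pseudo ID, but quasi-identifiers (gender, postcode) are still present.
pseudonymised_calls = [
    {"pseudo_id": "a91f", "gender": "F", "postcode": "ZZ1 1AA", "symptom": "fever"},
    {"pseudo_id": "07c3", "gender": "M", "postcode": "ZZ1 1AA", "symptom": "cough"},
    {"pseudo_id": "5e2b", "gender": "M", "postcode": "ZZ2 2BB", "symptom": "cough"},
    {"pseudo_id": "c410", "gender": "M", "postcode": "ZZ2 2BB", "symptom": "fever"},
]

# Hypothetical auxiliary data an attacker might hold (for example, a
# marketing list) that contains names alongside the same attributes.
auxiliary = [
    {"name": "Person A", "gender": "F", "postcode": "ZZ1 1AA"},
    {"name": "Person B", "gender": "M", "postcode": "ZZ1 1AA"},
    {"name": "Person C", "gender": "M", "postcode": "ZZ2 2BB"},
    {"name": "Person D", "gender": "M", "postcode": "ZZ2 2BB"},
]

QUASI_IDENTIFIERS = ("gender", "postcode")


def link_records(records, auxiliary, keys=QUASI_IDENTIFIERS):
    """Return {pseudo_id: name} for every record whose quasi-identifiers
    are shared by exactly one person in the auxiliary data."""
    # Group the named people by their combination of quasi-identifiers.
    people_by_key = defaultdict(list)
    for person in auxiliary:
        people_by_key[tuple(person[k] for k in keys)].append(person["name"])

    linked = {}
    for record in records:
        candidates = people_by_key.get(tuple(record[k] for k in keys), [])
        # A single candidate means the pseudonym no longer protects anyone.
        if len(candidates) == 1:
            linked[record["pseudo_id"]] = candidates[0]
    return linked


if __name__ == "__main__":
    print(link_records(pseudonymised_calls, auxiliary))
```

In this toy data, two of the four pseudonymised records are linked to a name because only one person shares their gender and postcode; the other two remain ambiguous. Adding further attributes, such as the precise time a call ended, would typically make more records unique.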
Third, with regard to our question of whether Palantir has similar collaborations with health services in other countries and, if so, which countries these are, Palantir said they "are supporting a range of public and private sector organisations in their response to the Covid-19 crisis". They direct us to their website, which, however, does not provide any specific information about similar projects in other countries. At the same time, Palantir keep emphasising that they act under the direction of their customers, "in support of legitimate purposes, and in adherence with all applicable rules and regulations".
Bloomberg reports that "a dozen governments joined the U.S. and U.K. in adopting Palantir software for their fights against the deadly virus".
It is important to note that while the UK, where the NHS operates, has a data protection framework, not all countries do. In the end, even if it is not Palantir's decision what data to put into the system, the company still bears a responsibility to ensure that its software won't be used as a tool to legitimise mass surveillance for "legitimate purposes". Otherwise, does it want to be an accomplice in a totalitarian nightmare?
Palantir's inability to disclose similar collaborations in other countries raises transparency concerns, especially in light of recent reports that the company does in fact have such collaborations and is negotiating more.
Passing the buck to the NHS is not the answer. And, as PI revealed in December 2019 regarding similar deals between the UK Department of Health and Amazon, deals of this kind might eventually expose both the NHS and people in the UK to the risk of data exploitation.
Palantir's welcome assurances must be verified via the company and the government once the pandemic is over. This is the only way we can achieve proper oversight, ensure respect for sensitive patient data and strongly reject any actor that seeks to turn a public health crisis into an opportunistic power grab. In the meantime, we will keep watching them closely.