Computers vs Humans: Why There's No Difference Between Who, Or What, Looks At Your Data

A year after the Eyes Wide Open program was launched here at Privacy International, we are just beginning to scratch the surface of the processes and justifications that agencies like GCHQ use to make their spying legally compliant. As Tocqueville, a great philosopher of law, stated:

“If they prize freedom much, they generally value legality still more. They are less afraid of tyranny than of arbitrary power, and provided the legislature undertakes of itself to deprive men of their independence, they are not dissatisfied.”

In the last ten years, mass surveillance programs have flourished, scooping up any private communications that pass a Five Eyes collection point – in carelessness and greed they swell the haystacks to get to the needles. Massive amounts of communications data are sifted and filtered with no meaningful proportionality.

Meanwhile, the agencies play down this intrusion, insisting that only computers examine the data, and point instead to the compliance frameworks that bind intelligence officers. However, the Snowden leaks and ACLU FOI litigation show that at the desks of the surveillance officers, these legal procedures amount to very little.


A Flawed Starting Point

Governments have long sought to obscure how much data they actually collect. Since Snowden's disclosures, they have defended their actions by claiming that the interference with privacy occurs not at the point of collection, when the communications themselves are gathered, but only when a person actually looks at that data.

In doing so, governments are effectively denying that an interference with the right to privacy has taken place at all for most of the communications intercepted in mass-surveillance programs.

The current restrictions and safeguards implemented by the officers themselves leave excessive leeway for suspicionless surveillance and untargeted collection with substantial error margins. In fact, at the very first step of the surveillance process, the step that defines how much data will be collected, we know of very few mechanisms that restrict the targeting chosen by the surveillance officer.

The only attempt that we know of at narrowing searches from the start, and protecting the public from untargeted trawling, is the US 'Reasonable Articulable Suspicion' (RAS) standard. However, even the US courts and the legal representatives of the US intelligence agencies are in open accord: this is the weakest possible protection. In fact, an NSA training guide ranks the RAS on the US scale of legal definitions of suspicion:

“[A] jury... will not convict an accused unless the evidence of guilt is ‘beyond a reasonable doubt.’ This is the highest legal standard of proof... Lower still is the standard of proof... of a search warrant – ‘probable cause’ – whether that search warrant is for the suspect’s home or the content of the suspect’s communications. The RAS standard falls below ‘probable cause.’”

In 2009, the Foreign Intelligence Surveillance Court learned that for two-and-a-half years the NSA had searched phone metadata using an “alert list” of possible terrorists' phone numbers originally created for other purposes. Almost 90 per cent of the numbers on the alert list did not meet the “reasonable, articulable suspicion” standard. Judge Walton observed that the RAS had been “so frequently and systematically violated that it can fairly be said that this critical element of the overall... regime has never functioned effectively.”

Weak and ineffective as the RAS may be, we haven't seen any attempt to apply similar protection in the UK or elsewhere. In the UK, Section 8(4) of RIPA exempts GCHQ from describing the persons, interception subjects and premises for surveillance in a warrant when the communications are external.

A leaked NSA presentation shows clearly just how flawed modern mass surveillance is. In explaining how identifier leads are developed, the presentation lays bare the stages of processing: developing leads, or suspect lists, through data interception passes through four phases of analysis and processing. Once collected, there is no limit to how much the data can be analysed, and the only new protections and justifications are applied at the very last stage, before it is viewed by human eyes. Both collection and enrichment analyses are carried out with no justification at all.

Justifying mass surveillance has become as easy as obtaining sign-off on broad statutory authority for the collection and asking analysts to select a threat scenario from a ready-made drop-down menu every time they want to dive into the data pool.


An Inaccurate Process

The private communications of everyone caught in this net are subjected to experimentation in an attempt to determine how suspicious they are. Emails and phone calls with friends and family, internet searches on medical conditions, and web-browsing showing political views will all be examined and graded.

So, how intrusive is the final stage, before human eyes look at the individual communications? How much processing is done to the communications of people who are not suspected of any criminal activity? The figures say it all.

For analysts, just setting out identifiers will yield between 50 and 10,000 leads through the initial three stages of surveillance. One training document shows that at that stage, only 5 per cent of the communications are of definite interest to the agency, 20 per cent are definitely irrelevant and the rest are of uncertain relevance. In the NSA presenter's own words, “the analyst is often left with too many identifiers of ‘possible interest.’”

Today, asymmetrical privacy protections based on origin mean it is common for intelligence agencies to apply different standards of safeguards to the communications of citizens and foreigners. However, after generating the lead lists and intercepting the communications, the computer systems are actually incapable of distinguishing the origins. As a result, agencies like GCHQ and the NSA, mandated only to collect foreign intelligence, happily trawl broad swathes of domestic communications as well.

The origin, or foreignness, is only determined at the third stage of the surveillance process. Prior to that point the data is readily analysed (albeit not manually) and enriched in preparation for the surveillance officer. As much as 49 per cent of that analysed data may in fact be domestic.

The problem is striking in both the US and the UK. One GCHQ analyst told the Guardian: “I was told that we were getting 85 per cent of all UK domestic traffic – voice, internet, all of it.” A US judge recently estimated that as many as 46,000 US communications are collected annually as 'foreign' intelligence. We'd presume that to be a very conservative estimate.


Where does all this leave the surveillance officer – where does it leave us?

So, if governments expect us to concede that it's only human eyes and not computers that really pose a substantial interference with privacy, perhaps they'd also care to explain how that leaves us any better off?

Firstly, the computer analytics are, in fact, no less relevant to the right to privacy. Even before the stage of actually determining whether an intercepted communication subjected to computer analysis is external or internal, the computer has already provided the agency with a list of private information. This could include personal Skype accounts and e-mail accounts, and with them access to everything we hold dear about our private lives.

But what about the information the computer passes on to the officer to view in person? The truth is that although the computer generates lists of highly private information, the error margins are scarcely better when passed on to a human. Not only do the inaccuracies in origin determination persist once the data is passed on to a human, but the relevance of the overall data pool remains appallingly low.

UK and US officials both openly admit that the computers are incapable of effectively differentiating the origin of communications. We'd have to assume that problem persists as the information is handed over to a human. Of course, the UK Government thinks it can justify this by redefining the everyday use of Facebook, Twitter, YouTube and Google as “external” since the servers are located abroad. It also thinks that domestic communications associated with foreign communications likewise need to be exempted from the standard legal protections.

By setting different legal standards in protecting citizens and foreigners and feigning that they don't collect domestic communications, Governments can pretend that there is no problem. It is a pretext for surveillance that results in discrimination based on nationality on a massive scale. If our findings show one thing about this distinction, it's that it's become an argument without substance used to exempt both citizens and non-citizens from the protections they deserve.

Their reasoning also exposes the current state of legal affairs as mere smokescreen legalisms that pay the most meagre lip-service to the protection of privacy. A virtually unrestricted starting point for developing leads generates massive error margins. As you've already seen, the data collected and enriched by the computers is mostly of no interest to the investigation. At the end, the same issues remain. The surveillance officer gains access to a data pool where only 20 per cent of the remaining data is of “high” or “definite” relevance to their initial search request.

So there we have it – 80 per cent of the data pried open by the eyes of the spy is not going to stop serious crime, it is not the data of a terrorist. It is the private life of ordinary citizens – managing their bank accounts, mailing friends, informing their political views and even sharing their love – online.
