The Princeton Web Census: a 1-million-site measurement and analysis of web privacy
Princeton University's WebTap (Web Transparency and Accountability) project conducts a monthly automated census of 1 million websites to measure tracking and privacy. The census detects and measures many, if not most, of the known privacy violations that researchers have identified in the past: circumvention of cookie blocking, leakage of personally identifiable information to third parties, canvas fingerprinting, and many more. The research also examines the effect of browser privacy tools, as well as cookie syncing, a complex redirection technique that third parties provide to advertisers to reidentify users as they move around the web.
In January 2016, the measurement tool, OpenWPM, made over 90 million requests and assembled the largest known dataset for studying web tracking. The researchers have published this as open data for others to study. Among their own findings from this dataset: the total number of third parties present on at least two sites is over 81,000, but only 123 of these are present on more than 1% of sites. In other words, the dataset shows a strong "long tail" distribution. All of the top five third parties, and 12 of the top 20, are domains owned by Google. Google, Facebook, and Twitter are the only third-party entities present on more than 10% of sites.
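The long-tail figures above come down to counting, for each third party, how many sites it appears on. The following is a minimal Python sketch of that arithmetic, not the researchers' actual analysis code. The input format (a mapping from each site to the set of third-party domains it loads), the function names, and the toy domains are all assumptions made for illustration.

```python
from collections import Counter


def count_sites_per_third_party(site_to_third_parties):
    """Count how many distinct sites each third party appears on."""
    counts = Counter()
    for third_parties in site_to_third_parties.values():
        # set() ensures a third party is counted at most once per site.
        counts.update(set(third_parties))
    return counts


def summarize_long_tail(site_to_third_parties, min_sites=2, prevalence_threshold=0.01):
    """Summarize third-party prevalence (hypothetical helper).

    min_sites: a third party must appear on at least this many sites to count.
    prevalence_threshold: fraction of sites that marks a "prominent" third party.
    """
    total_sites = len(site_to_third_parties)
    if total_sites == 0:
        return {"total_sites": 0, "on_multiple_sites": 0, "above_threshold": 0}
    counts = count_sites_per_third_party(site_to_third_parties)
    on_multiple_sites = sum(1 for n in counts.values() if n >= min_sites)
    above_threshold = sum(
        1 for n in counts.values() if n / total_sites > prevalence_threshold
    )
    return {
        "total_sites": total_sites,
        "on_multiple_sites": on_multiple_sites,
        "above_threshold": above_threshold,
    }


# Toy data, invented for illustration only (not taken from the census).
toy_crawl = {
    "news.example": {"tracker-a.example", "tracker-b.example"},
    "shop.example": {"tracker-a.example", "tracker-c.example"},
    "blog.example": {"tracker-a.example", "tracker-b.example"},
    "wiki.example": set(),
}

summary = summarize_long_tail(toy_crawl, min_sites=2, prevalence_threshold=0.5)
print(summary)
```

With the defaults (`min_sites=2`, `prevalence_threshold=0.01`), the two counts returned correspond to the two quantities quoted above: third parties present on at least two sites, and those present on more than 1% of sites. A large gap between the two numbers is what a long-tail distribution looks like in this summary.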
Writer: Steven Englehardt