PI response to ICO consultation on web scraping by generative AI

PI responded to the ICO consultation on the legality of web scraping by AI developers when producing generative AI models such as LLMs. Developers are known to scrape enormous amounts of data from the web in order to train their models on different types of human-generated content. But data collection by AI web-scrapers can be indiscriminate and the outputs of generative AI models can be unpredictable and potentially harmful.

Advocacy

Post date: 5th March 2024

Generative AI models are based on indiscriminate and potentially harmful data scraping

Existing and emergent practices of web scraping for AI are rife with problems. We are not convinced that they stand up to the scrutiny and standards expected by existing law. If the balance is struck wrongly here, people stand to have their right to privacy further violated by new technologies.

The approach taken by the ICO towards web scraping for generative AI models may therefore have important downstream repercussions for the future of people’s information rights online.

Our response to the consultation discusses in more detail the following three matters:

The risks of an overly permissive approach to the “legitimate interests” test leaving the door wide open for personal data to be misused or abused in the future;
The barriers to exercising information rights in the context of “invisible processing” activities like web scraping; and
The potential benefit of a public registry system for generative AI models.

Download our full response to the consultation below.

Attachments

PI response - ICO Consultation on web scraping and Gen AI (submitted).pdf

Our fight

Challenging Corporate Data Exploitation

What PI is calling for

Companies and industries protect privacy by design, rather than exploiting people and their data.

Technologies, laws, and policies contain modern safeguards to protect people from exploitation.

Limit data analysis by design
