AI, ChatGPT and the EDPB: Regulatory Insights from Taskforce Report
1. Introduction
On 23 May 2024, the European Data Protection Board ("EDPB") published a report ("Report") on the work undertaken by its ChatGPT Taskforce ("Taskforce"), comprising each EU supervisory authority ("SA"). The Report is the first official document from the EDPB setting out its 'preliminary views' regarding the interplay between the General Data Protection Regulation ("GDPR") and artificial intelligence ("AI"), including the EU AI Act.
While the Report focuses on the various (ongoing) SA GDPR investigations into OpenAI's ChatGPT large language model ("LLM"), it gives early indicators of the EDPB's regulatory direction regarding the co-existence of these two areas of law. It offers valuable insights for businesses acting as controllers of personal data, whether as deployers or developers of AI systems.
The Report also includes a questionnaire which such controllers may find useful (e.g. as a checklist) when seeking to ensure compliance with the GDPR.
2. Background to the Report
In November 2022, OpenAI OpCo, LLC ("OpenAI") launched its trailblazing ChatGPT LLM. ChatGPT grew rapidly, amassing a reported 1 million users within five days of launch, and now has around 180.5 million users.
However, at the time of launch, concerns arose that ChatGPT did not meet the requirements of the GDPR. This led the Italian SA and several other SAs to initiate their own investigations under Article 58(1)(a) GDPR.
The Taskforce was established to: (a) foster coordination; (b) exchange information; and (c) establish a common approach to the SAs' investigations into ChatGPT. At the time, OpenAI had no establishment in the EU, meaning the GDPR's "one-stop-shop" cooperation mechanism did not apply.
On 15 February 2024, OpenAI established OpenAI Ireland Limited, making the Data Protection Commission of Ireland its lead SA for cross-border investigations and complaints under the GDPR. Nonetheless, several SAs have open investigations which pre-date February 2024 and have yet to conclude.
The Report covers the period of investigations between November 2022 and February 2024.
3. What's in the Report? 'Preliminary views'
The Report outlines the EDPB's 'preliminary views' on key areas of the GDPR, namely: lawfulness, fairness, accuracy, transparency and data subject rights.
(a) Lawfulness
A fundamental aspect of the GDPR is that a controller must have a legal basis to process any personal data; this is the GDPR's principle of lawfulness. In assessing the legal basis for processing, the Report breaks down the different stages of processing by ChatGPT's LLM during its lifecycle:
- collection of training data, pre-processing of data, and training; and
- input data/prompts, output data, and training ChatGPT with prompts.
Legitimate interests & appropriate safeguards
The Report focuses on OpenAI's reliance on the legal basis that processing is necessary for the purposes of its legitimate interests (Article 6(1)(f) GDPR) at the various stages above. The EDPB emphasises the compliance obligations on controllers relying on this legal basis, reiterating the requirement to conduct a legitimate interest assessment documenting: (i) the identified legitimate interest(s); (ii) the necessity of the processing to pursue the legitimate interest(s); and (iii) the balancing of individuals' rights against the interests of the controller.
Importantly, the Report outlines that adequate safeguards play a 'special role' in reducing the undue impact of processing on data subjects, and may thereby shift the balancing test in favour of the controller.
The Report includes four examples of adequate safeguards, by way of privacy-enhancing techniques, for controllers to consider:
(i) defining precise collection criteria;
(ii) ensuring certain data categories are not collected;
(iii) ensuring certain sources are excluded from data collection (e.g. public social media profiles); and
(iv) measures to delete or anonymise personal data (including special category data) which have been collected via web-scraping before and after the training stage.
The Report acknowledges that, where large amounts of personal data are being collected, it is not possible to conduct a 'case-by-case examination' of each dataset. However, appropriate safeguards must be implemented to meet the requirements of the GDPR.
The Report notes that under the GDPR, the controller, here OpenAI, bears the burden of proof for demonstrating the effectiveness of the chosen measures.
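By way of illustration only, the following minimal Python sketch shows how safeguards of the kind listed above might be operationalised in a pre-processing pipeline for web-scraped training data. The source list, detection patterns and function names are hypothetical assumptions for illustration; they are not measures described in the Report or attributed to OpenAI.

```python
import re

# Hypothetical illustration of the Report's example safeguards applied to
# web-scraped records before they enter a training corpus.

# (iii) sources excluded from data collection, e.g. public social media profiles
EXCLUDED_SOURCES = {"facebook.com", "x.com", "instagram.com", "linkedin.com"}

# (ii) simple patterns for data categories that must not be retained
# (illustrative only; real pipelines would use far more robust detection)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\+?\d(?:[\s-]?\d){6,14}")


def within_collection_criteria(record: dict) -> bool:
    """(i) Precise collection criteria: keep only records from permitted
    sources and of a permitted content type."""
    return (
        record.get("domain") not in EXCLUDED_SOURCES
        and record.get("content_type") == "text/html"
    )


def redact_personal_data(text: str) -> str:
    """(iv) Delete/anonymise personal data detected in scraped text
    before the training stage (placeholder-level redaction)."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED_PHONE]", text)
    return text


def preprocess(records: list[dict]) -> list[str]:
    """Apply the safeguards to a batch of scraped records."""
    return [
        redact_personal_data(r["text"])
        for r in records
        if within_collection_criteria(r)
    ]


if __name__ == "__main__":
    sample = [
        {"domain": "example.org", "content_type": "text/html",
         "text": "Contact me at jane.doe@example.org or +353 1 234 5678."},
        {"domain": "facebook.com", "content_type": "text/html",
         "text": "A public social media profile that should be excluded."},
    ]
    print(preprocess(sample))
```

A controller relying on such measures would still need to document why they are effective, as it bears the burden of proof noted above.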
(b) Data accuracy – hallucinations versus fact, users must understand the difference
The Report distinguishes between input data (prompts) and output data regarding the GDPR's data accuracy principle. It acknowledges that the purpose of processing for ChatGPT is not to provide accurate information but to train the LLM. For the EDPB, the concern is that output data generated by ChatGPT may be biased or "made up" (e.g., AI hallucinations and deep fakes), yet end users may mistakenly treat it as factual.
To address this risk of misinterpretation, the Report indicates that ChatGPT is held to a particularly high standard and must provide 'proper information' on the probabilistic nature of the output data and its limited level of reliability. This includes expressly informing individuals that output data may be biased or 'made up'. However, according to the EDPB, such measures alone are not sufficient to comply with the data accuracy principle under the GDPR.
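As a purely illustrative sketch of how a deployer might surface such 'proper information' to end users, the snippet below attaches a reliability notice to every generated answer. The wording and the function are assumptions for illustration, not measures described in the Report, and the EDPB's point is precisely that a notice of this kind would not by itself satisfy the accuracy principle.

```python
# Hypothetical notice a deployer could attach to every model answer shown
# to end users, flagging the probabilistic nature of the output.
RELIABILITY_NOTICE = (
    "Note: this answer was generated by a probabilistic language model. "
    "It may be inaccurate, biased or entirely made up, and should not be "
    "treated as factual without independent verification."
)


def present_output(model_answer: str) -> str:
    """Return the model answer together with the reliability notice."""
    return f"{model_answer}\n\n{RELIABILITY_NOTICE}"
```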
(c) Fairness – responsibility for ensuring GDPR compliance should not be transferred to data subjects
The principle of fairness is another crucial aspect of GDPR compliance when it comes to the AI and data protection interplay. This is due to the potential for bias and discrimination. The Report emphasises that personal data must not be processed in a manner that is unjustifiably detrimental, unlawfully discriminatory, unexpected or misleading to individuals.
The EDPB outlines that a 'crucial aspect' of compliance with this principle is that responsibility for ensuring compliance with the GDPR should not be transferred from a controller to end users. The Report essentially prohibits a controller from including a clause in its terms and conditions of use stating that data subjects are responsible for their chat inputs. While OpenAI has implemented measures to address this issue, the Report makes clear that OpenAI remains ultimately responsible for complying with the GDPR.
(d) Transparency – information obligations must be followed, potential exemption for indirect collection of personal data
The Report gives limited attention to the GDPR's principle of transparency. However, in the context of web-scraped data, it acknowledges that controllers may rely on Article 14(5)(b) GDPR, provided its requirements are met. That provision exempts a controller from providing transparency information (by way of a privacy notice) to individuals where personal data are collected indirectly and providing such information 'proves impossible or would involve a disproportionate effort'.
The Report further provides that where personal data collected via prompts will be used to train the LLM, individuals must be informed of such processing in accordance with Article 13 GDPR.
(e) Data Subject Rights – end users must be able to exercise their fundamental rights
The Report emphasises the importance of data subjects being able to exercise their rights under the GDPR. It acknowledges the methods by which a data subject can exercise their GDPR rights with OpenAI, but notes that OpenAI must continue to improve these methods. For example, OpenAI encourages end users to exercise their right to erasure rather than the right to rectification, given the technical challenges associated with the development and operation of its LLMs. However, the Report merely scratches the surface of this complex area of GDPR obligations.
4. Conclusion
Overall, the Report provides early indicators of the EDPB's approach and expected standards of compliance when it comes to the interplay between the GDPR and AI (namely, LLMs). As expected, the EDPB appears to be taking the position that nothing short of full compliance with the GDPR will suffice.
The overarching takeaway from the Report is that controllers such as OpenAI must comply with the GDPR and be able to demonstrate that compliance in line with the GDPR's accountability principle. While the Report does not acknowledge the opportunities that LLMs and other types of AI present to society, nor the GDPR's technology neutrality (Recital 15), expectations are high that these issues will be addressed in the EDPB's imminent guidance on the interplay between the GDPR and the AI Act.
The Report is not formal guidance, but it can be used as a starting framework for AI developers and deployers to comply with the GDPR and to prepare for compliance with the EU AI Act. The questionnaire in the Annex of the Report is of particular note in that regard.
Many issues remain for the EDPB to address regarding the interplay between the GDPR and the EU AI Act and the co-existence of these two legal regimes.
Article provided by INPLP members: Leo Moore and Rachel Hayes (William Fry, Ireland)
Dr. Tobias Höllwarth (Managing Director INPLP)