After more than a year of deliberation, a data protection taskforce published its preliminary findings on Friday on how the EU’s data protection rules apply to OpenAI’s wildly popular chatbot, ChatGPT. The headline takeaway is that the working group of privacy enforcers remains undecided on crux legal issues, such as whether OpenAI’s processing is lawful and fair.
The issue is significant because penalties for confirmed violations of the bloc’s privacy regime can reach up to 4% of global annual turnover, and watchdogs also have the power to order a halt to non-compliant processing. So, in theory, OpenAI faces considerable regulatory risk in the region at a time when dedicated AI laws are thin on the ground (and, even in the EU’s case, still years away from being fully operational).
But without clarity from EU data protection enforcers on how the current rulebook applies to ChatGPT, it’s safe to say OpenAI will feel empowered to carry on with business as usual, despite a growing number of complaints that its technology violates various provisions of the EU’s General Data Protection Regulation (GDPR).
For example, Poland’s data protection authority (DPA) opened an investigation following a complaint that the chatbot had fabricated personal information about an individual and that OpenAI refused to correct the errors. A similar complaint was recently lodged in Austria.
Many GDPR complaints, little enforcement
On paper, the GDPR applies whenever personal data is collected and processed, and large language models (LLMs) such as OpenAI’s GPT, the AI model behind ChatGPT, demonstrably do both at vast scale when they scrape the public internet for training data, including by siphoning off people’s posts from social media platforms.
The EU regulation also empowers DPAs to order a halt to any non-compliant processing. That could be a very powerful lever for forcing change in how the AI giant behind ChatGPT operates in the region, if GDPR enforcers choose to pull it.
Indeed, we got a preview of that last year, when Italy’s privacy watchdog, using emergency powers under the GDPR, temporarily barred OpenAI from processing the data of local ChatGPT users. The intervention led the AI giant to briefly shut down the service in the country.
ChatGPT only resumed in Italy after OpenAI made changes to the information and controls it provides to users, in response to a list of demands from the DPA. But the Italian investigation into the chatbot continues, and it covers crucial issues such as the legal basis OpenAI claims for processing people’s data to train its AI models in the first place. So the tool remains under a legal cloud in the EU.
Under the GDPR, any entity that wants to process data about people must have a valid legal basis for doing so. The regulation sets out six possibilities, though most are not available in OpenAI’s context, and the Italian DPA has already told the company it cannot rely on claiming a contractual necessity to process people’s data to train its AIs. That leaves just two possible legal bases: either consent (i.e., asking people for permission to use their data), or the broad basis of legitimate interests (LI), which requires a balancing test and obliges the controller to allow people to object to the processing.
OpenAI appears to have shifted to claiming an LI basis for the personal data it processes for model training following Italy’s intervention. However, in January, the DPA’s draft decision on its investigation found OpenAI had violated the GDPR. Although no details of the draft findings were published, we have yet to see the authority’s full assessment of the legal basis point, and a final decision on the complaint remains pending.
A precise “fix” to ensure ChatGPT is legal?
The taskforce report tackles this knotty lawfulness issue, pointing out that ChatGPT needs a valid legal basis for every stage of personal data processing: collection of training data, pre-processing of the data (such as filtering), training itself, prompts and ChatGPT outputs, and any training of ChatGPT on prompts.
The taskforce describes the first three of those stages as carrying “peculiar risks” for people’s fundamental rights, with the report highlighting how the scale and automation of web scraping can lead to large volumes of personal data being ingested, covering many aspects of people’s lives. It also notes that scraped data may include the most sensitive types of personal data (referred to as “special category data” under the GDPR), such as health information, sexual orientation, and political opinions, which carry a much higher legal bar for processing than ordinary personal data.
On special category data, the taskforce further asserts that just because such data is public does not mean it can be considered to have been made “manifestly” public, which would trigger an exemption from the GDPR’s requirement for explicit consent to process this type of information. It states: “It is important to ascertain whether the data subject had intended, explicitly and by a clear affirmative action, to make the personal data in question accessible to the general public in order to rely on the exception laid down in Article 9(2)(e) GDPR.”
To rely on LI as its legal basis in general, OpenAI must demonstrate that it needs to process the data; that the processing is limited to what is necessary for that need; and that it has performed a balancing test weighing its legitimate interests in the processing against the rights and freedoms of the data subjects (i.e., the people the data is about).
The taskforce offers yet another recommendation in this regard, stating that “adequate safeguards” could, in its words, “change the balancing test in favor of the controller” by allowing for the collection of less data in the first place and reducing impacts on individuals. These safeguards could include “technical measures,” “precise collection criteria,” and/or blocking out specific data categories or sources (like social media profiles).
That approach could push AI companies to take more care over what data they collect, and how much of it, in order to limit privacy risks.
The taskforce also recommends that “mechanisms should be in place to delete or anonymize personal data that has been collected via web scraping before the training stage.”
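To illustrate where such mechanisms would sit in practice, here is a minimal, hypothetical sketch in Python of the kind of pre-training safeguards the taskforce describes: filtering scraped records against a source blocklist and redacting obvious identifiers before anything reaches the training stage. The domain list, regex patterns, and record format are illustrative assumptions; neither the EDPB report nor OpenAI specifies any tooling, and real-world anonymization is far more involved than this.

```python
import re

# Hypothetical blocklist of sources to exclude entirely (e.g., social media).
BLOCKED_DOMAINS = {"facebook.com", "x.com", "instagram.com"}

# Naive patterns for obvious identifiers; real PII detection is far harder.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b")

def domain_of(url: str) -> str:
    """Extract the host from a URL (simplified; ignores edge cases)."""
    host = url.split("//")[-1].split("/")[0].lower()
    return host.removeprefix("www.")

def scrub(record: dict) -> dict | None:
    """Drop blocklisted sources and redact simple identifiers elsewhere.

    `record` is assumed to look like {"url": ..., "text": ...}.
    Returns None when the record should be excluded before training.
    """
    if domain_of(record["url"]) in BLOCKED_DOMAINS:
        return None  # "precise collection criteria": exclude the whole source
    text = EMAIL_RE.sub("[EMAIL]", record["text"])
    text = PHONE_RE.sub("[PHONE]", text)
    return {**record, "text": text}

if __name__ == "__main__":
    raw = [
        {"url": "https://www.facebook.com/someone", "text": "my holiday post"},
        {"url": "https://example.org/blog", "text": "Mail me at jo@example.org"},
    ]
    cleaned = [r for r in (scrub(rec) for rec in raw) if r is not None]
    print(cleaned)  # Facebook record dropped; email replaced with [EMAIL]
```

(Naive pattern-based redaction like this would not, on its own, meet the GDPR’s bar for anonymization; the sketch is only meant to show where filtering and scrubbing steps slot into a data pipeline.)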
OpenAI is also seeking to rely on LI to process ChatGPT users’ prompt data for model training. On this, the report stresses that users need to be “clearly and demonstrably informed” such content may be used for training purposes, noting this is one of the factors that would be weighed in the LI balancing test.
It will fall to the individual DPAs assessing complaints to decide whether the AI giant has fulfilled the requirements for it to genuinely be able to rely on LI. If it hasn’t, ChatGPT’s maker would be left with only one lawful option in the EU: asking citizens for consent. And given how much personal data is likely contained in its training datasets, it’s unclear how workable that would be. (Deals the company is rapidly cutting with news publishers to license their journalism wouldn’t offer a template for licensing Europeans’ personal data, either, as the law requires consent to be freely given.)
Transparency & fairness are essential
On the GDPR’s fairness principle, the taskforce’s report stresses that privacy risk cannot be shifted onto the user, such as by embedding a clause in terms and conditions stating that “data subjects are responsible for their chat inputs.”
It adds: “OpenAI remains responsible for complying with the GDPR and should not argue that the input of certain personal data was prohibited in the first place.”
On transparency obligations, the taskforce appears to accept that OpenAI could rely on an exemption (Article 14(5)(b) GDPR) from the duty to notify individuals about data collected about them, given the scale of the web scraping involved in acquiring datasets to train LLMs. But its report stresses how important it is that users are informed their inputs may be used for training purposes.
The report also discusses ChatGPT’s tendency to fabricate information, or “hallucinate.” It warns that the GDPR’s “principle of data accuracy must be complied with” and highlights the necessity for OpenAI to give “proper information” regarding the chatbot’s “limited level of reliability” and “probabilistic output.”
Additionally, the taskforce recommends that OpenAI provide users with an “explicit reference” to the fact that generated text “may be biased or made up.”
The report describes the availability of data subject rights, such as the right to rectification of personal data, as “imperative.” This right has been the subject of several GDPR complaints about ChatGPT. It also flags limitations in OpenAI’s current approach, noting that users are only offered the ability to block the generation of false personal information about them, rather than getting the errors corrected.
However, the taskforce offers no specific guidance on how OpenAI can improve the “modalities” it gives users to exercise their data rights. It merely makes the general recommendation that the company implement “appropriate measures designed to implement data protection principles in an effective manner” and “necessary safeguards” to meet the GDPR’s requirements and protect data subjects’ rights. Which sounds a lot like “we don’t know how to fix this either.”
Is the GDPR being enforced on ChatGPT?
The ChatGPT taskforce was set up in April 2023, in the wake of Italy’s headline-grabbing intervention against OpenAI, with the goal of streamlining enforcement of the bloc’s privacy rules on the emerging technology. The taskforce sits within a regulatory body called the European Data Protection Board (EDPB), which steers the application of EU law in this area. But it’s important to note that DPAs remain independent and competent to enforce the law on their own turf, as GDPR enforcement is decentralized.
Despite DPAs’ undiminished power to enforce locally, watchdogs appear uncertain about how best to respond to a nascent technology like ChatGPT.
Notably, when the Italian DPA published its draft decision earlier this year, it made a point of saying its proceeding would “take into account” the work of the EDPB taskforce. And there are other signs watchdogs may be inclined to wait for the working group to deliver a final report, perhaps another year away, before wading in with their own enforcement. So the taskforce’s mere existence may already be influencing GDPR enforcement against OpenAI’s chatbot, by delaying investigations of complaints and decision-making.
Poland’s DPA, for example, suggested in a recent interview with local media that its investigation into OpenAI would need to wait for the taskforce to complete its work.
The EDPB did not respond when we asked whether the ChatGPT taskforce’s parallel workstream is delaying enforcement. A spokesperson did stress, however, that the taskforce’s report “does not prejudge the analysis that will be made by each DPA in their respective, ongoing investigations.” “DPAs are competent to enforce,” they added, “but the EDPB has an important role to play in promoting cooperation amongst DPAs on enforcement.”
As it stands, DPAs appear to hold a spectrum of views on how urgently to act on concerns about ChatGPT. So while Italy’s watchdog made headlines for its swift intervention last year, Ireland’s former data protection commissioner, Helen Dixon, told a Bloomberg conference in 2023 that DPAs shouldn’t rush to ban ChatGPT, arguing instead that they should take time to work out “how to regulate it properly.”
It was likely no coincidence, then, that OpenAI moved to set up an EU operation in Ireland last fall. That move was quietly followed in December by a change to its terms and conditions naming its new Irish entity, OpenAI Ireland Limited, as the regional provider of services such as ChatGPT, setting up a structure that lets the AI giant apply for Ireland’s Data Protection Commission (DPC) to become its lead supervisor for GDPR oversight.
That regulatory-risk-focused legal restructuring appears to have paid off: the EDPB ChatGPT taskforce’s report notes OpenAI was granted main establishment status as of February 15 this year, allowing it to take advantage of the GDPR’s One-Stop Shop (OSS) mechanism. That means any cross-border complaints arising from then on will get funnelled via a lead DPA in the country of main establishment (i.e., Ireland, in OpenAI’s case).
While this may all sound a bit wonky, it basically means the AI company can now dodge the risk of further decentralized GDPR enforcement, as seen in Italy and Poland, because it will be Ireland’s DPC that gets to decide which complaints are investigated, how, and when, going forward.
The Irish watchdog has gained a reputation for taking a business-friendly approach to enforcing the GDPR on Big Tech. In other words, “Big AI” may be next in line to benefit from Dublin’s largesse in interpreting the bloc’s data protection rulebook.
When this story went to press, OpenAI had not yet responded to a request for comment regarding the preliminary report of the EDPB taskforce.