Generative Artificial Intelligence (GenAI) can generate and manipulate our ideas and thinking by creating human-like content via non-human intelligence.[1] Such software, including OpenAI’s ChatGPT/GPT-4 and Google’s Bard, inter alia, is initially trained on large data sets using substantial computing power. After training, it is capable of self-enhancement to generate unique and personalised content.[2] This has posed novel questions for copyright experts, as content generation, previously reliant on human inputs, has moved beyond that realm. Now, instead of answers based on user queries (as obtained via Google’s search engine), customised personal content is delivered to the user. The creation of this new content through GenAI has led to concerns about copyright infringement, privacy violation, libel and defamation, etc. Copyright infringement is particularly worrisome as companies are using user-generated data to train such software, including data generated by minors, which amplifies their vulnerability. Questions also arise regarding the extent to which companies can claim the ‘fair-use’ exception of the Copyright Act. This article attempts to bring some clarity to these issues. It discusses two landmark US cases against OpenAI’s ChatGPT and Alphabet Inc., respectively[3], and their implications in India, including in light of India’s recently passed Digital Personal Data Protection Act, 2023.

GenAI Tools and Copyright Violation on an Unprecedented Scale

Data is the new oil of the 21st century. We all generate data in our daily lives, and its misuse by AI developers has often been criticised. Firstly, AI developers have been criticised for unauthorised “web scraping”, i.e. taking users’ data without their consent.[4] It is alleged that OpenAI scraped over 300 billion words from the internet, including “articles, websites, books, posts, including personal information obtained without users’ consent”[5] to train its software. In the USA, the courts have recognised data as ‘property’; therefore, such scraping raises allegations of data misappropriation and theft.[6]

Secondly, even if we upload a picture or any other data to the internet for public access, such as on a blog or social media profile, we still maintain a reasonable expectation that our data will remain safe and secure. In KS Puttaswamy v. Union of India[7], the Supreme Court recognised the right to privacy as a facet of Article 21 of the Constitution. If our data is used for training AI software without our specific consent, such a negligent act results in a breach of trust and is an intrusion upon our privacy.[8]

Thirdly, the data generated by children, and the privacy concerns attached to it, stand on a separate pedestal. Children are at a higher risk of abuse, discrimination, and exploitation. Unfortunately, AI platforms lack sufficient tools to prevent children from accessing them. In the USA, the Children’s Online Privacy Protection Act (COPPA) forbids monitoring children, collecting their data or using their information unless valid parental or guardian consent is obtained. Unfortunately, such provisions are missing in India, where self-reporting is the only check available, and it is not effective.[9]

Fourthly, AI software has also produced false or inaccurate passages when asked to cite a specific author’s work, such as a poem. Such false results spread misinformation to the public, and they also undermine the moral rights of authors by distorting their works.[10] The subsequent section discusses how the courts are addressing these concerns, thereby giving direction to the future course of copyright jurisprudence.

Judicial Intelligence to Address the Conundrum of Generative Artificial Intelligence

“Machine learning is no excuse to break the law…the data used to improve algorithms must be lawfully collected and lawfully retained”[11]

OpenAI claims that the data fed for AI training and execution is covered under the fair-use exception.[12] For instance, Section 52 of India’s Copyright Act, 1957 lays down certain fair-use exceptions, such as “reading or recitation in public of reasonable extracts from a published literary or dramatic work” when extended to the digital context[13], or “the reproduction, for the purpose of research”[14], inter alia. Similarly, US jurisprudence determines fair use based upon the following four factors[15]:

  1. Firstly, the purpose and character of the use, including whether it is commercial or educational. In Campbell v. Acuff-Rose Music, the Court held that the protection strengthens if the work is “transformative”, i.e. if it adds something new, with a further purpose or different character, rather than merely superseding the original creation.[16]
  2. Secondly, the amount and substantiality of the portion used in relation to the copyrighted work as a whole. For instance, OpenAI has claimed that the training material itself is not made public; rather, it is the new content developed from such material that is made available. On this basis, it claims that the program is transformative.
  3. Thirdly, the effect of the use upon the relevant market, or upon the value of the work within that market. For instance, OpenAI claims that since the data set is consumed by machines without being presented to humans, the authors would not lose any potential audience. However, The New York Times, inter alia, is considering legal action against OpenAI as the AI tool has “greatly diminished the need to visit the publisher’s website”.[17]
  4. Fourthly, the determination also involves consideration of the nature of the copyrighted work.[18] Relatedly, in Andy Warhol Foundation v. Goldsmith, the Court held that if the secondary work shares the same or a highly similar purpose as the original, and if the secondary use is commercial, then the justification of fair use becomes especially difficult unless compelling reasons exist.[19]

Once the user’s query has generated an output, the US courts rely upon a two-pronged test to determine whether the result infringes copyright. The burden is upon the plaintiff to prove that, firstly, the software had “access to their works”, indicating actual copying of the underlying work; and secondly, the software created a “substantially” similar output. The second prong is difficult to determine as it involves the consideration of various factors such as “similar concept and feel”, “overall look and feel”, or the inability of an ordinary, reasonable person to differentiate between the two works.[20] This is a subjective determination that must be made for each output produced by the AI software in response to the user’s query.

This suggests that both the AI user (whose query generated the infringing content) and the AI company (which provided the platform) could potentially be liable for copyright infringement. OpenAI argues that its AI systems “generally do not regenerate unaltered data from any particular work in their training corpus”[21]. Yet, notwithstanding this claim, OpenAI may still be held liable under the doctrine of “vicarious liability” if it failed to prevent an infringement.[22]

The next section deals with issues concerning the authorship of AI-generated texts and seeks to clarify the developing Indian jurisprudence.

Indian Jurisprudence on GenAI – Past, Present and the Future

The US Copyright Office recognises only works “created by humans” as eligible for protection.[23] The initial ownership goes to the creator of the work, and non-human authors have been denied copyright protection.[24] Broadly, a similar position prevails in India. Section 2(d) of the Copyright Act, 1957 defines an “author”, in relation to a computer-generated work, as the “person” who causes the work to be created. Section 13 requires that such work must be “original”; however, the word “original” is not defined in the statute. The Indian Copyright Office has been uncertain about extending protection to AI-generated content. It has earlier issued a withdrawal notice in a case where AI was named as a co-author alongside a natural person.[25]

The 161st Parliamentary Standing Committee Report found that the Copyright Act, 1957 is “not well equipped to facilitate authorship and ownership by Artificial Intelligence”.[26] For instance, Section 16 of the Act specifically provides that “no person” shall be entitled to copyright protection except in accordance with the provisions of the Act. The Report nevertheless suggested that ‘patent protection’ should be extended to AI-generated works to “incentivize innovation and R&D”.[27] We suggest that this reasoning should be further extended to copyright protection for AI-generated content, to encourage creativity and to enhance AI-based expression.

Furthermore, Section 12 of India’s recent Digital Personal Data Protection Act, 2023, dated 11th August 2023, lays down the “right to be forgotten”. It mandates the removal of a user’s personal data upon the user’s request. This could act as an easy mechanism for the removal of copyright-infringing material. However, practical concerns arise because once AI software is trained on a data set, it cannot simply “unlearn” that data.[28] For instance, ChatGPT’s opt-out function for data collection only turns off the chat history. This means that old data continues to be a part of the training, and only new content is excluded from “training purposes”. It does not provide any method to remove previously fed data.[29] This is especially concerning as AI tools are now being integrated into several platforms, such as web browsers, enabling real-time information upon each click.

The Indian courts have been proactive in restraining the misuse of AI tools for copyright infringement. In Anil Kapoor v. Simply Life India[30], the Court issued an injunction against the use of Artificial Intelligence to create fake, morphed content, especially for commercial purposes. The order aimed to protect the personality rights of the individual. In Mareta v. Google Inc[31], the US District Court held that remedial measures, such as those to prevent copyright infringement and protect privacy, must be construed broadly to include “new technologies”. This underlying idea of rapid evolution has been accepted by the Indian courts as well. However, as the 161st Parliamentary Report suggested, “there is a need to review the provisions of [the Copyright Act] on a priority basis”.[32]

Conclusion and Recommendation

AI-generated content is a reality, and it is penetrating deeper into our lives with each passing moment. This creates an urgent need for a comprehensive discussion of the elephant in the room. We reiterate the suggestion of the 161st Parliamentary Report to “create a separate category of rights for AI and AI related works”.[33] Furthermore, discussion must be initiated on the feasibility of the following suggestions. Firstly, to establish an independent AI Council for the approval or rejection of AI platforms. Such a body would act as a “Certification Board” for AI platforms, evaluating them against pre-defined standards. Secondly, to create and implement accountability mechanisms. AI platforms such as GPT-4 are monetised, and any copyright infringement must attract monetary compensation for the violation of rights, while adhering to our ethical principles. Thirdly, to put in place effective transparency and cybersecurity protocols involving the disclosure of data and ensuring the right to erasure, in light of Section 12 of the Digital Personal Data Protection Act, 2023. Fourthly, to set up an AI Fund for the immediate redressal of any future infringement, and to introduce penal provisions to prevent the misuse of children’s data and to protect the right to privacy guaranteed under Article 21.[34]

