
Is it 'fair use' or 'copyright infringement'? Encyclopedia Britannica's lawsuit against OpenAI may reshape industry rules

2026-03-23   

Encyclopedia Britannica and its subsidiary Merriam-Webster recently sued OpenAI in federal court in Manhattan, accusing the artificial intelligence (AI) giant of misusing their reference materials to train AI models. The core controversy in the lawsuit is whether OpenAI's unauthorized use of nearly 100,000 Encyclopedia Britannica articles to train AI is "fair use" that should be permitted or "copyright infringement" that must be prohibited. Industry insiders point out that the case is far more than an ordinary AI copyright dispute. Spanning the "input end" of training data and the "output end" of generated content, and extending from traditional copyright infringement to emerging trademark and attribution disputes, the suit amounts to a counterattack by Encyclopedia Britannica in defense of the "source order" of the AI era.

The first point of contention over the boundary of "fair use" is whether acquiring data during the model training stage (the data input) constitutes copyright infringement. According to Reuters, in the lawsuit filed on March 13, Encyclopedia Britannica claimed that OpenAI used nearly 100,000 of its online articles, encyclopedia entries, and dictionary entries to teach the chatbot ChatGPT how to respond to user inquiries, and that this copying was "systematic and scalable." The complaint describes ChatGPT as a free rider on Britannica's reliable, high-quality content, transferring the value of that content to OpenAI without paying any compensation. On March 16, an OpenAI spokesperson responded to the lawsuit: "Our AI models are designed to drive innovation; they are trained on publicly available data and comply with the principle of 'fair use'."

This is the AI industry's standard defense in copyright litigation: companies argue that converting copyrighted content into training data constitutes "transformative use" and should not be subject to copyright restrictions. What sets this case apart, however, is the nature of the Britannica content that was used. Unlike ordinary web pages or news reports, encyclopedia entries and dictionary definitions go through a rigorous process of compilation, review, and updating; they are highly original and authoritative, and are themselves copyrighted products with stable commercial value. When an AI model absorbs such a "high-quality, structured body of knowledge" rather than scattered Internet information, the boundaries of "transformative use" will have to be reexamined.

Notably, Encyclopedia Britannica did not simply wait passively for infringement to occur. According to the complaint, the company proactively contacted OpenAI in November 2024 to explore a possible licensing arrangement, but OpenAI "never seriously considered licensing," even though it had reached licensing agreements with other comparable publishers. Indeed, some recent academic research has shown that in the era of generative AI, data is no longer limited to static content but permeates every stage of the AI lifecycle, from the training samples that shape model parameters to the prompts and outputs that drive real-world deployment. This means that traditional compliance controls on the "input side" may no longer cover the full process by which data continues to play a role inside the model. Britannica's lawsuit goes straight to the heart of this tension: even if copying during the training phase is accepted as "transformative," how can rights holders retain control when their content is continuously reused through model outputs?
If the controversy over the training phase can still be debated within the framework of "transformative use," Britannica's allegations about the "output phase" push OpenAI squarely into the traditional forbidden zone of copyright infringement: reproduction. In its March 13 complaint, Britannica attached detailed side-by-side evidence, accusing ChatGPT of generating content that is "verbatim identical or highly similar" to the original works in response to user requests. The complaint states plainly: "ChatGPT copied the expression, meaning, and information of the plaintiff's copyrighted content and repackaged it for consumers. ChatGPT did not add any new expressions, meanings, or information." OpenAI also used AI to generate summaries of the relevant content, "eating away" at Britannica's web traffic.

This is precisely the most contested and central question in current AI copyright cases: is the model "learning in the abstract," or is it "memorizing" the original text under certain conditions? When a user enters "Please provide me with an article on education from the Encyclopedia Britannica," ChatGPT's output is almost identical to the original. In such a case, the AI is no longer passively "learning" knowledge but actively "supplying" copyrighted original text. From a data protection perspective, this phenomenon reveals a deep dilemma: once data is included in model training, its form of existence changes fundamentally, from independent, identifiable works into distributed parameters and weights that are difficult to trace. Britannica notes that while it can confirm OpenAI used nearly 100,000 of its articles, "the true scope of the copying is known only to OpenAI itself." This information asymmetry puts rights holders at an inherent disadvantage when asserting their rights.
Media analysis suggests that the most distinctive feature of Britannica's lawsuit against OpenAI lies not in copyright but in trademark and source attribution. The complaint alleges that OpenAI not only implied it was authorized to copy Britannica's content, but also improperly cited Britannica in AI-generated "hallucinations," attributing factual errors to an authoritative knowledge institution with more than 250 years of history. This raises a deeper issue that goes beyond copyright law: in the AI era, how should "source credibility" and "brand attribution" be protected? For knowledge brands such as encyclopedias and dictionaries, content matters, but what is truly scarce is a trusted source identity that society has recognized over a long period. If AI generates incorrect content and puts Britannica's name on it, the damage is not limited to the click-through rate of an individual entry; it extends to the knowledge authority the brand represents.

European and American media have noted that this lawsuit is one of many actions brought by copyright holders against technology companies for training AI systems without permission. Last year, Encyclopedia Britannica filed a similar copyright lawsuit against the AI startup Perplexity AI, and that case is still pending. Industry insiders describe this case, following the Perplexity AI suit, as a key counterattack by traditional knowledge institutions in defense of the "source order" of the AI era. Although OpenAI insists on its "fair use" defense, the case is likely to be consolidated into the multidistrict litigation (MDL) in the U.S. District Court for the Southern District of New York and heard together with cases such as The New York Times' suit. The final ruling may reshape the rules of the game for the entire AI industry.
According to reports, Encyclopedia Britannica has asked the court to enjoin OpenAI's infringing conduct and to award damages in an unspecified amount. Whatever the eventual verdict, a basic consensus is already taking shape: AI development cannot come at the cost of dissolving the "source order," and data protection must also adapt to the technical characteristics of the AI era. (New Society)

Editor: Yiyi   Responsible editor: Jiajia

Source: legaldaily
