AI Is Using Pirated Books To Train Large Language Models
- Dan Lalonde
- Mar 24
- 2 min read

A bombshell revelation has exposed one of the most pressing ethical dilemmas facing the AI industry today: the use of pirated books to train large language models (LLMs) like Meta’s Llama 3 and OpenAI’s ChatGPT. At the center of the storm is Library Genesis (LibGen), a massive underground digital library that hosts over 7.5 million books and 81 million research papers—most without any legal permission.
Court documents from an ongoing lawsuit have revealed that Meta employees, in their race to build a competitive AI, discussed bypassing expensive licensing deals in favor of downloading from LibGen. Internal chats show that executives were aware of the legal risks, with some even recommending masking the pirated nature of the files. Yet, despite these concerns, the dataset was reportedly used after receiving informal approval from the top—possibly from Mark Zuckerberg himself.
OpenAI, also implicated, claims the models currently powering ChatGPT were not trained on these datasets, asserting that earlier versions—built by now-departed employees—were the only ones that may have accessed LibGen.
At the heart of the debate is the argument of “fair use.” Tech companies assert that transforming copyrighted texts into training material constitutes a legal, transformative use. Critics, including authors such as Sarah Silverman and Junot Díaz, who are suing Meta, argue that this practice undermines creative labor and intellectual property law.
The scandal also underscores a wider issue: the academic publishing industry’s paywalls may be driving researchers, and now AI developers, toward pirated sources. While platforms like LibGen make knowledge more accessible, they also exploit the very people who produce it.
As AI becomes more ingrained in daily life, the question looms: can innovation flourish without eroding the rights of creators?
Visit Dan Lalonde Films For All Technology And Entertainment News
Source: The Atlantic
Photo Credit: AI