Journalist Files Lawsuit Against AI Firms Over Use of Copyrighted Books for Chatbot Training

A group of US reporters, including New York Times journalist John Carreyrou, has filed a lawsuit against several prominent artificial intelligence companies, including OpenAI, Google, Elon Musk’s xAI, Anthropic, Meta Platforms, and Perplexity. The suit alleges that these companies used copyrighted books without permission to train their AI systems.

Carreyrou and five other authors filed the lawsuit in federal court in California, accusing the AI firms of pirating their protected works to develop the large language models that power AI chatbots. According to the complaint, the companies neither obtained licenses nor compensated the authors.

The filing states that this case involves a clear act of theft constituting copyright infringement. This lawsuit is part of a broader trend of copyright cases against tech firms, though it is notably the first to include xAI as a defendant.

The plaintiffs claim that the AI companies obtained pirated copies of books from shadow libraries such as LibGen, Z-Library, and OceanofPDF and integrated those copies into their systems to speed up development. The lawsuit asserts that this infringement has affected hundreds of authors, including bestselling writers and Pulitzer Prize winners.

Unlike plaintiffs in other cases, the authors are not pursuing a class action, which would allow the defendants to negotiate a single settlement. Instead, they want each claim evaluated individually by a jury. The complaint argues that existing class-action settlements do not adequately account for the scale of the alleged infringement.

In August, Anthropic settled the first major AI-training copyright case for $1.5 billion, resolving claims that the company had pirated millions of books. In that settlement, class members were slated to receive only 2% of the potential damages under the Copyright Act.

AI firms contend that using copyrighted material in this manner qualifies as fair use, as their systems generate new and transformative outputs rather than directly reproducing original works. A previous ruling found that while Anthropic’s use of copyrighted books for AI training could be considered fair use, the company did violate copyright law by storing millions of pirated books in a central database, regardless of their training use.
