
Theranos Investigator John Carreyrou Sues Major AI Companies Over Book Copyright

John Carreyrou, the investigative journalist who exposed the Theranos scandal, is taking on some of the world’s most powerful technology companies. On Monday, Carreyrou and five other authors filed a lawsuit in California federal court accusing Elon Musk’s xAI, Anthropic, Google, OpenAI, Meta Platforms, and Perplexity of pirating their copyrighted books to train artificial intelligence chatbots without authorization.

The legal action marks a significant escalation in the ongoing battle between content creators and AI developers over how copyrighted material can be used. Unlike previous similar cases, this lawsuit deliberately avoids the class action format, arguing that bundling thousands of claims together allows tech companies to settle disputes at unfairly low rates.

The Core Allegation: Unauthorized Book Copying

According to the complaint, the AI companies fed copyrighted books into the large language models that power their chatbot systems without obtaining permission from authors or publishers. Carreyrou, best known for his book “Bad Blood” chronicling the rise and fall of Theranos, joins five unnamed co-plaintiffs in making these claims.

The lawsuit alleges that these companies systematically copied protected literary works to build training datasets for their AI systems. Large language models require enormous amounts of text to learn language patterns, and books represent high-quality, well-edited content that’s particularly valuable for training purposes. However, using copyrighted material without authorization or compensation raises serious legal questions under U.S. copyright law.

A Perplexity spokesperson told Reuters that the company “doesn’t index books,” though the statement didn’t directly address whether books were used in training. Representatives from the other named defendants, including Google, OpenAI, Meta, Anthropic, and xAI, did not immediately respond to requests for comment.

Why This Lawsuit Takes a Different Approach

What sets this case apart from other copyright disputes targeting AI companies is its rejection of the class action mechanism. In a class action, many plaintiffs with similar claims band together under common representation, ultimately negotiating a single settlement that’s divided among all participants.

The authors argue this approach systematically disadvantages copyright holders. Their complaint states that large language model companies shouldn’t be permitted to “extinguish thousands upon thousands of high-value claims at bargain-basement rates” through class action settlements.

This criticism isn’t merely theoretical. The lawsuit points to Anthropic’s recent settlement as evidence of the problem. In August, Anthropic reached what was described as the first major settlement in an AI training copyright dispute, agreeing to pay $1.5 billion to a class of authors who claimed the company pirated millions of books.

That figure sounds substantial, but the new lawsuit highlights the mathematics behind class action distributions. According to the complaint, individual class members in the Anthropic case will receive just 2% of the Copyright Act’s statutory ceiling of $150,000 per infringed work. For authors whose books were used without permission, that translates to $3,000 per title rather than the potential $150,000 they might recover in individual litigation.
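The gap the complaint describes is simple arithmetic. A quick sketch of the comparison, using only the figures reported above (the variable names are mine, for illustration):

```python
# Figures from the complaint as reported in the article.
STATUTORY_CEILING = 150_000  # Copyright Act statutory maximum per infringed work, in USD
CLASS_SHARE_RATE = 0.02      # roughly 2% of that ceiling per class member, per the complaint

# Per-title recovery under the Anthropic class settlement, as the complaint frames it
per_title_class_payout = STATUTORY_CEILING * CLASS_SHARE_RATE

print(per_title_class_payout)                      # 3000.0
print(STATUTORY_CEILING - per_title_class_payout)  # 147000.0, the per-title gap vs. the ceiling
```

The $150,000 figure is a statutory ceiling, not a guaranteed award; individual plaintiffs would still need to prove willful infringement to approach it, which is the trade-off the opposing camps of authors are weighing.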

The xAI Factor

This case represents the first time Elon Musk’s xAI has been named as a defendant in a copyright lawsuit related to AI training. The company, which launched its Grok chatbot to compete with ChatGPT and other AI assistants, has maintained a lower profile than some competitors in copyright disputes until now.

xAI’s inclusion alongside more established AI developers suggests authors and their legal representatives view the company as engaging in similar training practices. As newer entrants to the AI space build competing systems, they face the same fundamental challenge: acquiring enough high-quality text data to train effective language models.

Legal Representation and Past Connections

The lawsuit was filed by attorneys at Freedman Normand Friedland, including Kyle Roche. Interestingly, Carreyrou himself wrote a 2023 New York Times profile of Roche, creating an unusual connection between journalist and attorney in this case.

That relationship came under scrutiny during a November hearing in the Anthropic class action. U.S. District Judge William Alsup criticized a separate law firm Roche co-founded for encouraging authors to opt out of the settlement in pursuit of better individual deals. Roche declined to comment when asked about the judge’s criticism.

This dynamic illustrates the strategic considerations authors face when deciding how to pursue copyright claims against AI companies. Some believe class actions provide strength in numbers and guaranteed compensation, even if amounts are modest. Others, like the plaintiffs in this new lawsuit, prefer to maintain individual claims with the potential for larger recoveries.

The Broader Copyright Battle

This lawsuit is part of a growing wave of copyright litigation targeting AI companies. Authors, visual artists, news organizations, and other content creators have filed numerous cases arguing that AI systems were trained on copyrighted material without authorization or compensation.

Tech companies generally argue that using copyrighted material for AI training constitutes “fair use” under copyright law, a doctrine that permits limited use of protected works for purposes like commentary, criticism, and transformative uses. They contend that training AI systems transforms copyrighted text into something fundamentally different, making it permissible without a license.

Copyright holders reject this interpretation. They argue that AI companies are essentially building multibillion-dollar businesses on the backs of creators’ work without providing any compensation. The legal question of whether AI training qualifies as fair use remains largely unresolved, with few definitive court rulings on the issue.

What Happens Next

The case will proceed through federal court in California, where judges have been grappling with similar AI copyright questions in multiple pending lawsuits. The defendants will likely file motions to dismiss, arguing either that their use of copyrighted books constitutes fair use or that the plaintiffs cannot prove their specific works were used in training.

Discovery, if the case reaches that stage, could prove particularly revealing. Authors would have the opportunity to demand internal documents showing how AI companies acquired training data and which specific books appeared in their datasets. Such information could strengthen claims while also providing public insight into practices that companies have largely kept confidential.

For the AI industry, the outcome matters enormously. If courts consistently rule against fair use arguments in training contexts, companies may need to negotiate licensing agreements with publishers and authors, potentially adding significant costs to AI development. Alternatively, they might need to rely on public domain works or content created specifically for AI training, which could limit model capabilities.

The lawsuit underscores a fundamental tension in the AI era: how to balance technological innovation with respect for intellectual property rights. As Carreyrou and his fellow authors see it, AI companies shouldn’t get to build their businesses by taking others’ work without permission, regardless of how transformative the resulting technology might be.
