Newspapers nationwide sue OpenAI for illegally scraping content to train AI


Eight newspapers across the United States have filed a lawsuit against OpenAI and Microsoft, alleging that the companies used the papers’ copyrighted content without payment to train their highly profitable generative artificial intelligence systems. In their suit, the papers seek an end to the copyright infringement, along with damages and lost profits.

“The Publishers have spent billions of dollars sending real people to real places to report on real events in the real world and distribute that reporting in their print newspapers and on their digital platforms,” the newspapers said in their complaint. “Yet Defendants are taking the Publishers’ work with impunity and are using the Publishers’ journalism to create GenAI products that undermine the Publishers’ core business by retransmitting ‘their content,’ in some cases verbatim from the Publishers’ paywalled websites, to their readers.”

“As if plagiarizing the Publishers’ [work] were not enough, Defendants’ products are often subject to ‘hallucinations’ where those products malign the Publishers’ credibility by falsely attributing inaccurate reporting to the Publishers’ newspapers,” the complaint continued. It was filed by eight papers, including the New York Daily News and the Chicago Tribune, all owned by Alden Global Capital.

Among the hallucinations cited in the complaint, GPT allegedly stated, falsely, “that the Denver Post published research and medical observations that smoking can be a cure for asthma.”

OpenAI, in a submission to the United States Patent and Trademark Office, has argued that training large language models on copyrighted material constitutes “fair use” under current law.

In December, the New York Times filed a similar suit against OpenAI and Microsoft, also citing hallucinations, lost revenue, and the allegedly illegal use of its content for AI training.

These lawsuits come as news organizations around the world face declining readership, circulation, and revenue. While digital subscriptions are growing for some outlets, overall revenue continues to fall because losses in advertising revenue outpace gains in subscription revenue.

Some news organizations, such as Axel Springer, have instead opted to enter agreements with AI firms for the use of their content.

California lawmakers are advancing a proposal that would require social media and internet search companies to pay news organizations a portion of the advertising revenue from ad impressions served when those organizations’ content appears in search results or social media feeds. In response to the legislation, Google stopped showing content from some California news organizations to some California users.

Australia adopted a similar law in 2021, but Meta announced earlier this year that it would not renew any of its agreements to pay news publishers there, signaling that it would rather risk fines for noncompliance.

Some newsrooms have already adopted generative AI to increase output, and Google’s Genesis AI, a product under development that takes in facts and generates news articles, is aimed at journalists.

The Center Square attempted to replicate the exact examples from the lawsuit but was unable to do so, even after substituting news outlets not involved in any ongoing litigation, suggesting that ChatGPT has since been updated to avoid producing similar responses.