AI chatbots are illegally ripping off copyrighted news, says media group

Published by Victoria Kyle on November 1, 2023

AI developers are taking revenue, data and users away from news publications by building competing products, the News Media Alliance claims.

The News Media Alliance (NMA), a news industry group, has claimed that artificial intelligence (AI) developers often rely on illegally scraping copyrighted material from news publications and journalists to train their models. In a 77-page white paper and accompanying submission to the United States Copyright Office, the NMA alleged that the datasets used to train AI models include a significant amount of content from news publishers.

The NMA argued that generative AI models “copy and use publisher content in their outputs,” which infringes upon publishers’ copyright and puts news outlets in competition with AI models. The NMA’s statement noted that while news publishers invest resources and take on risks, it is the AI developers who reap the rewards in terms of users, data, brand creation, and advertising dollars. According to the group, this also leads to reduced revenues, fewer employment opportunities, and strained relationships between publishers and their audiences.

To address these issues, the NMA has recommended that the Copyright Office declare that using a publication’s content to monetize AI systems is harmful to publishers. The group has also called for the adoption of various licensing models and transparency measures to restrict the use of copyrighted materials. Additionally, the NMA has suggested that the Copyright Office take measures to prevent protected content from being scraped from third-party websites.

While the NMA acknowledged the benefits of generative AI and noted that publications and journalists can use AI for tasks such as proofreading, idea generation, and search engine optimization, it emphasized the need for fair compensation and respect for copyright.

The use of AI chatbots, such as OpenAI’s ChatGPT, Google’s Bard, and Anthropic’s Claude, has increased over the last 12 months. However, the methods used to train these AI models have drawn criticism, and all three now face copyright infringement claims in court. In July, comedian Sarah Silverman sued OpenAI and Meta, alleging that they used her copyrighted work to train their AI systems without permission.

OpenAI and Google have also faced separate class-action lawsuits over claims that they scraped private user information from the internet. Google has stated that it will assume legal responsibility if its customers are alleged to have infringed upon copyright by using its generative AI products on Google Cloud and Workspace. However, this promise of legal protection does not extend to Google’s Bard search tool. At the time of writing, OpenAI and Google had not responded to requests for comment.