Many generative AI vendors argue that fair use entitles them to train AI models on copyrighted material scraped from the internet without the rightsholders' permission. But some vendors, such as OpenAI, are playing it safe because of concerns raised by ongoing copyright infringement lawsuits.
In Licensing Talks with Publishers
OpenAI is actively negotiating with numerous publishers to license their articles, aiming to gather diverse content for training its artificial intelligence models. According to Tom Rubin, OpenAI's chief of intellectual property and content, the discussions are progressing well, deals have already been announced, and more are expected.
Partnerships with Axel Springer and The Associated Press
OpenAI has announced a new partnership with Axel Springer, the publisher of Business Insider and Politico. The deal allows OpenAI to train its generative AI models on Axel Springer's content and to integrate that content into ChatGPT, OpenAI's viral AI-powered chatbot. This marks OpenAI's second deal with a news organization, following its earlier agreement to license part of The Associated Press's archive for model training.
In the future, ChatGPT users will receive summaries of "selected" articles from Axel Springer's publications, including articles that are normally behind a paywall. These summaries will include attribution and links to the full articles.
In exchange for the content, OpenAI will make regular payments of an undisclosed amount to Axel Springer. The multi-year agreement is non-exclusive for both parties, and it also supports Axel Springer's own AI-driven initiatives built on OpenAI's technology. Mathias Döpfner, Axel Springer's CEO, expressed enthusiasm about what he called an unprecedented global partnership, saying the goal is to explore AI-empowered journalism that raises the quality, societal relevance, and business model of journalism to a new level.
The relationship between publishers and generative AI vendors is strained: publishers are concerned about potential copyright infringement and fear that generative models will reduce traffic to their websites. Google's new generative AI-powered search experience, SGE, illustrates this tension. By pushing traditional links further down the results page, it could cut traffic to those links by as much as 40%.
New York Times' Ongoing Lawsuit
Last week, The New York Times Co., a company OpenAI had been negotiating with, sued OpenAI and Microsoft Corp., alleging that they used the newspaper's articles without authorization.
The New York Times Co.'s lawsuit poses a significant threat to OpenAI's business. If The Times prevails, OpenAI could face damages running into the billions of dollars, as well as the potentially burdensome task of erasing any training data that incorporates The Times' work. The lawsuit also immediately complicates OpenAI's ongoing negotiations and partnerships with the media industry.
Rubin emphasized that the current situation differs from publishers' past challenges with search engines and social media, because here the content is used to train a model rather than to reproduce or substitute for it.
The Times disagrees, asserting that ChatGPT directly reproduces its journalists' work without compensation. In the lawsuit, the publisher presented examples in which ChatGPT generated entire paragraphs closely resembling New York Times text. Some observers have noted that in a few of these cases ChatGPT had been explicitly prompted to reproduce Times content. The publisher contends that these outputs are evidence that OpenAI used New York Times data.
"If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission," The New York Times said in a statement.