Starting last year, various rightsholders began filing lawsuits against companies that develop AI models.
The complainants include record labels, book authors, visual artists, and newspapers, among them the New York Times. These rightsholders all object to the presumed use of their work to train AI models without proper compensation.
The New York Times lawsuit targets OpenAI and Microsoft and is steadily moving forward. OpenAI recently indicated that it would like to consolidate this case with a similar lawsuit filed by other newspapers, but the Times objects to the proposal.
While these issues are fought out in court, both parties have also moved into discovery. This phase allows each party to request evidence from the other, to properly support or refute the copyright infringement claims that form the basis of the lawsuit.
OpenAI Seeks NYT ‘Source’ Material
In its quest for evidence, OpenAI is particularly interested in the copyrights of the New York Times’ works. This includes copyrighted news articles, which are often based on a variety of information gathered by its journalists.
For example, discovery requests no. 10-12 read as follows:
NO. 10: Documents sufficient to identify the expressive, original, and human-authored content of each of Your Asserted Works.
NO. 11: Documents sufficient to identify the non-expressive, non-original, or non-human-authored content of each of Your Asserted Works.
NO. 12: Documents sufficient to show each and every written work that informed the preparation of each of Your Asserted Works, regardless of its length, format, or medium.
The New York Times is not happy with this approach. The company has refused to share reporters' notes and other information, arguing that the requests are overbroad and too burdensome. In addition, the Times pointed out that much of the information sought by OpenAI is protected by the reporter's privilege.
OpenAI Files Motion to Compel
The refusal has created a dispute between the parties and OpenAI has urged the court to weigh in. Ideally, the AI company wants the court to compel the Times to cooperate.
OpenAI suggests that the ‘source’ information will help to determine what parts of the articles are ‘original’ and worthy of copyright protection, which may help it to counter the copyright infringement claims.
“[T]he Times cannot pursue a claim for infringement over any part of a copyrighted work that is not original to the Times, as would be the case if the Times copied another’s work or elements in the public domain,” OpenAI writes.
In its complaint, the Times described how it invests enormous amounts of time and expertise in its articles, which are sometimes the result of months or years of in-depth investigations. OpenAI would like to know what this claim entails.
“Having chosen to put directly at issue how the Times created the works at issue—including the methods, time, labor, and investment—OpenAI has a right to discovery into the same,” OpenAI writes.
In preparation for its defense, OpenAI further wants to know what portions of the copyrighted articles are “expressive, original, human-authored content”, and what parts are “non-expressive, non-original, or non-human-authored content.”
New York Times Refuses to Comply
Responding to the motion to compel, the Times makes it clear that the company doesn’t intend to give in. It stresses that its articles are copyrightable, whether they include third-party material or not.
“OpenAI claims that the reporters’ notes underlying the asserted works may shed light on whether The Times’s news articles are really original, expressive content—but that is not how copyright law works. The expressive nature of a work is determined by reference to the work itself.
“Moreover, even in the improbable case that a reporter’s notes show that 90% of an article comprises verbatim quotes from the author’s original sources, that article would still be protected by copyright,” the Times adds.
In addition, the newspaper reiterates that the discovery requests are overbroad and invade the reporter's privilege. Although OpenAI stressed that it's not seeking to identify any confidential sources, the Times argues that the request could still have a chilling effect.
If journalistic outfits are required to disclose all source material for every copyrighted article, it may severely impact their ability or willingness to bring copyright lawsuits against potential infringers.
But perhaps that's precisely what OpenAI is trying to achieve here, the newspaper suggests.
“Indeed, given the wildly improper scope of this request, one has to wonder if a chilling effect is exactly what OpenAI, who appears to have stolen from millions of content creators, is hoping for,” the Times writes.
—
A copy of OpenAI’s request to compel The New York Times to share the requested information is available here (pdf). This also includes other disputed requests. The New York Times’ response can be found here (pdf).
* For the purpose of record keeping, the referenced court filings are the sources we relied on for this article. No AI assistance was involved. Human labor was required to select and organize some of the arguments put forward by the parties, while intentionally excluding others to add focus.