Most of us don’t think about copyright very often in our daily lives. But in the age of generative AI, it has quickly become one of the most important issues in the development and outputs of chatbots and image and video generators. It’s something that affects all of us because we’re all copyright owners and authors.
Sadly, copyright and AI are something of a mess. The race to develop the most advanced AI models shows no sign of slowing anytime soon. In order to create those next-gen models, tech companies are looking for lots of high-quality, human-generated content. They need these works to make their AI models better, whether that’s giving a chatbot a more lifelike personality or an image generator more artistic styles to reference. On the flip side, AI enthusiasts might be wondering if it’s possible to receive copyright protection for AI-enabled creative works.
Most AI companies have been very vague about what content they use, which has led to more than 30 lawsuits winding their way through US courts. You might have heard of some of the most notable, like the New York Times v. OpenAI, in which the publisher alleges that ChatGPT used reporters’ stories verbatim without proper attribution or permission. (Disclosure: Ziff Davis, CNET’s parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.) Meta’s also been in hot water recently, after The Atlantic reported on and published a searchable database of all the copyrighted and potentially pirated books the company allegedly used without permission to train its AI.
I spend a lot of time thinking about copyright and AI in my work reporting on AI creative services. I’ve interviewed intellectual property lawyers, spoken with lots of concerned creators and spent way too much time breaking down legalese from government agencies. I’ve used that experience to make this guide on what you need to know about copyright in the age of AI, which we’ll keep updated as things change.
What is copyright?
Copyright is a set of expressed rights that protect “original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced or otherwise communicated,” according to the Copyright Act of 1976.
In other words, copyright is a legal protection that gives original authors the rights to and control over their original works. Copyright protection can apply to books, art, music, movies, computer programs, blogs, architectural designs, plays, choreography and more. We’re all copyright owners. As the US Copyright Office puts it: “Once you create an original work and fix it, like taking a photograph, writing a poem or blog or recording a new song, you are the author and the owner.”
There are a couple of ways copyright intersects with AI. On the output side, people who use AI services like chatbots and image generators want to know whether their AI-enabled work is eligible for copyright protection. On the development side, there are a lot of concerns about AI companies using copyrighted material illegally. Here’s what we know so far.
Can I copyright an image or text I generated with AI?
As with many legal questions, the answer is: It depends.
Our guidance on this question comes primarily from the US Copyright Office, the federal agency in charge of administering copyrights. The Office has released a series of reports on AI and copyright with its latest guidance. In the second report, the Office maintained its position that images and videos that are entirely generated by AI are not eligible for copyright protection.
However, there are a number of generative AI editing tools now available. These tools aren’t used for wholesale creation, but they use gen AI to do things like add or remove objects, de-age actors or refine audio and video. You can still register and potentially receive copyright protection for AI-edited content, but you have to disclose your AI use. In the public record portal, you can see in the notes how people used AI in the creation of their copyrighted work.
Can copyrighted content be used to train AI?
The basic premise in copyright law is that the rights holder (usually the original creator, though in some cases a person’s employer) can decide how they want their works used. In many cases, owners choose to license their content; this lets people use copyrighted work, for a fee, with proper attribution. So if a copyright owner wants to give an AI company permission to use their content to train AI models, there’s nothing wrong or illegal about that. Many publishers, including the Financial Times and Axel Springer brands, have struck multimillion-dollar deals with AI companies to do just that.
Issues arise when AI companies potentially use copyrighted content without first receiving permission from the copyright holders. And that’s what creators are alleging happened in many lawsuits, including a class action lawsuit led by concept artist Karla Ortiz against Stability AI. There are currently more than 30 active lawsuits between AI companies and creators over copyright concerns.
Decades of copyright law precedent say that such a use, without permission, is not allowed. Some of the creators are alleging that the tech companies infringed on their copyrights. Infringement occurs when a copyrighted work is “reproduced, distributed, performed, publicly displayed, or made into a derivative work” without the permission of the copyright holder, as the Copyright Office defines it.
It will be up to the courts to decide whether the use of copyrighted material in AI development reaches the threshold of infringement. In the meantime, many tech companies are trying to pursue an alternate solution: A fair use exception.
What is fair use, and what does it have to do with AI?
The fair use doctrine is a fundamental part of copyright law, part of the Copyright Act of 1976. Fair use lets people use copyrighted content without the holder’s express permission for specific purposes. In the pre-AI era, fair-use cases included a teacher using a copyrighted book for educational purposes or a reporter referencing copyrighted work in news coverage. Four factors help determine whether someone’s use can qualify as a fair use:
- The purpose of the use: How would the person using copyrighted material be using it? Commercial interests (whether someone can make money off the use) are important here.
- The nature of the copyrighted work: What is the actual format of the disputed work? Is it factual like a newspaper article or highly creative like artwork?
- The amount and substantiality of the use: How much of a copyrighted work does someone want to use? Even if it’s only a little bit, if it’s the “heart of the work,” that might not be eligible for a fair use defense.
- The effect on the market: By using a copyrighted work in a proposed way, is that going to be competing with the original author? And what effect will that have on the greater market?
There are questions about every factor when it comes to fair use and AI, Christian Mammen, an intellectual property lawyer and managing partner at the law firm Womble Bond Dickinson, told me in an interview. There’s also a debate about whether the fair use factors apply to the AI input, output or both. “Does that apply on the input side, where you take the whole work in this training data, or does it apply on the output side, where there may be an unrecognizable, tiny bit of influence by any particular work in the output?” Mammen said.
Tech companies are pushing hard for a fair use exception because it would allow them to use copyrighted content without contacting every rights holder and paying licensing fees. For companies like OpenAI and Google, which have already spent billions of dollars on development, a fair use exception would save considerable time and money. Google said that fair use would allow it to continue innovating quickly; OpenAI took a parallel approach and said that unimpeded AI innovation is a matter of national security.
Giving tech companies carte blanche to run amok with copyrighted content isn’t something creators are excited about. In March, over 400 writers, actors and directors signed an open letter asking the Trump administration not to give OpenAI and Google a fair use exception. They wrote that Google and OpenAI “are arguing for a special government exemption so they can freely exploit America’s creative and knowledge industries, despite their substantial revenues and available funds. There is no reason to weaken or eliminate the copyright protections that have helped America flourish.”
The Copyright Office essentially punted on the issue of fair use, saying in its third report that some uses of copyrighted material in AI development could qualify as fair use, while others wouldn’t meet the necessary criteria. Without federal legislation, it’s likely we’ll have to wait for some or all of these court decisions to set new legal precedent for copyright and fair use in the age of AI.
What does all of this mean for the future?
Copyright owners are in a bit of a holding pattern for now. But beyond the legal and ethical implications, copyright in the age of AI raises important questions about the value of creative work, the cost of innovation and the ways in which we need or ought to have government intervention and protections.
There are two distinct ways to view the US’s intellectual property laws, Mammen said. The first is that these laws were enacted to encourage and reward human flourishing. The other is more economically focused; the things that we’re creating have value, and we want our economy to be able to recognize that value accordingly.
“For most of our history, the humanistic approach and the industrial policy approach have been fairly well aligned,” Mammen said. But generative AI has highlighted the different approaches to copyright and IP.
“Do these laws exist primarily as an issue of industrial economic policy, or do they exist as part of a humanistic approach that values and encourages human flourishing by rewarding human creators?” Mammen asked. “At the highest, most abstract level, I’d say that is one of the questions that’s being forced by these debates.”