Two Court Rulings on Generative AI and Fair Use: Which One Prevails?


Well now, isn’t this just the perfect storm brewing in the digital realm? On one side, you’ve got the incredibly powerful, mind-bending capabilities of `Generative AI`, spitting out images, code, and text that look eerily human-made. On the other, the decidedly less sprightly, often-beleaguered world of `Copyright Law`, struggling to keep pace with technology moving at warp speed. It feels like trying to regulate bullet trains with horse-and-cart rules, doesn’t it? And right now, the tracks are getting particularly bumpy over the fundamental question: what happens when AI learns by hoovering up vast swathes of the internet’s creative output? Specifically, the sticky wicket of `AI Training Copyright`.

For months, maybe even years now, the whispers have grown into shouts. Artists, writers, programmers – creators of all stripes – are looking at these AI models and asking, quite reasonably, “Hang on, did you just train your multi-billion-pound model on my life’s work without asking or paying?” And the AI companies, naturally, are leaning heavily on a legal defence as old as photography and photocopiers: `Fair Use AI`. It’s a bit of a legal magic trick, really – a carve-out that says you can use copyrighted material without permission for things like criticism, commentary, news reporting, teaching, scholarship, or research. The big argument? Training an AI is just like teaching a very, very fast student by letting it read the entire library.

This isn’t just academic chin-stroking anymore. It’s hitting the courts. And recently, two different courts in the United States weighed in on critical `Copyright Cases Generative AI`, grappling with this exact tension. The outcomes, while not final verdicts on the core infringement question, reveal some fascinating – and frankly, slightly worrying in one instance – judicial thinking on `AI and Copyright Law` in this new era. These cases, Andersen v. Stability AI and Doe v. GitHub, are absolute bellwethers for `AI Legal Challenges` and the future of `AI Development Legal Issues`.

Let’s unpack them, shall we? Think of them as two different judges looking at roughly the same elephant of `AI Copyright Infringement` but feeling different parts of it. One felt the trunk, the other the tail, leading to slightly divergent conclusions on how the law applies right now.

Andersen v. Stability AI: The Artists vs. The Image Makers

First up, we have Andersen v. Stability AI. This is the case brought by a group of artists against the companies behind popular image-generating AI models like Stable Diffusion (Stability AI), Midjourney, and DeviantArt (which hosts an AI tool). Their core complaint boiled down to a few key points:

  • The Training Data: The AI models were trained on colossal datasets scraped from the web, including millions of copyrighted images, allegedly without permission. This, they argue, is direct `AI Training Copyright` infringement.
  • The Output: The AI models can generate images “in the style of” specific artists, which the plaintiffs claimed constituted infringing derivative works or reproductions. This touches on `AI Output Copyright`.
  • DMCA Violations: Claims related to the Digital Millennium Copyright Act, often concerning the removal of copyright management information.

Now, how did the court react? It was a bit of a mixed bag for the artists at this preliminary stage of the motion to dismiss. The judge looked at the claims about the *output* and said, effectively, “Nice try, but you need to be much, much more specific.” Suing because the AI *might* generate something infringing is too vague. Fair enough, you need actual examples of alleged `AI Copyright Infringement` in the output.

The DMCA claims also largely went away for procedural reasons at this stage. But here’s the bit that raised some eyebrows, particularly from groups like the EFF: the court *didn’t* dismiss the direct copyright infringement claim related solely to the *training data copying* itself. The artists argued that the very act of copying their images into the training dataset was infringement, and the judge allowed this claim to proceed, at least past the initial motion to dismiss. Why is this notable? Because merely copying data into temporary storage or a database for analysis, without distributing the copies or the exact originals, has often been considered `Fair Use AI` in other contexts, such as search engines creating indexes or researchers building text corpora. The Andersen court’s hesitation to apply that principle robustly at this early stage felt, to some observers, a bit out of step with how courts have handled similar data-analysis activities before. It left the door open to the idea that merely having a copyrighted image in your training set *could* be infringement, separate from anything the model outputs.

Doe v. GitHub: The Coders vs. The Code Copilot

Now, let’s pivot to the other case: Doe v. GitHub. This lawsuit was filed by anonymous programmers against Microsoft, GitHub, and OpenAI over GitHub Copilot, an AI pair programmer trained on public code repositories hosted on GitHub. The complaints here were structurally quite similar to Andersen:

  • The Training Data: Copilot was trained on billions of lines of public code from GitHub, much of which is under specific open-source licenses that require attribution. Training without respecting these licenses, the plaintiffs argued, was a form of `AI Training Copyright` violation or breach of licence terms.
  • The Output: Copilot sometimes suggests code snippets that are very similar, or even identical, to code in its training data, often without providing the required attribution or complying with the original licenses. This is the `AI Output Copyright` problem.
  • Privacy and Contract Claims: Other claims included privacy violations and breaches of GitHub’s terms of service.

So, what did this court do? This is where the contrast becomes stark, particularly regarding the training data. The Doe court took a much more favourable view of the `Fair Use AI` argument when it came to training the model itself. The judge looked at the process – reading and processing vast amounts of public code to learn how to generate *new* code – and saw it as highly “transformative.” The AI wasn’t just storing copies to redistribute them; it was learning patterns, syntax, and logic. This, the court implied, falls squarely within the spirit of fair use, much like a human programmer learning by studying millions of lines of open-source code.

Consequently, the Doe court largely dismissed the claims based *solely* on the act of training the model on the public code. This ruling aligns more closely with the traditional understanding of fair use allowing technical processes like indexing or analysis, even on copyrighted material, as long as the process is transformative and doesn’t substitute for the original work. Where the court *did* allow claims to proceed was on the `AI Output Copyright` side – specifically, when Copilot suggests code snippets that are substantially similar to training data without providing required attribution or adhering to the original license terms. This makes intuitive sense, doesn’t it? Learning from code is one thing; spitting out someone else’s code verbatim without credit is quite another.
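The Doe court’s line between learning and verbatim reproduction can be made concrete. The sketch below is purely illustrative and hypothetical (it does not reflect how Copilot or any real system actually works): it flags a generated snippet when a long run of consecutive tokens also appears verbatim in a known licensed source, which is roughly the situation where attribution obligations come into play. The token-run heuristic and the threshold are assumptions made for illustration only.

```python
# Illustrative sketch only: flag generated snippets that reproduce licensed
# source code near-verbatim. Hypothetical example; real systems and the
# legal test for substantial similarity are far more nuanced.

def normalise(code: str) -> str:
    """Collapse whitespace and case so trivial edits don't hide a copy."""
    return " ".join(code.split()).lower()

def longest_shared_run(generated: str, source: str, min_tokens: int = 10) -> int:
    """Length of the longest run of consecutive tokens shared with the source."""
    gen_tokens = normalise(generated).split()
    src_text = normalise(source)
    best = 0
    for i in range(len(gen_tokens)):
        # Start from the shortest run worth reporting; grow until the match breaks.
        for j in range(i + max(best, min_tokens), len(gen_tokens) + 1):
            if " ".join(gen_tokens[i:j]) in src_text:
                best = j - i
            else:
                break
    return best

def needs_attribution(generated: str, corpus: dict[str, str],
                      threshold: int = 10) -> list[str]:
    """Licences (hypothetical corpus labels) whose code the snippet overlaps heavily."""
    return [licence for licence, src in corpus.items()
            if longest_shared_run(generated, src, min_tokens=threshold) >= threshold]
```

On this sketch, a short generic idiom stays below the threshold, while a long verbatim block lifted from a known repository would be flagged for licence review. That mirrors the court’s intuition: learning patterns is one thing; reproducing a substantial, attributable chunk is another.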

Let’s talk a bit more about `Fair Use AI`. It’s determined by looking at four factors, as outlined in Section 107 of the U.S. Copyright Act:

  1. Purpose and Character of Use: Is it commercial or non-profit? Is it transformative (i.e., does it add new meaning, expression, or purpose)? Training an AI is often argued to be highly transformative because the model learns concepts and patterns, not just memorises and reproduces the input. The output is usually different from any single input source.
  2. Nature of the Copyrighted Work: Is it factual (more likely fair use) or creative (less likely fair use)? AI is trained on both, complicating things.
  3. Amount and Substantiality of the Portion Used: How much of the original work was used? While AI training involves copying entire works, the argument is that only “portions” (features, patterns) are extracted in a meaningful way for learning, not the expressive whole.
  4. Effect of the Use Upon the Potential Market: Does the new use harm the market for the original work? This is a huge debate in the AI context. Does AI-generated art reduce the market for human artists? Does AI-generated code replace programmers? Or does it create new markets and tools?

The courts weigh these factors. The Doe court seemed to place significant weight on the ‘transformative’ nature of AI training (factor 1) and the idea that the training process itself, while involving copying, wasn’t meant to substitute for the original works (factor 4, regarding the *training data copy* itself). The Andersen court, by letting the training data claim proceed, perhaps showed less conviction on the transformative argument *at that initial stage*, or felt the artists deserved a chance to argue it further.

Beyond the courts, the U.S. Copyright Office (USCO) has also been actively studying and issuing guidance on AI and copyright. On May 9, 2025, the USCO released the prepublication version of Part 3 of its Copyright and Artificial Intelligence series, specifically addressing the complex issue of using copyrighted materials to train generative AI systems. This 108-page report delves deeply into how existing copyright law, particularly fair use, applies to the ingestion of data for training. The report, while nonbinding, provides valuable insight into the Office’s current thinking and helps inform policy discussions.

The USCO report examines the application of the four fair use factors to the AI training context, echoing many of the points raised in the court cases. It acknowledges the arguments for training being transformative but also raises concerns about potential market harm and the nature of the works being used. The report also calls for further discussion on potential legislative solutions, including possible scalable mechanisms for licensing copyrighted works for AI training data.

Furthermore, the U.S. Copyright Office has consistently maintained that human authorship is a prerequisite for copyright protection, a position supported by recent court decisions regarding AI-generated content that lacks human creative input. This stance influences not only the input side (training data) but also the output side (copyrightability of AI-generated works).

In a related development, the U.S. government has also adopted measures like the interim final rule imposing export controls on artificial intelligence model weights for certain advanced closed models (90 Fed. Reg. 4544, January 15, 2025), highlighting the growing intersection of AI technology, intellectual property, and national policy.

What Does This All Mean for the Future of AI?

These divergent outcomes in the preliminary court rulings, coupled with the nuanced analysis from the U.S. Copyright Office report, highlight the uncertainty swirling around `Copyright Law AI Training`. The Doe v. GitHub ruling, which looked favourably on the fair use defence for the training process itself, feels more aligned with the historical application of fair use to technologies that facilitate analysis and learning (like search engines or data mining, as seen in cases like Authors Guild v. Google). It suggests that copying data purely to train a model that generates novel output *might* largely be permissible under `Fair Use AI`.

However, the fact that the training data claim survived the initial challenge in Andersen v. Stability AI is a reminder that this is far from settled. If courts were to ultimately rule that merely including a copyrighted work in a training dataset is infringement, regardless of the output, it could have seismic implications for `Generative AI Copyright` and `AI Development Legal Issues`. Imagine needing to license every single image, poem, or line of code used for training! It would be an administrative nightmare and could stifle innovation significantly.

These `AI Legal Challenges` are forcing a much-needed conversation about how `Copyright Law` should adapt. It feels inherently unfair to creators if their work is used, without permission or compensation, to build tools that might directly compete with them or dilute the value of their skills. Yet, a maximalist approach to `AI Copyright Infringement` on the training data side could cripple a technology with immense potential benefits.

Perhaps the path forward lies closer to the Doe ruling’s emphasis and the direction suggested by the USCO report: allow the ‘reading’ or ‘learning’ phase (training) as fair use, but maintain strict scrutiny and liability for the ‘writing’ or ‘output’ phase, especially when the `AI Output Copyright` is substantially similar to copyrighted input or violates license terms (like attribution). This would protect against direct copying while allowing the technology to learn and evolve.

It’s clear these won’t be the last `Copyright Cases Generative AI` we see. As AI gets more sophisticated and its use more widespread, these fundamental questions about where the data comes from and who owns the results will only become more pressing. The legal system, bless its heart, is doing its best to catch up, one case and one comprehensive report at a time.

What do you make of these rulings and the Copyright Office’s guidance? Do you think AI training should be considered fair use? How should `Copyright Law AI Training` balance the rights of creators with the potential of `Generative AI`? Let us know your thoughts below!

Fidelis NGEDE
As a CIO in finance with 25 years of technology experience, I’ve evolved from the early days of computing to today’s AI revolution. Through this platform, we aim to share expert insights on artificial intelligence, making complex concepts accessible to both tech professionals and curious readers. We focus on AI and cybersecurity news, analysis, trends, and reviews, helping readers understand AI’s impact across industries while emphasising technology’s role in human innovation and potential.


