Artists sue developers of Midjourney and Stable Diffusion, claiming copyright infringement

Technically, yes, it’s sampling, because the model is trained on the body of work of other artists. We’re talking at cross purposes here. Sampling is a well understood term, and is far broader than the narrow definition you’re giving it. I already understand the technical aspects here. What you’re ignoring is the ethical ones.

What is happening is theft. They are stealing others’ work, using it to train an AI to recreate similar work, and then not paying the original artists for providing them with the works they dumped into their model (aka, sampled). It’s unethical, it’s sleazy, and it’s wrong. I very much hope the courts slam the door on this, because it’s another douche bro “disruption” of an existing market, one that harms creators who are already struggling at the bottom of a deep well of corporate fuckery.

8 Likes

Say the original images were 1MB minimum. With 5 billion input images, the model would need to be 5 petabytes to contain them all. The model is not 5PB.
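The arithmetic in that estimate can be checked with a quick sketch. It assumes the figures given in the post (5 billion images at 1 MB each) and decimal units; none of these numbers are measured.

```python
# Back-of-envelope check of the storage estimate above.
# Assumptions taken from the post, not measured: 5 billion images, 1 MB each.
NUM_IMAGES = 5_000_000_000
BYTES_PER_IMAGE = 1_000_000        # 1 MB, decimal units
BYTES_PER_PETABYTE = 10**15        # 1 PB, decimal units


def raw_dataset_petabytes(num_images, bytes_per_image):
    """Size needed to store every image uncompressed, in petabytes."""
    return num_images * bytes_per_image / BYTES_PER_PETABYTE


print(raw_dataset_petabytes(NUM_IMAGES, BYTES_PER_IMAGE))  # 5.0
```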

1 Like

If I study 100 of the works of an illustrator and then draw a new work stealing that style, I’m a hack but I haven’t committed a crime.

The models are studying but not copying the works of artists. The studied works exist only ephemerally as weights in a network.

It would be unethical to charge a person for retaining memories and knowledge. It’s interesting that when it’s a computer that is retaining knowledge the ethics flip!

I’m pro artist and think AI will ruin as much of daily life as it will improve. Perhaps through legislation we should make opting out of training actually enforceable.

Maybe the internet was a mistake.

3 Likes

It’s actually kind of hard to get visuals out of these AI models that have any value. It takes a fair amount of work narrowing parameters and inputs. Your average person gets total crap out of these things. This is very similar to ChatGPT or Copilot for code. They are more tools than replacements for the tool user.

For example, I think an artist who trains their own model on their own body of work and then offers “AI profile pics” from that private model would produce results far beyond what the general model could do for somebody who wanted to steal the style of that same artist.

In other words, you’re not claiming that the model includes samples of artwork, but that the artwork was sampled (in signal processing terms)?

Mostly, the reason I think the courts won’t be able to do anything about the model generation itself is that what’s saved in the model are statistical “facts” about the dataset’s images and keywords. It’ll probably have to be something about the model creators breaching the image host’s terms of service by downloading the images, because copyright only protects the creation of copies and derivatives and their distribution. Creating a statistical analysis of a corpus doesn’t infringe as far as I know, because no significant chunk of the copyrighted work survives. But that’s with words, so you may end up being right about this with regard to images.

Enforcing site TOSes under the Computer Fraud and Abuse Act might be another option. Or new laws to prevent appropriating other people’s styles, whether by computer aid or not. …I seem to be running low on sugar…

Technically, yes, it’s sampling, because the model is trained on the body of work of other artists.

Then playing a song by ear would be considered sampling, but it’s not. Training is equivalent to looking at the original work and figuring out how to draw it yourself. The only “copying” being done is the process the software figures out.

Sampling is a well understood term, and is far broader than the narrow definition you’re giving it.

Sampling involves the use of an original work or a part of it in a new work. That is not happening with AI art. None of the original work is in the results.

I already understand the technical aspects here. What you’re ignoring is the ethical ones.

Legality is not ethics.

What is happening is theft.

That’s a moral argument. I’m talking about copyright law.

They are stealing others’ work,

That’s a non-legal, poetic use of the term “stealing.” Theft involves depriving someone of a limited, rivalrous good. Even copyright infringement isn’t considered theft. It’s not useful to try to charge the issue with inaccurate terminology.

using it to train an AI to recreate similar work,

You’re admitting here that it’s not copying or reproducing the original work…

and then not paying the original artists for providing them with the works they dumped into their model (aka, sampled).

Would you expect to be paid if someone could look at your work and reproduce the style from memory?

It’s unethical, it’s sleazy, and it’s wrong.

I’m an artist who has had his art used to train AI (who hasn’t at this point though?). I don’t consider it a violation of my copyrights. I put it out there to be seen. I can’t be any more annoyed at this than I would be if someone saw my art and created something similar from memory. I’ve had people (including large corporations) actually violate my copyrights before. This isn’t that.

I very much hope the courts slam the door on this, because it’s another douche bro “disruption” of an existing market, one that harms creators who already are struggling at the bottom of a deep well of corporate fuckery.

You’re describing a symptom of artists being underappreciated. That will continue regardless of what happens in these cases.

1 Like

I found my wife’s artwork in the Laion-5B and Laion-400M datasets. You can use https://haveibeentrained.com and upload an image to see if it was used in the training. I simply took an image from her website and a match was found, with a link to a different thumbnail (“95% match”, no shit). A few of the other matches have the author’s watermark and others point at eBay listings.

I’m not sure where I stand on this whole subject, so I’ll not comment there.

6 Likes

I think Cepheus42 is correct: these programs almost certainly violate copyright.

A common defense seems to be ‘but if I learned to paint like Picasso no one would sue me for creating works in his style’

However, the difference is that you are a person, not a mechanical device. These AI programs are not impenetrable black boxes or people - they can be fully described mathematically. That mathematical description would necessarily include every artwork that the program has been trained on.

This means it is not unreasonable to say that the artworks used to train the program have been sampled into the software. Just because they no longer look like images does not mean they aren’t in there.

AI art companies need to properly license every artwork they include in their software via training or else it is essentially theft.

6 Likes

If you think of the AI as more or less approximating the statistical distribution of realistic/artistic images within the set of all possible bitmaps, there should be enough of an analogy to information theory to allow some sort of compression scheme to work.

1 Like

And it’s a bullshit defense. I mean, it’s a defense for why it doesn’t violate someone’s copyright to use the program to copy someone’s style, but it says nothing about why the creation of the model doesn’t violate copyright.

I’ve already explained to the… “best” of my meager ability why the generation of the model also doesn’t violate copyright, but the salient point is that nope, they’re not there. It’s like making a frequency list of words in The Hobbit. You know how common each word is, but you don’t know what page they came from.
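The frequency-list analogy can be made concrete with a small sketch. It only illustrates the analogy in the post and makes no claim about how a diffusion model actually stores information; the two sample sentences are made up for illustration and are not quotes from The Hobbit.

```python
from collections import Counter
import re


def word_frequencies(text):
    """Count how often each word occurs, discarding order and position."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Two hypothetical texts with the same words in a different order.
text_a = "the dragon guards the gold"
text_b = "the gold guards the dragon"

freq_a = word_frequencies(text_a)
freq_b = word_frequencies(text_b)

print(freq_a["the"])      # 2
print(freq_a == freq_b)   # True: the counts alone cannot tell the texts apart
```

Because both texts produce identical counts, the counts by themselves are not enough to rebuild either original sentence.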

Copyright will not save us here. It simply doesn’t cover this. The model isn’t a derivative work. We’ll need something new.

3 Likes

Right, but “approximating the statistical distribution of realistic/artistic images within the set of all possible bitmaps” isn’t “my image is stored and republished in this model”. I’m failing at finding the size of Stable Diffusion in MB, but I’m reasonably sure that one five-billionth of the model is still too small to reasonably encode a single image.

Edit to add: I refuse to call it an AI until it shows Intelligence. It’s a model, a program or a giant neural net until then.

2 Likes

Sampling the ACTUAL song is already illegal. Creating a song that even SOUNDS sort of like another song IS ALSO ILLEGAL, even if you are not sampling one to one or using the exact same piece of music. Or did you not know this?

Sampling involves the use of an original work or a part of it in a new work

Again, you’re narrowing definitions to suit your need. Sampling is also plugging tons of data into a computer algorithm and spitting out results.

Legality is often ethics, yes. Legality is used to DEFINE ethics. But not every unethical thing is illegal, sure. The courts have not decided if this unethical behavior is illegal yet. That’s what we’re discussing.

Laws are moral arguments. The courts have not decided on these copyright issues yet.

They are using, without compensation, the labor of another person to create a computer model to generate art that will directly compete with that labor. The courts will decide, but I feel pretty confident that meets most definitions of “theft.” Hence the lawsuit.

You’re admitting here that it’s not copying or reproducing the original work…

I’ve said sampling about a dozen times now. Sampling DOES NOT MEAN COPYING DIRECTLY. Only you are defining it as such.

I would expect they’re a pretty damn good artist. I would not expect they are a computer algorithm trained by using my exact work, not just someone able to copy it. That’s my labor used to create competing products.

Good for you. That’s your right. Other artists clearly disagree with you, enough so that they are suing. I hope you are as supportive of THEIR copyrights.

Yes, but perhaps we could stop making it worse? No? I mean, if this is the capitalistic dystopia you want, I’m sure we’ll get there in time.

2 Likes

A lot of the recent scientific discussion of neural networks etc is that they are black boxes.

1 Like
4 Likes

7.7 GB if I’m looking at the right file. I agree that it isn’t going to be the case that every one of these images will be extractable, even in theory, from the network weights. I wonder how close you can get, though.
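For a sense of scale, here is a rough sketch of the average storage budget per training image. It assumes the 7.7 GB file size mentioned above (which may not be the right file) and the 5 billion image count used earlier in the thread, with decimal units. An average like this says nothing about how much information about any individual image ends up in the weights.

```python
# Average bytes of model weights per training image.
# Assumptions from the thread, not verified: 7.7 GB model file, 5 billion images.
MODEL_BYTES = 7.7e9
NUM_IMAGES = 5_000_000_000


def bytes_per_image(model_bytes, num_images):
    """Average number of weight bytes available per training image."""
    return model_bytes / num_images


budget = bytes_per_image(MODEL_BYTES, NUM_IMAGES)
print(f"{budget:.2f} bytes ({budget * 8:.1f} bits) per training image")
```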

(Btw, I’m calling it AI because Stable Diffusion and transformers aren’t just neural nets and I don’t have another term ready. Plus, people have called so many things AI by now that I’ve long since stopped thinking of AI as being… special?)

1 Like

Sampling the ACTUAL song is already illegal. Creating a song that even SOUNDS sort of like another song IS ALSO ILLEGAL, even if you are not sampling one to one or using the exact same piece of music. Or did you not know this?

This isn’t strictly true. Not all sampling and not all rendering of songs that sound “like” other songs are illegal. It’s determined in a court of law on individual cases.

Again, you’re narrowing definitions to suit your need. Sampling is also plugging tons of data into a computer algorithm and spitting out results.

Actually you’re conflating an information technology definition with a copyright law definition to suit your need. They aren’t the same thing.

Legality is often ethics, yes. Legality is used to DEFINE ethics. But not every unethical thing is illegal, sure. The courts have not decided if this unethical behavior is illegal yet. That’s what we’re discussing.

Oh hell no. Legality is often completely unrelated to ethics. By your argument, adhering to the Nuremberg Laws and Jim Crow Laws was ethical, and I’m going to assume in good faith you wouldn’t agree with that take. A hell of a lot of behavior in a capitalist society is unethical but isn’t illegal, especially when laws can be bought for the benefit of the wealthy and powerful and their preferred in-groups. This is a complete non-starter.

Laws are moral arguments. The courts have not decided on these copyright issues yet.

Again, hell no. Laws are rules decided by people in power on how to allow and limit behavior in society and those people and their motives may be and often are corrupt and unethical. Whether laws are ethical or moral is completely unrelated and very subjective. Sometimes they coincide, like laws against murder and rape, but there’s no inherent connection just because some legislators decided to make a particular rule.

They are using, without compensation, the labor of another person to create a computer model to generate art that will directly compete with that labor.

I’m using without compensation the labor of your words to disagree with you. It doesn’t make it copyright infringement or unethical to quote you.

I feel pretty confident that meets most definitions of “theft.” Hence the lawsuit.

The only definition that matters is the legal one, and even copyright infringement isn’t theft by legal definition, so you’re being inaccurate in your terminology here. It doesn’t help your argument to be loose with words. It makes it seem like you’re trying to stir an emotional response. It’s like calling abortion murder or taxes theft.

I’ve said sampling about a dozen times now. Sampling DOES NOT MEAN COPYING DIRECTLY. Only you are defining it as such.

But it’s not a copyright violation without copying, so you’re admitting it’s not a copyright violation. At best you could argue it’s a civil license violation.

Good for you. That’s your right. Other artists clearly disagree with you, enough so that they are suing. I hope you are as supportive of THEIR copyrights.

Why would I need to be supportive of copyrights that aren’t being violated?

Yes, but perhaps we could stop making it worse? No? I mean, if this is the capitalistic dystopia you want, I’m sure we’ll get there in time.

This has nothing to do with what I want. I’m not advocating for artists to be underappreciated. They will be regardless of how I feel on the matter. To not make it worse means we need to make significant changes to society (something I would support). Those changes won’t result from copyright lawsuits, but from elections and legislation and a significantly different Supreme Court lineup.

3 Likes

Digital artists, at least, understand the technology well enough, Rob. Don’t be condescending.

3 Likes

The technology in this case is giant neural nets, mostly. What proportion of digital artists have a master’s-level education in comp sci? Or could explain tensor algebra?

I didn’t say the artists could BUILD the damned thing. You don’t have to be able to build something to understand the function is unethical. It’s also not a one-size-fits-all situation. There are legitimate and intriguing applications of AI art technology. Then there are groups who think they’re going to cash-in easy on the work of artists.

3 Likes

To understand the technology, which is the standard you held Beschizza to, one would need at least an understanding of tensor algebra.

This thread shows so much misunderstanding, like the disagreement over whether or not the model contains the original images in any meaningful sense.