Ditto for comparing horses to cars, or solar panels to the way plants harness energy.
One may perform some version of a task provided by the other, but we certainly wouldn’t want to live in a world where the law or the culture at large didn’t see any fundamental distinctions between the two.
Given the enormous size of the training data set, I wonder how many poisoned images you’d have to feed into the system before it started breaking down. I could see artists protecting their specific style by doing this, but even then, it’d only work if their work hadn’t already been scraped.
Adversarial attacks on “AI” image recognition, by way of subtle, non-obvious (to humans) changes to pixel values, have a long history. They’ve just never had much of a practical application until now.
“…networks are prone to error through attacks that confuse the model into making wrong predictions when small changes are introduced to the training dataset. The small perturbations are designed to have a significant effect on the model’s performance even when the change is not visible to us.”
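For anyone curious what that looks like concretely, here’s a minimal sketch of the classic FGSM attack against a stock torchvision classifier (this is not Nightshade’s actual algorithm, just the textbook version of the idea; the image path and epsilon are placeholders):

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

# Classic FGSM: nudge every pixel a tiny, bounded amount in the direction
# that increases the model's loss. The change is essentially invisible to a
# human, but it can change the classifier's prediction.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
to_tensor = transforms.Compose([transforms.Resize(256),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])

img = to_tensor(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)  # placeholder image
img.requires_grad_(True)

logits = model(normalize(img))
pred = logits.argmax(dim=1)              # the model's current prediction
loss = F.cross_entropy(logits, pred)
loss.backward()

epsilon = 2 / 255                        # max per-pixel change, about one intensity step
adv = (img + epsilon * img.grad.sign()).clamp(0, 1).detach()

print("before:", pred.item(), "after:", model(normalize(adv)).argmax(dim=1).item())
```

The point is just that a perturbation of a couple of intensity levels per pixel, which no human would notice, can be enough to change what the network sees.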
The other interesting thing is that, really, the “AI” has probably consumed more work than I or any other individual human ever will. Humans don’t need data to generate art, they need experience. It’s fundamentally different, even with the “argument” people trot out about neurons.
But… but… but… Elon’s cyborbmonkeys are teh F4T4r3!!! why do you want to hurt the cyborbmonkeys and not let them do the creatives… WHY??? /s
These models are trained ON THEIR WORK, without permission, so YES… a hypothetical “AI” that doesn’t exist does not have more rights than living human beings who do…
It’s like some people around here have never read any Cory Doctorow, or, at the very least, thought his works were guidebooks…
But of course, the currently existing AI and “neural implants” aren’t remotely like what we see in sci-fi…
Exactly. What if one could intercept satellite images in space, before transmission back to Earth, and change hundreds of tanks and missile launchers into extra trees, cows, and farm tractors?
I think in theory this might work, and it can even be tested out in PyTorch, but the problem is that there is so much secrecy around how the big LLMs are trained (possibly even for this very reason) that we don’t know whether they would fall for the same traps a small PyTorch model does. And even if they do, it would be hard to empirically test how well it works.
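As a toy version of that kind of test (crude label-flipping on synthetic data, nothing like Nightshade’s actual perturbations or a production training pipeline), you can measure how a small PyTorch classifier’s accuracy on one “concept” changes as you poison more of its training labels:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
centers = torch.tensor([[0., 0.], [4., 0.], [0., 4.]])  # three toy "concepts"

def make_data(n_per_class=1000, poison_frac=0.0):
    X = torch.cat([centers[c] + torch.randn(n_per_class, 2) for c in range(3)])
    y = torch.arange(3).repeat_interleave(n_per_class)
    # "Poison" concept 0 by relabeling a fraction of its samples as concept 1.
    idx = (y == 0).nonzero().squeeze(1)
    y[idx[: int(poison_frac * len(idx))]] = 1
    return X, y

def class0_accuracy(poison_frac):
    X, y = make_data(poison_frac=poison_frac)
    model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 3))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(300):                     # full-batch training on the toy set
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
    Xc, yc = make_data(poison_frac=0.0)      # evaluate on clean data
    pred = model(Xc).argmax(dim=1)
    return (pred[yc == 0] == 0).float().mean().item()

for frac in (0.0, 0.1, 0.3, 0.5):
    print(f"poisoned {frac:.0%} of concept 0 -> clean accuracy {class0_accuracy(frac):.2f}")
```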
The full paper is here if you want the details, but the tl;dr is that while the complete data set is large, some specific prompt concepts have few enough training images that a small number of poisoned ones is enough to be effective.
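A back-of-the-envelope version of that argument, with entirely made-up numbers (the real counts depend on the dataset and the concept):

```python
# All numbers below are hypothetical, purely to illustrate the scale argument.
total_images   = 5_000_000_000   # size of the whole scraped training set
concept_images = 2_000           # images whose captions mention one niche concept
poisoned       = 100             # poisoned images slipped into that concept

print(f"{poisoned / total_images:.7%} of the full dataset")      # vanishingly small
print(f"{poisoned / concept_images:.1%} of that concept's data")  # a real dent
```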
While not exactly what they’re going for, I’d imagine using a technique like this could be quite effective at frustrating efforts to train LoRA models to match a specific target.
This doesn’t seem feasible to implement at scale, however. If future training pipelines include tools that can check and correct this kind of adversarial input, you’d have an ongoing cost to keep any public-facing media updated with the latest protection algorithms. It’s possible we could see content server plugins to do this kind of alteration on demand, but it seems more computationally expensive than, say, image compression.
Assuming the image analysis is being done with a neural network, yeah (and I suspect a lot of it is, these days), something like Nightshade would be the tool used to transform targets into… something else.
I wonder if, like with LLMs, even the occasional “hallucination” could potentially make an image-generating tool useless. (It seems like image generators are more likely to be used with a human in the loop to notice poisoned output, whereas things like ChatGPT are being used without any human eyes on the process.)
The way current open genAI models work, it’s relatively simple (if not cheap) to remix one with unwanted concepts filtered out or additional training checkpoints included. The hard part would be detecting which concepts were poisoned, but if someone noticed, that specific data could be replaced without having to re-train the entire model. Presumably the closed-source models have a similar process.
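For a sense of what that remixing looks like mechanically, here’s a minimal checkpoint-blending sketch (assuming both files are plain PyTorch state dicts with matching keys; the paths and blend ratio are placeholders):

```python
import torch

# Hypothetical paths; assumes both files are plain state dicts with matching keys.
base  = torch.load("base_model.ckpt",  map_location="cpu")
patch = torch.load("concept_fix.ckpt", map_location="cpu")

alpha = 0.3  # how strongly the patch checkpoint overrides the base weights
merged = {
    k: ((1 - alpha) * base[k] + alpha * patch[k]) if base[k].is_floating_point() else base[k]
    for k in base
}
torch.save(merged, "merged_model.ckpt")
```

The blending itself is the cheap part; figuring out which concept was poisoned, and producing a clean patch checkpoint for it, is where the real work is.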
This style of attack is only really effective if all available copies of images pertaining to a specific concept are poisoned, and even then only until someone trains a model to account for it. Even if an artist rigorously “re-nightshades” their images whenever a new protection algorithm is released, any prior saved images will still be vulnerable.
I can see that, if we do not intervene soon, the future will be flooded with low-effort content spam.
We have been here so often it’s tiring. Tech bros are once again so enamored with a hyped technology that they refuse to see the negative externalities.
Oddly enough, no, I’m not the least bit concerned about what LLMs “deserve.” They are not in any way sentient. LLMs are no more a concern to be protected than any other algorithm; you may as well be concerned about what Dijkstra’s algorithm deserves when it’s used to calculate a route.