Originally published at: New York Times sues OpenAI and Microsoft, claiming copyright infringement - Boing Boing
…
I thought it was a little odd back when newspapers seemed to be interested in buying into OpenAI.
In some ways, they already have bought into AI. It’s great to see a major news outlet sue the Big Plagiarism Machine, but then again, sports stories, for instance, have long been written by what amounts to AI.
I think there’s also some hypocrisy worth pointing out: news corps sue when AI appears poised to cut into their profits, while exhibiting great interest when AI can help them cut jobs.
I’m not sure if it’s fair to imply a homogeneous “they”.
It’s certainly true that ‘AI’ has already proven attractive to people doing low-value content milling and wishing to do it cheaper and faster; but (even in principle) it has less to offer anyone looking to do original reporting.
They may all be ‘media’ in the broad sense; but someone doing high school sports scores or viral listicles isn’t really in the same business as the Times.
But the use of AI in both is taking jobs away from people. That article about sports stories being written by AI wasn’t about reporting scores. It was about taking the scores and other available game data, and using that as input to software that automatically generates a story on the game. It’s the same issue the NYT is complaining about here, just on a smaller scale. How did it “learn” to write a story from the game data? It had to be fed existing stories. Were those writers compensated? Fuck no. So I do think it’s fair to imply a homogeneous “they”.
The “Gray Lady” going after AI?
It’ll be interesting if these various lawsuits succeed, and “AI” companies are effectively prevented from using anything from the corpus of modern texts (because there’s no way to filter out infringing work and still have enough material). If they were forced to only use pre-1924 texts, and it was enough material to still work with, that could be rather cool - you’d know when you ran across AI-generated text because it would sound archaic to some degree.
Verily, what thou art declaring sounds to me muchly akin to trooth.
They could also license the works they’re using. Some of that has already happened. They don’t want to do that, of course, because it would cost money and defeat the whole purpose. I think this is the answer, though. These “AI” programs are using other copyrighted works as input to modify and create “new” works. Now, the companies will argue that that’s the same thing human writers do, that all human writers are influenced by the writings of other people. But it’s not the same thing. This software is, in a defined, algorithmic way, taking existing works filtered by prompts and manufacturing output. I suppose in some nebulous way, the human brain does the same thing, but it is not easily defined by an algorithm. The AI is. Hell, we can look at the code. It’s complex and long, but it exists and it can be read. The human brain is that times several orders of magnitude.
Machine translation also relies on AI training.
Imagine if the AI had learned to omit the sex scenes from French novels.
I’m assuming, given the enormous amount of material required, that the licensed work is just used as the basis for a style applied at the end - but it still needs all the unlicensed work for the basic functionality. So if this was the legal/economic model, you’d presumably have licenses when the output is intended to resemble a specific artist’s work, but all the artists whose work was the basis for the “generic” output would get nothing. (And no one would admit to it resembling a specific artist’s work unless they specifically wanted to make use of their name, or had already hired the artist and were generating matching material to fill in here and there.)
The kind of mass-scraping of the internet, modern literature and journalism that’s the basis for “AI” training data seems like it would make any sort of real licensing impossible - they can’t even say what’s in the training set (e.g. child abuse material).
I keep seeing “AI” apologists claiming that when an artist walks into a museum, they build a mental model of all the work that they see, which becomes their “inspiration” in exactly the same way the “AI” works, and all I can say in response is
I’d argue that even in a nebulous way, it’s not doing the same thing - unless we’re also operating purely in the realm of metaphors. I.e. Freudian “the brain is a steam engine” levels of metaphor.
Gotta love a lawsuit where you wish there was a way for both sides to lose.
Oh, that one’s easy.
I think this lawsuit is reasonable and legally sound. The AI brokers are just hoping that everyone will be so hypnotized by AI tech and the money-making possibilities that the courts will happily ignore sound legal precedent. Simple: they need to compensate the sources that the AI engines are using.
I’m not linking to the original source as he still uses Substack (I get the mail), but Gary Marcus summarises a post where he shows multiple examples of infringement involving major IP (Star Wars, Toys, Mario, etc.):
“The cat is out of the bag:
- Generative AI systems like DALL-E and ChatGPT have been trained on copyrighted materials;
- OpenAI, despite its name, has not been transparent about what it has been trained on;
- Generative AI systems are fully capable of producing materials that infringe on copyright;
- They do not inform users when they do so;
- They do not provide any information about the provenance of any of the images they produce;
- Users may not know when they produce any given image whether they are infringing.

My guess is that none of this can easily be fixed.

Systems like DALL-E and ChatGPT are essentially black boxes. GenAI systems don’t give attribution to source materials because, at least as constituted now, they can’t.”
Much of which is what I’ve been saying at work until I’m blue in the face. 2024 - when the bubble bursts.