Originally published at: https://boingboing.net/2024/04/08/google-books-ingesting-the-ai-generated-rubbish.html
…
Quasi-related: there’s a good news item in today’s New York Times (yes, yes, I know) about the early, insane dash to get data to train big-money ‘A.I.’ products, mostly at OpenAI, Google, and Meta. Their large language models (LLMs) were data-starved for input, so they turned to speech recognition, YouTube, and similar sources. Just contemplate the error rate of speech recognition one generally sees (typically around 15% for diverse English speakers!), and then realize that that’s the A.I. garbage-in-garbage-out they want to run our lives and diagnose our conditions.
Google Books has always been a dumpster fire for quality control. It’s kind of a Google “tell” at this stage, like “as of my last knowledge update”.
I’m pretty much braced for the entire internet to sink into a godawful glob of Googley grey-goo gobbledygook.
Benjamin Bratton and Blaise Agüera y Arcas describe this as the ‘Ouroboros Language Problem’ in a piece written for Noema.
Basically, as AI-generated material becomes the majority of online content, the AI generators will increasingly begin to take in more of their own output as source material. Presumably this would lead to an ever-shrinking pool of probable results and a self-marginalization of AI as it enshittifies the internet.
I’m not surprised that they don’t care about the results; I’m a bit surprised that they aren’t more concerned about their precious bots.
My (layman’s) understanding was that a few rounds of inhuman centipede with bots training on bot spew had significant negative effects on the performance of the model.
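That degradation is easy to see even in a toy setting. Here’s a minimal sketch (my own illustration, not from the article, with made-up parameters): fit a trivial Gaussian “model” to some data, sample synthetic data from it, retrain on that output, and repeat. The spread of the data collapses over the generations, the statistical analogue of the bots converging on their own spew.

```python
# Toy illustration of "model collapse": a model trained repeatedly on its own
# samples loses diversity. Sample sizes and generation count are arbitrary.
import numpy as np

rng = np.random.default_rng(42)

def collapse_demo(n_samples=20, generations=200):
    """Refit a Gaussian on its own samples for several generations."""
    data = rng.normal(0.0, 1.0, n_samples)  # generation 0: "real" data
    stds = []
    for _ in range(generations):
        mu, sigma = data.mean(), data.std()      # "train" on current data
        stds.append(sigma)
        data = rng.normal(mu, sigma, n_samples)  # next gen sees only model output
    return stds

stds = collapse_demo()
print(f"spread after 1 generation:   {stds[0]:.3f}")
print(f"spread after {len(stds)} generations: {stds[-1]:.3f}")
```

The variance shrinks toward zero because each refit slightly underestimates the spread and the error compounds; rare-but-real cases vanish first, which matches the reported behaviour of models trained on generated text.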
Just As Planned.
Fortunately for human writers we can come up with new, original, well-written stuff because we have brains rather than a very large database of vocab concordances.
Funny! Because I’ve been calling it “Ouroboros, only the snake(’s tail) is made of shit”.
I rather hope today’s “AI” will never be as powerful or long-lived as that.
I understand the concern about the overwhelming flood of AI-generated content and Google’s pervasive influence on the internet landscape. As AI continues to evolve, there’s a risk of drowning in a sea of generic, algorithmically-produced content that lacks depth and authenticity. However, amidst this challenge, there are still avenues for curated, human-centric content and platforms that prioritize quality over quantity. By actively supporting and participating in these spaces, we can counteract the homogenizing effects of Google’s search algorithms and ensure that the internet remains a diverse and enriching ecosystem for all.
[Yes, this is of course AI generated grey goo]
I guess the real problem isn’t AI at all. It’s search engine optimisation that broke the contract between what people ask for and what online services serve up.
A search box has become a malevolent Djinni - giving me what I asked for, but not what I want.
AI language models simply make the Djinni more powerful. Why homogenise? Now it can conjure custom content off the cuff. Content crowded with convincing communities.
People just like me, sharing their authentic views, every one of them a homunculus of hype…