Moderation strike on Stack Overflow due to enforced policies regarding AI-generated content

Another round of enshittification: Stack Overflow, Inc. apparently does not allow moderators to remove AI-generated answers on Stack Overflow and related sites, allowing spammers to flood the zone with shit - I mean, with eloquently worded, superficially consistent, but quite possibly false answers.

So a group of moderators have started a moderation strike, explaining their motivations in this open letter:

https://openletter.mousetail.nl/

I have signed it and would like to encourage everybody using Stack Overflow to sign as well.

I think it is a very fair demand that moderators may remove AI-generated content on the sole grounds that it is AI-generated, as otherwise it will become impossible to moderate a site like Stack Overflow. Masses of eloquently worded and superficially consistent answers generated by LLMs can be posted in seconds, while checking and moderating them is manual, tedious labour that the moderators volunteer to do. I fear that without a general ban on LLM content, moderation will become impossible, and without proper moderation, Stack Overflow et al. will be rendered useless within weeks.

Why not judge AI-generated content by the same standards as other content?

Content by generative AI like ChatGPT is a special blend of garbage: While superficially well written, it often embeds incorrect answers in correct but irrelevant waffling, made-up references, and non sequiturs. The primary intended mechanism against such content is downvoting and editing, not deletion – moderators don’t act based on correctness for a good reason. However, AI content often bypasses the voting system by tricking inexperienced users into upvoting and accepting due to its superficial quality.

Moreover, while humans occasionally produce content with similar properties, this requires some effort. By contrast, an AI can produce such content in seconds, while still causing the same effort in fact-checking and moderation – at least as long as we are required to handle it by the same standard as human answers.


I’m honestly a little puzzled by Stack Overflow’s move here.

Not because I expected better of them; but because they seem like one of the outfits that would be looking to cash in by selling their userbase’s hard work as training gruel for the bot-herders. And I’d assume that the bot guys are at least aware enough of the limitations of their product to not want their bot training on the regurgitations of someone else’s bot in some sort of LLM-centipede.

Is there an angle that I’m missing in terms of what’s in it for them? It just seems like madness, even from a purely profit-maximizing standpoint, to burn down a successful electronic sharecropping operation that produces a fair amount of genuinely solid output in favor of going into the cutthroat and moatless business of operating latter-day linkfarms full of AI spew.


I wonder if it’s as simple as not wanting to publicly insult AI developers/companies. IIRC Stack Overflow initially had a strict anti-AI answer policy. I wonder if some private meetings turned things around. They are a “developer space” after all, so starting a war with influential tech companies might be risky (though morally correct IMHO).


That’s certainly more plausible than any more direct benefit.

It does raise the question of why the AI bros would want to shit where they eat and/or expose their product to the irritated judgement of their peers; but I’m guessing that that would be ‘raging hubris’ rather than a business case I’m failing to grasp.


Just ran across a paper discussing what happens when bots get trained on AI slurry:

“We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs.”

Perhaps it will turn out to be solvable without needing to control input quality; but as written it certainly looks like reason to expect that the inputs the botherds will find most valuable are the ones that resist their products most vigorously; while the undefended or actively enthusiastic will become at least as useless to the bots as they are to the humans. Frankly ominous figure reproduced from paper:

[image: figure from the paper, not preserved]


Thanks for that paper, it’s an interesting point.

Garbage In Garbage Out.

The challenge with information on the internet is that there is plenty of garbage :wink: And sorting common garbage from rare truth is a really hard problem if you don’t have the correct knowledge to start from.


Wow, very interesting paper indeed, thanks!

I don’t think it is possible to solve this without controlling input quality. The paper says that the distribution tails disappear, and because these are massive models with millions of parameters, those tails are regions in a million-dimensional hypercube basically. Without data from those regions, a model can’t know which parts of this vast parameter space are not very likely but still possible.
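To make the disappearing tails a bit more concrete, here is a toy sketch in Python. It is not the paper's experiment, just a one-dimensional Gaussian that gets refitted, generation after generation, to a small sample drawn from the previous generation's fit. The function name `simulate_collapse` and its parameters (`generations`, `sample_size`, `seed`) are made up for this illustration. With these assumptions the fitted spread tends to shrink over time, so values that were rare but possible under the original distribution effectively stop being produced.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation for each generation, with the
    original distribution N(0, 1) at index 0. Hypothetical toy setup,
    not the procedure used in the paper.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the original data distribution
    sigmas = [sigma]
    for _ in range(generations):
        # "Train" only on data produced by the previous generation's model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)  # maximum-likelihood estimate
        sigmas.append(sigma)
    return sigmas


if __name__ == "__main__":
    sigmas = simulate_collapse()
    for generation in (0, 10, 50, 100, 200):
        print(f"generation {generation:3d}: sigma = {sigmas[generation]:.3g}")
```

The small `sample_size` is deliberate: each refit loses a little of the spread on average, and nothing ever puts it back, because no fresh data from the original distribution enters the loop. Mixing in even a modest share of original samples at each generation would change the picture, which is roughly the "control input quality" option.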
