Essays explore the hellscape of freelance AI model training

Originally published at: https://boingboing.net/2024/05/12/essays-explore-the-hellscape-of-freelance-ai-model-training.html

2 Likes

The primary reason companies are investing in machine learning and LLMs is the promise of companies without worker salaries to pay. The goal of the VC investment is to undermine labor further than outsourcing alone can, and to create magic companies with investors and profits and nothing else.

Of course the field is going to exploit workers and cut pay; that is its stated end goal.

21 Likes

AI training is only going to get worse, trying to turn shit into usable data.

My paraphrase of the below article: “LLM-generated content tweaked by humans is a good data source for LLMs. By ‘good’ we mean ‘not as good as human-generated data but it’s cheaper and probably good enough’. We’re running a damn business here, what do you expect us to do, feed them the best possible data? Lol. Lmao.”

If you’re an “ML is consciousness” type person, this is treating your AIs like factory-farmed livestock, feeding them their own waste product to save money.

14 Likes

Reading through the article, I’m still trying to figure out what “work” they were actually doing. Reading an 85-page PowerPoint slide deck? Who even created it? Is it people all just feeding into the same machine, creating endless work for each other and nothing of actual productive value?

And here I figured the article would have just placed freelance AI model trainer maybe on par with a Facebook moderator, which I feel is only a half rung higher than a 4chan mod.

9 Likes

It is adding metadata and labels to datasets for training models – kind of like doing endless captchas, so the model has training data and can then categorize things the way people do. Which is to say, the best possible output of the models would be the accuracy of thousands of frustrated workers trying to rush through as many tasks as possible without being certain what they’re supposed to be labeling in the first place.

Not that the “how it works” page is terribly helpful, but this is their description of the work:

13 Likes

4chan has moderators? :astonished:

10 Likes

The work varies. Sometimes it’s adding metadata and labels, like Scientist said. Sometimes it’s rating or ranking different LLM responses to various user prompts on qualities like how truthful they are, whether they contain harmful/hateful language, etc. Sometimes it’s writing responses to user prompts on pretty much any topic, and trying to make the human response sound better than an LLM response would be.

14 Likes

Data in the sheer quantity an ‘A.I.’ needs to become statistically ‘interesting’ is expensive. So various forms of penalty-less theft (“scraping”) and re-hashing of in-house databases are required to deliver the product by ‘next quarter’ as opposed to next quarter century.

Well, according to some people who have recently done this work for one of the biggest AI companies in the world, the work of training AI is chaotic and inconsistent at best.

That “chaos” is introduced when an attempt is made to pre-curate those enormous databases. Imagine removing all the grotesque social opinions (and outright falsehoods) from 16 terabytes of Reddit scrapings! …yeah “inconsistent”, as in: can’t be done. (line#75231: “if response contains ‘hannity’ then send to mechanical_turk”)

5 Likes


… venture capital dollars must be spent :crazy_face:

Correction: Article says "AI", but it's just about LinkedIn.

This is just what LinkedIn is.

• PhD? Great! Do this menial coding task interview that completely disregards years of domain-specific knowledge you’ve acquired.

• Just 3 more hoops to jump through, then I’ll ghost you.

• I’ve got a fantastic job opening that’s perfect for you. No relocation required.
…2 interviews later: maybe like 1 wk/month travel, no biggie.
…2 more interviews: they loved you, but decided to go with an onsite candidate

2-month-long interview process, “sorry, it’s our policy not to provide feedback on hiring decisions”

🤬‼️

10 Likes

But not on employees. Didn’t you notice all the reports of missing and reduced pay and salary cuts?

7 Likes

Well you can’t sell employees. Yet.

3 Likes

Sure you can. Change the company that holds their contract to a shell corporation whose only asset is its employee roster, then sell that.

You can’t sell an employee individually, but I imagine doing it wholesale is just a matter of accounting.

Plus, once you’ve done that, the completely hollow contracting shell corporation doesn’t have any assets to back superannuation/health insurance/required regulatory payments, and you, who are merely contracting with the shell for the services of the employees, have no obligation to pay them.

5 Likes

I’m honestly a little surprised. Not that it’s abusive piecework hell; most contemporary labor market ‘innovation’ appears to be taking various ugly historical practices and slapping an app and a sticker with “Independent Contractor” scrawled on it in crayon on top. What surprises me is that it sounds so chaotic in an actively inefficient way.

“Withheld or missing compensation” is the sort of chaotic practice that can definitely pay for itself, especially if payroll errors mysteriously happen mostly in one direction rather than over- and underpayment being equally likely. But things like just losing work, or having work done by people you’ve not actually bothered to tell what good work looks like, sound like real money being either tossed into the void or turned into output that may or may not even be usable.

I’m curious how and why they’d arrive there: is it just a bunch of finance bros and enough techies to keep the LAMP stack on, with basically no managerial expertise? Did someone define some quantity-based metrics, because those are easy, and they are now just blindly chasing them regardless of the effect on output? Is it actually malice all the way down, and someone has thought hard about the problem and concluded that the inefficiencies produced by atrocious management are less costly than competent management would be, so that’s the plan?

2 Likes

I assume it’s “losing” work that the company actually did extract data from but somehow forgot to pay for. And getting weak data from less-trained people was also somehow useful, but the company somehow forgot to pay for that input too. The House always wins.

2 Likes
