Originally published at: https://boingboing.net/2019/08/23/news-plagiarism-sites-run-real.html
…
There must be some kind of software for this that is widely available. When I worked as an academic teacher, I discovered a few plagiarized diploma theses with such automatic substitutions. The resulting text was just as absurd as that news bot’s output.
On a vaguely related theme:
It’s called “Content Spinning” … sold by SEO optimization sites, despite its dismal results.
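For anyone curious what the "automatic substitution" amounts to, here is a minimal sketch of a context-blind synonym swapper. The word table and the `spin` function are made up for illustration; this is not the code of any actual spinning product, which presumably uses a much larger thesaurus.

```python
import re

# Hypothetical toy synonym table, invented for this example.
SYNONYMS = {
    "buy": "purchase",
    "flat": "apartment",
    "man": "male",
    "news": "tidings",
}

def spin(text, synonyms=SYNONYMS):
    """Replace each known word with its listed synonym, ignoring context."""
    def swap(match):
        word = match.group(0)
        replacement = synonyms.get(word.lower())
        if replacement is None:
            return word  # unknown words pass through untouched
        # Keep a leading capital so sentence starts still look plausible.
        return replacement.capitalize() if word[0].isupper() else replacement
    return re.sub(r"[A-Za-z]+", swap, text)

print(spin("Florida man says: I don't buy it."))
# prints: Florida male says: I don't purchase it.
```

Because the lookup never checks which sense of a word is meant, idioms and compound terms get mangled the same way as everything else.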
Invoice!
Hey, you just ripped this off from my website and ran it through a thesaurus!
bouncebounce.org, 8/18/2019:
Funniness sideways, the gauge of the con is such that it styles authentic coinage, which it is eventually grudging its wounded of. There was a contact about an era ago where the quantity of positions rasping Bounce Bounce became myriad, but gloomily nobody of them revolved our firm-pouring treatment into attractive packing descriptions.
Baby kangaroo did this in an episode of Friends once.
“I used a Theesaurus”.
Anyone reading that mangled nonsense isn’t going there for the news or for information; they’re going there for comedy, which I think makes it truly new content, and so it’s “transformative use” and it’s not depriving its victims of anything.
What about the average reader who stumbles upon these mangled stories? Sure, you click and somewhere someone makes five cents or whatever, but at some point the reader figures out the site is complete bullshit and moves on. I can’t see a reasonable person clicking any further for real news, other than out of morbid curiosity.
When I read something and a few words seem very out of place I look at the author’s name. Anything Chinese or ‘exotic’ I can put down as a language gap. It happens. But a whole story full of inconsistencies?
Nope, I don’t purchase it.
(best I saw last week, an accident victim whose injuries were inconsistent with life)
You know, I had assumed that when you see this sort of thing in porn titles that it was bad translation, but perhaps this is the explanation.
I was browsing a British bike-gear website, and was initially puzzled by references to “apartment-mount brakes.” After a moment, I realized the site was using a localization tool to helpfully Americanize (Americanise?) its content for me, and it was supposed to be “flat-mount brakes.” I got a good laugh out of that.
Intercourse movies - classy British porn?
Instead of using a thesaurus, how about feeding the text into a Markov generator loaded up with some Lovecraft?
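A rough sketch of what a word-level Markov generator does, in case it helps. The function names and the sample corpus are placeholders I made up (the corpus is not actual Lovecraft text); you would load whatever text you like.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting key, up to `length` words."""
    rng = random.Random(seed)
    key = rng.choice(sorted(chain))
    out = list(key)
    for _ in range(length - len(key)):
        followers = chain.get(key)
        if not followers:
            break  # dead end: this key was never followed by anything
        out.append(rng.choice(followers))
        key = tuple(out[-len(key):])
    return " ".join(out)

# Placeholder corpus, invented for this example.
corpus = "the old ones sleep beneath the sea and the old gods dream beneath the waves"
print(generate(build_chain(corpus), length=12))
```

Every adjacent word pair in the output occurs somewhere in the source text, which is why the result tends to sound locally plausible and globally unhinged.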
People don’t choose these sites; they’re sent there because the sites use a lot of SEO manipulation on search engines.
So people aren’t getting anything of value, but they are deprived of the time they spend judging whether something is legitimate or a bot.
Makes this seem coherent:
I don’t remember that part of the Bible
I remember when a British tabloid was covering the news coming out about the amazing tool-making capabilities of a certain population of crows, and they went somewhat thesaurus-happy. The problem was, they started off by leaving out a crucial letter. So the final story had all sorts of variations of sentences describing amazing “tool-using heifers” and “brainy bovines.”
I admit to being disappointed bouncebounce.org isn’t real. I can’t actually find an off-the-shelf “Florida Male” plagiarism library, so I am thinking of writing “floridamale.js” and making a plagiarism API available.
Obviously this is humor, but I think we should implement known scumware in open-source githubbable form just as a matter of good practice, to make the patterns and mentality involved viscerally at hand for study and awareness.
A team did this today with GPT-2, the fake-news-generating language model that was being kept proprietary, supposedly because it was “too dangerous to reveal”. It was more likely being kept secret to shake down social media companies or perform some other protection-racket shenanigan, but now they can’t do that, because an open implementation is in the wild.
I understand it’s the process that was used to generate “The Sword of Shannara”
Are you sure? I thought what it generated was “Muh Muh Muh My Sharona.”