"Tortured" phrases used to fool plagiarism detectors now infest scientific papers

Allegedly! And I never received any such email, anyway!


So much of academic publishing is fraudulent to begin with that this doesn’t really matter that much. Given its overwhelming reliance on hypercomplicated statistical gymnastics applied to lazily collected data, the only thing you can genuinely rely on is that the “published” researcher’s job is safe for another couple of years.

This is a widely repeated pattern that plagues automated defenses across all of society. Every “behavior detector” lives on borrowed time. The moment a detector is in place, adversaries start probing it to learn where its edges are. They learn exactly what triggers it, and what kind of fuzz they can use to distort their artifacts so they don’t trigger it. It has played out in the law since at least the time of the Xtian Bible, and there hasn’t been much progress since.

The only good defenses are to deploy such detectors sparingly and to hide their direct output as much as possible (e.g. if you’re building a spam detector, don’t send a “bounce” message saying “this was spam”, or adversaries will quickly learn what does or doesn’t trigger it). Unfortunately, with common off-the-shelf solutions everyone ends up sharing the same ruleset, so once an adversary learns how to bypass it in one case, they can apply that bypass everywhere.

Using customized and constantly morphing solutions, you can solve the problem via the “bear chase” theory: you don’t have to outrun the bear, you just have to outrun the person you’re with. But that’s expensive, and it requires constant care and feeding. A big company may be able to afford to do that with their spam detectors; a small organization might not.


Check out the paper itself on

“underground creepy crawly province”


Exactly this. It even applies to any type of performance measurement. Once a measurement becomes a target (determining bonuses, etc.), it is no longer a useful measurement.


The difficulty of translating complex language into English seems to be the proximate cause of the plagiarism.

Do you really find it plausible that a human would accidentally translate “signal to noise” as “flag to commotion”, or “random forest” as “irregular lush territory”? These look like mistakes caused by a missing grasp of the idea behind the metaphor, e.g. that a forest is a group of trees.


Is publication in a scientific journal a requirement for some sort of Indian employment? I’ve seen many papers that seem ripped off from the wrong parts of Wikipedia.

Sometimes, “what am I doing wrong” is a legitimate question, and refusal to divulge any answer will drive humans into madness.


“Counterfeit consciousness” is such a good alternative to AI that I might just start using it.


It’s sad, but every single piece of information that helps you, the frustrated user, also helps an adversary that’s trying to defeat the system.

And lest you think this isn’t a big deal, or that the inconvenience to you isn’t worth the effort, there are entire companies designed specifically to commit these exact kinds of fraud. And they are making serious bank working on behalf of some of the worst, richest, and most powerful villains across the globe.

Yes. I’ve seen amazing translations between English and German that long predate computerized translation.
It is amazing the confidence that a bilingual dictionary gives some people.


It’s still unclear to me, after reading the abstract, what the point of this kind of fraud is. Obviously it increases your raw number of publications, but it seems like it also means that even the most cursory check of your resumé would reveal obvious, career-destroying fraud, and presumably all your papers have zero citations.

Anyway, whatever the scam, the Smaug-like hoarding of academic publishing behind high paywalls surely doesn’t help.

English, as she is spoke.


Imagine you teach high school science. The principal has announced that teachers who publish papers in “prestigious” academic journals are eligible for substantial raises. You don’t have time to write the paper, and in fact have never written such a paper before. However, financial pressures would make the income quite welcome. So you hire a firm to write a paper and submit it to a journal, perhaps a captive journal like Microprocessors and Microsystems. You get a raise, the principal gets a bonus for raising the school’s educational reputation (“our teachers are so good that they contribute to cutting-edge science”), and so on down the road. As long as no one cares about anything more than juking the stats and getting paid, everything’s golden.


This is blatant academic fraud. The “authors” are not “translating complex language to English”. They are stealing someone else’s English-language paper and running it through synonym replacement so that plagiarism scanners won’t notice the theft.
The text becomes incomprehensible gibberish in the process, but these are shite journals where the editors don’t care.
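For anyone who hasn’t seen how mechanical this is, here’s a toy sketch in Python. It is not the actual tool these paper mills use (nobody here knows what that is); it assumes a naive word-by-word synonym table, built by hand from the examples quoted in this thread, and uses a crude word-overlap score as a stand-in for a plagiarism scanner. Real scanners are more sophisticated than this, so treat the number as illustrative only.

```python
# Toy sketch: naive word-by-word synonym substitution ("spinning").
# SYNONYMS is a hypothetical, hand-made table built from examples quoted
# in this thread; real spinning tools presumably use far larger thesauri.
SYNONYMS = {
    "signal": "flag",
    "noise": "commotion",
    "random": "irregular",
    "forest": "lush territory",
    "artificial": "counterfeit",
    "intelligence": "consciousness",
}


def spin(text: str) -> str:
    """Swap each word for its 'synonym', ignoring the phrase it belongs to."""
    return " ".join(SYNONYMS.get(word.lower(), word) for word in text.split())


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' word sets.

    A crude stand-in for a plagiarism scanner (an assumption, not how any
    particular product works): 1.0 means identical vocabularies, 0.0 none shared.
    """
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


if __name__ == "__main__":
    original = "signal to noise ratio of the random forest"
    spun = spin(original)
    print(spun)  # flag to commotion ratio of the irregular lush territory
    print(f"overlap: {word_overlap(original, spun):.2f}")  # overlap: 0.31
```

Because each word is swapped with no regard for the phrase it sits in, the meaning is destroyed, while the word overlap with the source drops to about a third, which is exactly the kind of drop that helps a copy slip under a similarity threshold.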

UPDATE: Check out some of the posts at PubPeer, where people report these engarbagised papers.

In many cases the reporters have managed to recover the original, underlying text, and tracked down the source of the plagiarism. When someone takes a single paper, garbles it, and publishes the garbled version as their own work, this is really quite fraudy.


Plagiarism detectors are such bullshit. I have had to rewrite whole passages of perfectly fine sentences that I wrote 100% myself, without referencing anything else, into what I would consider genuinely tortured language just to get the text into the allowed percentage range.

Because English is a second language for me, there are probably a bunch of phrases I use that are clichéd or ubiquitous without my noticing, and those can drive up a plagiarism percentage. Then again, there are papers in such broken English that they’re barely readable, and those probably escape plagiarism tests entirely, purely as a function of their shitty writing.


There’s a similar effort with bogus “news” sites as well: scrape something from CNN or a wire service, run it through a couple of machine translators and a Markov chain, and put it up on a “local” news site that’s really in Maharashtra.

