Originally published at: https://boingboing.net/2024/06/17/student-busted-using-complicated-hidden-ai-system-to-cheat-on-university-entrance-exam.html
…
Contributed by Allan Rose Hill
He’ll probably get hired by somebody after that!
Banned from exams for 2 years isn’t a punishment to make many teenage hearts tremble.
Too complicated a cheating method? Probably should have just gone with a butt plug.
All I can think is that he went to elaborate lengths and exposed himself to consequences… just so he could get some “AI” hallucinations for answers. I suppose if the system had other entrance exams to draw from, it would do well, but otherwise it doesn’t seem worth it. If the questions are simple enough that an LLM could give useful answers, you wouldn’t need to cheat.
There’s probably a queue of US AI companies waiting to hire the guy. After all, they thrive on misguided creativity and a disdain for the law.
Maybe for his criminal energy? The thing with “AI” is that it’s so easy to use that having used it doesn’t really qualify you for a technical job the way such stunts used to. You probably don’t need to write a single line of code to put together a system like his these days.
It is entirely possible that he just OCR’ed video into text, dumped it into an LLM, and stood a pretty good chance of failing.
It is also possible that the OCR step was AI-powered (OCR is a restricted domain where AI works pretty well), and that he then dumped the text into something more useful, like a big web search engine.
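Either way, the plumbing involved is almost trivially small these days. Here’s a minimal sketch of the kind of pipeline being speculated about, assuming OpenCV for frame capture and pytesseract for OCR; `ask_llm()` is a hypothetical placeholder, since nobody knows what model he actually called:

```python
# Sketch of the speculated pipeline: grab a frame from the hidden
# camera feed, OCR the exam question, forward the text to a model.
# OpenCV and pytesseract are real libraries; ask_llm() is a
# hypothetical stand-in, not whatever he actually used.
import cv2
import pytesseract

def ocr_first_frame(video_path: str) -> str:
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()          # just the first frame, for simplicity
    cap.release()
    if not ok:
        raise RuntimeError("could not read a frame from " + video_path)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # OCR does better on grayscale
    return pytesseract.image_to_string(gray)

def ask_llm(question: str) -> str:
    # Hypothetical: plug in an actual model or web-search API here.
    raise NotImplementedError

if __name__ == "__main__":
    print(ask_llm(ocr_first_frame("exam_camera.mp4")))
```

Whether the answers coming back out are worth anything is, of course, the whole problem.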
It is also possible that the exam’s questions were mostly the kind of thing an LLM could answer: problems that don’t involve math, and that read like the most common questions involving those words, tend to get OK answers. Well, frequently get OK answers. The real problem is that LLMs are excellent at generating confident, realistic-looking wrong answers. If you are smart enough to tell that the answers are wrong, you are probably smart enough to come up with the answers yourself without the LLM “helping”.
So ask about the goat/fox/cabbage/boat problem and it will give the most common answer to the most common variant, with a statistically significant chance of mixing in answers from other variants, and a fair chance of a bit of word salad tossed in. If the exam’s question is the typical goat/fox/cabbage/boat problem, the LLM will get it right frequently enough that, if every question is similar, using it on a test likely scores in the high 90s percent.

If the question is a variant where any of the actors is replaced with one that is named differently but behaves the same, say a wolf rather than a fox, the LLM has a high chance of referencing a fox, not a wolf. And if you swap the explanation of what the fox and goat eat (“the vegan fox will eat the cabbage but not the goat; the termite goat, if left alone with the boat, will eat it”), the LLM will, 98% of the time, ignore the new rules and crank out something based on the traditional rules. An almost insignificant fraction of the time it will line up with some other bits of text in the training data that use the same variant rules, and get the right answer.
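For contrast, the classic puzzle falls to a few lines of brute-force search, and a real solver honors whatever “eats” rules you hand it instead of pattern-matching on the traditional wording. A quick sketch (the rule sets below are illustrative, not from any actual exam):

```python
# Breadth-first search over farmer/fox/goat/cabbage river crossings.
# The "eats" relation is a parameter, so variant rules (a vegan fox,
# say) change the search itself, unlike an LLM regurgitating the
# traditional solution regardless of the stated rules.
from collections import deque

ITEMS = frozenset({"fox", "goat", "cabbage"})

def is_safe(bank, farmer_present, eats):
    # A bank is safe if the farmer is on it, or no eater/eaten pair is alone.
    return farmer_present or not any(a in bank and b in bank for a, b in eats)

def solve(eats):
    start = (ITEMS, "near")        # all items on the near bank, farmer too
    goal = (frozenset(), "far")    # everything delivered to the far bank
    queue, seen = deque([(start, [])]), {start}
    while queue:
        (near, side), path = queue.popleft()
        if (near, side) == goal:
            return path
        here = near if side == "near" else ITEMS - near
        other = "far" if side == "near" else "near"
        for cargo in [None, *here]:    # cross alone, or carry one item over
            moved = {cargo} - {None}
            new_near = near - moved if side == "near" else near | moved
            state = (new_near, other)
            if state in seen:
                continue
            if (is_safe(new_near, other == "near", eats)
                    and is_safe(ITEMS - new_near, other == "far", eats)):
                seen.add(state)
                queue.append((state, path + [cargo or "(nothing)"]))
    return None

# Traditional rules: fox eats goat, goat eats cabbage -> the usual 7 crossings.
print(solve({("fox", "goat"), ("goat", "cabbage")}))
# Variant rules: vegan fox eats only the cabbage -> a shorter 5-crossing plan.
print(solve({("fox", "cabbage")}))
```

Swap the tuples in the second call to match whatever rules the question actually states and the plan changes accordingly; that determinism is exactly what the LLM’s statistical recall lacks.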
I’m not so sure “simple” is the important part that gives LLMs a significant chance of success. It is how frequently that question appears in the training data, how infrequently a similar-sounding one appears, and whether the exam’s version has any unusual features. Simply swapping the names of the actors would not really make a question “complicated”, but it would very likely get the LLM to regurgitate the wrong answer.
Enough of the training data seems to have it right. However, if you ask an LLM about just the goat/wolf/boat, it “invents” the cabbage and gives the correct answer to the question the exam didn’t ask. Unfortunately, that exam question isn’t very useful anyway, because the answer is trivial.
(Love the comic, I assume it is xkcd… apparently most training sets don’t ingest xkcd, or the quantity of text there is too small to throw the overall model off much.)
I’m always reminded of this.
I would like to see a system trained only on xkcd.