I didn’t read past “the sleepy world of copyright”.
Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said
Oh yeah. A YouTuber friend of mine posted this on Mastodon and I ran some tests. It seems to struggle a lot when there is noise, when the phrases are short, or when the accents are heavy (i.e. whenever transcription confidence drops), and then it tries to “guess” what was said by hallucinating it.
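One way to see the confidence angle concretely: the open-source `whisper` package returns per-segment scores like `avg_logprob` and `no_speech_prob`, and filtering on them drops exactly the low-confidence stretches where the guessing tends to happen. A minimal sketch, assuming those field names from the open-source package; the thresholds here are hypothetical, not recommended values:

```python
# Sketch: drop transcription segments whose confidence-style scores suggest
# the model is guessing. Field names follow the open-source `whisper`
# package's result["segments"] dicts; thresholds are made up for illustration.

def filter_segments(segments, min_avg_logprob=-1.0, max_no_speech=0.6):
    """Keep only segments the model was reasonably confident about."""
    kept = []
    for seg in segments:
        if seg["avg_logprob"] < min_avg_logprob:
            continue  # the model's own token probabilities were low: likely a guess
        if seg["no_speech_prob"] > max_no_speech:
            continue  # probably noise rather than speech: prime hallucination territory
        kept.append(seg["text"])
    return " ".join(kept).strip()

# Example: the noisy middle segment gets dropped instead of "guessed".
segments = [
    {"text": "Take two tablets daily.", "avg_logprob": -0.2, "no_speech_prob": 0.05},
    {"text": "purple monkey dishwasher", "avg_logprob": -1.8, "no_speech_prob": 0.7},
    {"text": "With food.", "avg_logprob": -0.3, "no_speech_prob": 0.1},
]
print(filter_segments(segments))  # Take two tablets daily. With food.
```

Better an honest gap in the transcript than a confident fabrication, especially in a hospital.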
Which can sometimes be a good thing (e.g. when taking an extremely zoomed-in photograph where everything is super-pixelated but you don’t care about fidelity), BUT NOT HERE DAMMIT! Transcription is supposed to be accurate or NOT BE!
Not even in a picture. I was delighted with Ted Chiang’s essay (the “blurry JPEG of the web” one) because he opened with an example I have noted myself from copying architectural plans: fuzzy numbers don’t get copied as analogue fuzz that stays visibly unreadable; instead they get digitally sharpened into the wrong numbers, a phenomenon I have witnessed for quite a long time. Sure, it’s funny that Apple draws a picture of a completely different dog when I zoom too far, but accurate reproduction of graphical data and illustrations has been of more importance to science than the development of movable type printing.
LLMs introduce a whole other level of fuckery, of course. During the summer I got one to summarise a conversation I’d had online where I told someone I’d send them an ebook. The LLM summarised it as a paper book, with a different but very plausible title (one that didn’t exist) and an author whose name was absent from Google Scholar, WorldCat, the LoC authority files, etc., despite sounding quite common.
I seem to have tripped one of the system tokens meant to scrub the output (“no plagiarism, no Disney, etc.”) to avoid legal issues. So it was all garbage. Just garbage.
I learned about that here!
Yikes! That’s a failure mode that I didn’t expect. The protections against spewing (some) copyrighted material, combined with a directive to prefer a confident tone, mean that LLM chatbots will have an ever-growing list of topics about which they’re encouraged to lie.
Mine was a Canon so it’s not just Xerox.
I don’t have it to hand but there’s a screengrab that went around earlier this year of system tokens for ChatGPT and it’s full of no Disney, no Nazi etc.
Yeah, it has to do with the compression algorithm used. It’s smart/predictive, so… yeah, a precursor to the types of problems we’re seeing in LLM chatbots. With noisy input and a required confident output, it makes crap up. I had a better link about the algorithm, but that was all over a decade ago and the few links I’ve dredged up are broken now.
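For anyone curious what “smart/predictive” means here: the Xerox bug was in JBIG2, which compresses scanned pages by keeping a dictionary of glyph bitmaps and replacing each new glyph with an already-stored one whenever it looks “close enough”. A toy sketch of that idea (the bitmaps and matching threshold are invented; real JBIG2 is far more involved):

```python
# Toy sketch of JBIG2-style symbol substitution: glyphs that differ in only a
# few pixels get replaced by an already-stored glyph, which is how a smudged
# "6" can come out of the copier as a crisp "8". Bitmaps/threshold are made up.

def hamming(a, b):
    """Count differing pixels between two equal-length bitmap strings."""
    return sum(x != y for x, y in zip(a, b))

def compress(glyphs, threshold=2):
    dictionary = []   # stored reference bitmaps
    output = []       # dictionary index emitted for each input glyph
    for g in glyphs:
        for i, ref in enumerate(dictionary):
            if hamming(g, ref) <= threshold:   # "close enough" -> substitute
                output.append(i)
                break
        else:
            dictionary.append(g)               # genuinely new glyph: store it
            output.append(len(dictionary) - 1)
    return dictionary, output

# Two glyphs that differ by one pixel collapse into the same dictionary entry:
six   = "0110100110010110"   # pretend 4x4 bitmap of a "6"
noisy = "0110100110010111"   # a slightly smudged scan of a different digit
dictionary, out = compress([six, noisy])
print(out)  # [0, 0] -- the second glyph is reproduced, crisply, as the first
```

The output looks sharp and deliberate, which is exactly why nobody suspects it: the error doesn’t look like an error.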
Yeah, but for me that’s totally fine on mobile pictures; we all know phone sensors are shit and rely on good software optimization to get a picture of reasonable quality.
And that’s why I still use a PowerShot G9 when I travel around.
On the other hand, this transcription software is a product that was sold as accurate, and which people were relying on to get transcriptions.
I mean, it’s bad and infuriating when your phone hallucinates numbers in a photo. It’s threat-inducing when your X-ray machine generates cancer cells out of nowhere.
It’s a threat when it makes up numbers on plans, too.
Can kill lots of people.
I hope that he’s successful in having his wishes respected, but if someday after he passes his heirs or the people who control his estate decide to cash in on his image, I’m not sure they’re legally required to do what he clearly wants them to do. Laws like California’s “Right of Publicity” statute describe post-mortem publicity rights as “freely transferable and descendible property rights.” So in his will he’d better be sure to leave his publicity rights to someone he trusts, I guess.