Do dogs see that and think it falls into the uncanny valley?
(For that matter, my dog seems to think all kinds of things are uncanny-valley dogs, like the wire reindeer people put out for Christmas. They make her crazy.)
In AI development, the dominant paradigm is that the more training data, the better. OpenAI's GPT-2 model had a data set consisting of 40 gigabytes of text. GPT-3, which ChatGPT is based on, was trained on 570 GB of data. OpenAI has not shared how big the data set for its latest model, GPT-4, is.
But that hunger for larger models is now coming back to bite the company. In the past few weeks, several Western data protection authorities have started investigations into how OpenAI collects and processes the data powering ChatGPT. They believe it has scraped people's personal data, such as names or email addresses, and used it without their consent.
Sort of like Clearview AI, the creepy facial recognition company. They probably know each other.
Lost in translation
Following the major reform known as sote, responsibility for healthcare services was shifted at the beginning of this year from 293 municipalities to 21 self-governing wellbeing services counties, plus the city of Helsinki.
The structural reform has led to language issues, reports Hufvudstadsbladet, including new managers who don't speak Swedish, Finland's other official language, in predominantly Swedish-speaking areas, particularly in the southwest.
The western Uusimaa wellbeing service county has struggled to translate its new material into Swedish, relying on machine translations that have resulted in some incomprehensible texts.
"For us it's important that our staff can work in their mother tongue," Sanna Svahn, the county director of western Uusimaa, told the Swedish-language daily.
The wellbeing service county is now seeking translation assistance from the Swedish Cultural Foundation, a private organisation.
(Reuters reprint)
AI companies: "We will scrape programming sites to make it look like our product can program!"
StackOverflow: "Hang on! We want to be paid first."
SO contributors have entered the chat: "Wait a sec! Where's our cut?"
Given the quality and security bugs in a lot of the code I see there, you really, really don't want unsupervised training on SO.
See also: Microsoft sample code.
I know, right? Practically every code example there is slightly wrong. Perfect poison for AI training sets.
Stability's Tuesday filing said the artists "fail to identify a single allegedly infringing output image, let alone one that is substantially similar to any of their copyrighted works." Midjourney's motion said that the lawsuit also does not "identify a single work by any plaintiff" that it "supposedly used as training data."
For anyone's edification, this is how art forgers work. You don't forge any particular artwork by an artist. You create an original artwork in the style and mannerisms of said artist, and then claim it to be a recently discovered piece.
A Cat AI. We are all doomed
ChatGPT creates mostly insecure code, but won't tell you unless you ask
ChatGPT, OpenAI's large language model for chatbots, not only produces mostly insecure code but also fails to alert users to its inadequacies, despite being capable of pointing out its shortcomings.
Amid the frenzy of academic interest in the possibilities and limitations of large language models, four researchers affiliated with Université du Québec, in Canada, have delved into the security of code generated by ChatGPT, the non-intelligent, text-regurgitating bot from OpenAI.
In a pre-press paper titled "How Secure is Code Generated by ChatGPT?" computer scientists Raphaël Khoury, Anderson Avila, Jacob Brunelle, and Baba Mamadou Camara answer the question with research that can be summarized as "not very."
[…]
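To make the failure mode concrete, here is the sort of textbook flaw the paper is talking about, sketched in Python. This is my own illustration, not code from the paper, and the table and column names are invented. The insecure version passes casual testing just fine, which is exactly why a bot that won't volunteer security warnings is a problem.

```python
import sqlite3

# Classic injection bug: building SQL by string interpolation makes the
# caller's "username" part of the SQL itself.
def find_user_insecure(conn: sqlite3.Connection, username: str):
    query = f"SELECT id, email FROM users WHERE name = '{username}'"
    # username = "' OR '1'='1" turns this into "return every row"
    return conn.execute(query).fetchall()

# The boring fix: a parameterized query, so user input stays data.
def find_user_secure(conn: sqlite3.Connection, username: str):
    return conn.execute(
        "SELECT id, email FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Per the paper's finding, the model will hand over something like the first version without comment, yet it can usually spot the injection if you explicitly ask whether the code is secure.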
What does an ex-Pharma Bro do next? If it's Shkreli, it's an AI Dr bot
[…]
On Thursday, Martin Shkreli, released last year from a seven-year stint in prison for securities fraud, announced the availability of Dr Gupta. The controversial entrepreneur described the project in a tweet as "The world's first physician chatbot."
[…]
The Register also asked, "Do you have any concern people may get bad advice from the model and follow that advice, leading to harm?"
That question went unanswered. Dr Gupta does come with a warning that the bot is not providing actual medical advice.
[…]
Interesting read.
Also, this had previously escaped my attention:
Shkreli, shortly after being let out of prison, last year launched Druglike, "a decentralized science (DeSci) drug discovery Web3 platform" that the company's press release [PDF] insisted "is not a pharmaceutical company."
After watching it fall apart in a few Chess games, I'm convinced that once it goes beyond its "opening book" of memorized code examples, it will also fall apart in programming that involves more than boilerplate code generation. Any large project, where it has to maintain context, will be hopeless.
That's the difference between general-purpose AIs like GPT and Bard and the ones specifically trained to play chess. Going back to old systems like Deep Blue: IBM had limited computing power compared to today, so they had to focus, feeding it data from chess professionals accumulated over decades and centuries of play. Models like GPT take in information from everywhere online, including a ton of games from terrible players.
The language-based models are really just very advanced autocomplete at the moment. I noticed that firsthand in the Bard beta: it gave completely wrong information about Destiny 2 weapons and which archetypes they belong to, threw wrong information about platelets into the middle of factually accurate info (platelets do NOT clean plaque from the walls of the heart's arteries), and often mixed fan-fiction material into questions about anime canon.
But what those AIs do well is pick up the general mood around a topic, regardless of whether that mood is factually accurate, which is equal parts awesome and troubling. Because of that, you need a turbo amount of media literacy for any sort of deep topical dive using AIs right now. I wouldn't be shocked to hear that political pollsters are using AI in exactly that way.
No, it's not just that they have samples from bad players. If that were the problem, it would play a bad but legal game, lose, and perhaps concede when its position was hopeless. The problem is that it can't play Chess.
It has such a poor grasp of the game's context that eventually the autocomplete pulls a fuzzy answer from a game close to, but not the same as, the current one, and the move is illegal in the actual position. It'll even rewrite the list of previous moves.
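The failure is easy to catch mechanically. Here's a minimal sketch, assuming the python-chess library as a dependency: replay the model's claimed move list and flag the first move that's illegal in the real position. Per the behavior described above, transcripts of GPT games tend to fail this check once the game leaves well-trodden openings.

```python
import chess  # pip install python-chess (assumed dependency)

def first_illegal_move(moves_san: list[str]) -> str | None:
    """Replay an LLM's claimed moves (standard algebraic notation);
    return the first one illegal in the actual position, or None."""
    board = chess.Board()
    for san in moves_san:
        try:
            board.push_san(san)  # raises ValueError on illegal/ambiguous moves
        except ValueError:
            return san
    return None

# A made-up transcript where the model "remembers" a game that isn't this one:
# after 1.e4 e5 2.Nf3 Nc6 3.Bb5, Black has no bishop that can capture on b5.
print(first_illegal_move(["e4", "e5", "Nf3", "Nc6", "Bb5", "Bxb5"]))  # -> Bxb5
```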
This might be a class of problem that GPT and friends can't handle, no matter how much the dataset is tweaked. (Wait for them to cheat and add a Chess plug-in to hide the problem.)
(Yeah I agree that they might sneak in a chess plugin)
von Neumann was right that chess has no hidden information, but I wonder how many grandmasters would make pungent remarks about there being no bluffing, deception, or trying to skull-fuck your opponent?
The problem with a chess cheat is that the class of problems GPT can't handle with fuzzy answers is much larger than chess. Once there's a better understanding of LLMs' limitations, the hype is going to take a major hit, with yo-yoing stock prices and techbro billions in play.
GPT=NFT 2023.
A lot of money to be lost, billions wasted that could have been used to do something useful. See also the $100 billion and ten years wasted on self-driving cars, just to gamble that you can make some people unemployed and the world just a bit worse.
At least self-driving car research produces useful spin-offs, like driver assists that guard against dangerous lane changes or backing over a kid in the driveway, and even autonomous vehicles in controlled situations.
I doubt there will be any from NFTs.
LLMs might produce useful assists that help rather than replace humans.
Hopefully there will be a Robot Hell for the con men who dress these up with hype and sizzle, and get people killed, either directly like Musk, or statistically like Sam Altman, all to play their money games.