That is one heck of an entertaining article. Gonna have to save this one for later…
… if you continue to
try { thisBullshit(); }
you are going to catch (theseHands)
I figure this is as good a place as any to ask (if only because I already went thru the thread assuming I’d find out). Are we to understand that some of the recent BB authors (“Natalie Dressed”) are AIs? A few of you are posting links to all of their BB articles. Did this AI authorship actually happen, or am I missing a joke, or both?
On an AI-related note, is there any kind of code/phrase I could embed in a given web page that would cause any AI that is scraping the site to (literally or figuratively; either’s fine) asphyxiate on its own flatulence? I tried a web search on “embed code to repel AI” and all I got was how & why to embed AI in my website. There used to be ways (like robots.txt) to say “ignore me” but they implied some kind of functioning honor system. I’d like to make any unwelcome visitor go all Uniblab over on the other side of The Cloud.
How does his “safe superintelligence” feel about glue on pizza?
Unlike Altman, Sutskever does have a doctorate and work in the field, but it’s probably going to be another chatbot.
Hm. facebookexternalhit is currently crawling my site. No sign that it’s checking robots.txt, and I can’t be arsed to see if it’s really Facebook. Time to wall it out.
I guess it’s a question for the reseller, but I’m wondering if/how to do this using resold Rackspace. (See above)
I run a toy site on a Raspberry Pi 3, and I only have to do this every few months, so I don’t know if this will help, but I’ll show my work.
It’s a LAMP stack, Linux, Apache (moving to nginx soon), MariaDB, and PHP.
The first step is to add them to robots.txt
# I can't think of why Facebook needs to know about my site. Probably an AI harvester.
User-agent: facebookexternalhit
Disallow: /
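If you want to sanity-check that the entry says what you think it says, here is a minimal sketch using Python's standard urllib.robotparser. It only reports what a well-behaved crawler would conclude from the file; it does nothing about crawlers that ignore it. The example.com URL and "SomeOtherBot" are placeholders, not anything from my setup.

```python
from urllib.robotparser import RobotFileParser

# The same robots.txt entry as above, inlined so the check runs offline.
ROBOTS_TXT = """\
# Probably an AI harvester.
User-agent: facebookexternalhit
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/index.html"  # placeholder URL
    for agent in ("facebookexternalhit/1.1", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Any agent without its own entry (and with no `User-agent: *` block) comes back as allowed, which is why one entry per crawler gets tedious.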
Since they don’t seem to check that, the next step is to 403 that string in Apache’s conf file. Even if you use Apache, your setup will probably differ from mine; my site configs live in /etc/apache2/sites-available
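# Needs mod_rewrite enabled and "RewriteEngine On" in this vhost.
# Return 403 Forbidden to any User-Agent containing this string (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC]
RewriteRule ^(.*)$ - [F,L]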
Ha! I already had them blocked, but they changed their string slightly. Updated.
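Since a slightly changed string is all it takes to slip past, it can help to try the pattern against strings from your access log before touching the conf. A rough sketch in Python: `re.IGNORECASE` stands in for `[NC]`, and `search` mimics the unanchored match. Apache uses PCRE rather than Python's `re`, so treat this as an approximation that holds for a plain literal like this one. The hyphenated variant below is hypothetical, made up to show a miss.

```python
import re

# Same literal as the RewriteCond pattern; IGNORECASE plays the role of [NC].
BLOCKED_AGENT = re.compile(r"facebookexternalhit", re.IGNORECASE)


def would_block(user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the User-Agent."""
    return BLOCKED_AGENT.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "facebookexternalhit/1.1",
        "FacebookExternalHit/1.1",
        "facebook-external-hit/2.0",  # hypothetical renamed variant
        "Mozilla/5.0 (X11; Linux x86_64)",
    ]
    for ua in samples:
        print(would_block(ua), ua)
```

The hyphenated variant is not matched, which is the whack-a-mole problem in miniature: a substring rule only catches the spellings you already know about.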
Restart Apache so the changes take effect.
sudo service apache2 restart
And now they can eat 403 Forbidden.
After that, if they persist and I get tired of whack-a-mole, I go to the firewall (ufw) and start blocking Facebook address ranges, assuming that’s where it’s coming from.
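For the firewall step, a small sketch of how I'd check whether an address from the access log falls inside a range I plan to block, and print the matching ufw commands. The CIDR blocks are reserved documentation ranges used purely as placeholders, NOT real Facebook ranges; you would have to look up the actual ones yourself. The function names are mine.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), NOT real Facebook ranges.
BLOCKED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def matching_range(client_ip, ranges=BLOCKED_RANGES):
    """Return the first CIDR block containing client_ip, or None."""
    ip = ipaddress.ip_address(client_ip)
    for cidr in ranges:
        network = ipaddress.ip_network(cidr)
        # Compare only like with like (IPv4 vs IPv6).
        if ip.version == network.version and ip in network:
            return cidr
    return None


def ufw_commands(ranges=BLOCKED_RANGES):
    """Build one 'ufw deny from' command per range; nothing is executed."""
    return [f"sudo ufw deny from {ipaddress.ip_network(cidr)}" for cidr in ranges]


if __name__ == "__main__":
    print(matching_range("192.0.2.55"))
    print(matching_range("203.0.113.9"))
    for command in ufw_commands():
        print(command)
```

As far as I know ufw checks rules in order, so a deny added after an existing allow for port 80 may never fire; `sudo ufw status numbered` shows the order, and `sudo ufw insert 1 deny from ...` puts a rule first.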
Why? Be evil.
301 Moved Permanently
Location: https://www.nsa.gov
Meh, I’ve done that, also http://127.0.0.1, but I don’t think the crawler or the NSA notices.
There’s always 4chan
Thanks for this – I’ve done this kind of thing for work, either on a load balancer or in IIS (and where I wouldn’t be allowed to have this kind of satisfying fun with it – in that context, it was e.g. just to redirect from an obsolete hostname to the current one – so it didn’t even occur to me to use it that way). My own website runs on LAMP, but I’m pretty sure I don’t have that kind of access to it (just the MySQL & the PHP). Just remembered that I can mess with .htaccess, but I’ll have to go back and see what else I can do.
In amongst the 957/1000 inaccurate inferences due to the model just making shit up, I legitimately want to know how they think they can tell the difference.
I do not have the impression that there is too little stuff that was slurped from 4chan and ended up as training data for the various LLMs out there already.
I have instructed my agent (who is sending the instruction down the chain), that all book contracts henceforth have to agree that cover art must be created by a human artist. Stock art use is acceptable, but that stock art must be human-created, not AI-generated. We will expect our contractual partners to exercise due diligence to make sure these conditions are met (by, as an example, using only stock art sites that note when art is AI-generated). I’ll note that Tor already has agreed to this. So this is no longer just a policy; it’s a hard contractual point.
I think @sqlrob & @DasKleineTeilchen had something to say on bullshit?
About that…