You can call me AI

anon53189944 · June 19, 2024, 2:26pm

That is one heck of an entertaining article. Gonna have to save this one for later…

… if you continue to try { thisBullshit(); } you are going to catch (theseHands)

Gyrofrog · June 20, 2024, 4:08am

I figure this is as good a place as any to ask (if only because I already went thru the thread assuming I’d find out). Are we to understand that some of the recent BB authors (“Natalie Dressed”) are AIs? A few of you are posting links to all of their BB articles. Did this AI authorship actually happen, or am I missing a joke, or both?

On an AI-related note, is there any kind of code/phrase I could embed in a given web page that would cause any AI that is scraping the site to (literally or figuratively; either’s fine) asphyxiate on its own flatulence? I tried a web search on embed code to repel AI and all I got was how & why to embed AI in my website. There used to be ways (like robots.txt) to say “ignore me” but they implied some kind of functioning honor system. I’d like to make any unwelcome visitor go all Uniblab over on the other side of The Cloud.

FGD135 · June 20, 2024, 11:01am

RickMycroft · June 20, 2024, 11:16am

How does his “safe superintelligence” feel about glue on pizza?

Unlike Altman, Sutskever does have a doctorate and work in the field, but it’s probably going to be another chatbot.

sqlrob · June 20, 2024, 4:25pm

anon15383236 · June 20, 2024, 4:36pm

Perplexity

Apparently joining the Shithouse Gang.

RickMycroft · June 20, 2024, 11:34pm

Hm. facebookexternalhit is currently crawling my site. No sign that it’s checking robots.txt, and I can’t be arsed to see if it’s really Facebook. Time to wall it out.

Gyrofrog · June 20, 2024, 11:41pm

I guess it’s a question for the reseller, but I’m wondering if/how to do this using resold Rackspace. (See above)

RickMycroft · June 21, 2024, 1:16am

I run a toy site on a Raspberry Pi3, and I only have to do this every few months, so I don’t know if this will help, but I’ll show my work.

It’s a LAMP stack, Linux, Apache (moving to nginx soon), MariaDB, and PHP.

The first step is to add them to robots.txt

# I can't think of why Facebook needs to know about my site. Probably an AI harvester.
User-agent: facebookexternalhit
Disallow: /
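For anyone who wants to sanity-check a rule like that before relying on it, here is a minimal sketch using Python's standard urllib.robotparser. The helper name `allowed` and the sample user-agent strings are just illustrations, and this only shows what a well-behaved crawler would conclude; a crawler that never reads robots.txt is unaffected.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the post above, inlined so the check needs no network.
ROBOTS_TXT = """\
# I can't think of why Facebook needs to know about my site. Probably an AI harvester.
User-agent: facebookexternalhit
Disallow: /
"""


def allowed(user_agent: str, path: str = "/") -> bool:
    """Return True if a polite crawler with this user agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

Calling `allowed("facebookexternalhit/1.1")` should come back False, while an agent not named in the file, such as `allowed("Mozilla/5.0")`, should come back True.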

Since they don’t seem to check that, the next step is to 403 that string in Apache’s conf file. Even if yours uses Apache, your setup will probably differ from mine. Mine is under /etc/apache2/sites-available

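# Send 403 Forbidden to any request whose User-Agent contains this string (case-insensitive)
# RewriteEngine On is needed once in this context if it isn't set already
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC]
RewriteRule ^(.*)$ - [F,L]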

Ha! I already had them blocked, but they changed their string slightly. Updated.

Restart the server so the changes take effect.

sudo service apache2 restart

And now they can eat 403 Forbidden.
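To confirm the block is actually in effect, one option is to request a page while pretending to be the crawler and look at the status code. This is a rough sketch using only Python's standard library; the function name `status_for_agent` is made up for illustration, and the URL you pass in is whatever your own site is.

```python
import urllib.error
import urllib.request


def status_for_agent(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code
```

If the rewrite rule is working, `status_for_agent(your_url, "facebookexternalhit/1.1")` should give 403, and the same call with an ordinary browser string should give whatever the page normally returns.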

After that, if they persist, and I get tired of whack-a-mole, I go to the firewall (ufw) and start blocking Facebook address ranges, assuming that’s where it’s coming from.

sqlrob · June 21, 2024, 1:22am

Why? Be evil.

301 Moved Permanently
Location: https://www.nsa.gov

RickMycroft · June 21, 2024, 1:30am

Meh, I’ve done that, also http://127.0.0.1, but I don’t think the crawler or the NSA notices.

sqlrob · June 21, 2024, 1:46am

There’s always 4chan

smulder · June 21, 2024, 2:01am

Gyrofrog · June 21, 2024, 2:21am

Thanks for this – I’ve done this kind of thing for work, either on a load balancer or in IIS (and where I wouldn’t be allowed to have this kind of satisfying fun with it – in that context, it was e.g. just to redirect from an obsolete hostname to the current one – so it didn’t even occur to me to use it that way). My own website runs on LAMP, but I’m pretty sure I don’t have that kind of access to it (just the MySQL & the PHP). Just remembered that I can mess with .htaccess, but I’ll have to go back and see what else I can do.

sqlrob · June 21, 2024, 3:24am

catsidhe · June 21, 2024, 4:07am

In amongst the 957/1000 inaccurate inferences due to the model just making shit up, I legitimately want to know how they think they can tell the difference.

FGD135 · June 21, 2024, 4:29am

I do not have the impression that there is too little stuff that was slurped from 4chan and ended up as training data for the various LLMs out there already.

vermes82 · June 21, 2024, 12:25pm

anon15383236 · June 21, 2024, 6:36pm

I have instructed my agent (who is sending the instruction down the chain), that all book contracts henceforth have to agree that cover art must be created by a human artist. Stock art use is acceptable, but that stock art must be human-created, not AI-generated. We will expect our contractual partners to exercise due diligence to make sure these conditions are met (by, as an example, using only stock art sites that note when art is AI-generated). I’ll note that Tor already has agreed to this. So this is no longer just a policy; it’s a hard contractual point.

LutherBlisset · June 21, 2024, 8:57pm

I think @sqlrob & @DasKleineTeilchen had something to say on bullshit?

About that…
