HuffingBoingBoing bot (BETA)

Continuing the discussion from Huffing Boing Boing:

This is very-in-progress.

huffingBoingBoing node.js source

6 Likes

This wouldn’t be one of yours, would it?

http://bbs.boingboing.net/users/newnewcow/activity

@NewNewCow 1000 apologies if you are not a bot. Welcome to boingboing.
You make weird posts. :smile:

1 Like

hah, no.

The goal of the bot is not to actually post to BB-forums, as that would be a bit spammy.

Der Twitters, though? Nobody cares about that virtual ecosystem. It’s totally Erie (as in Lake).

ZOMG apologies to @NewNewCow

I got reading about spambots on meta.discourse and jumped to conclusions!

Among other things, I assumed your bot would actually play huffing boingboing (in the topic). Guess I shoulda read the technical specs. :blush:

some recent output:

Jack Parsons of rocket scientist occultist got sub-broadband ISP service, thanks to Ridley Scott

Ridley Scott of miniseries got sub-broadband ISP service, thanks to Jack Parsons

LSD of nohitter got sub-broadband ISP service, thanks to Dock Ellis

Why we love telcoms shenanigans versus subbroadband ISP service

Our Peripheral, a documentary about magic by William Gibson

Our William Gibson, a documentary about William Gibson by William Gibsonians [whoops, need to fix the replacement regex. thought I did.]

Putting your Peripheral in your William Gibson

Oh joy! William Gibson is a book!

Teenage Drug Courier of Teenage Drug Courier got sub-broadband ISP service, thanks to Teenage Drug Courier

The Teenage Drug Courier: Teenage Drug Courier Teenage Drug Courier Teenage Drug Courier

Circling the Teenage Drug Courier with the mid-20th century’s most brilliant Teenage Drug Courier

The story of Teenage Drug Courier’s “Teenage Drug Courier Teenage Drug Courier” and an amazing Teenage Drug Courier [confession: the algorithm removed the possessive, and I put it back manually. need to fix the algorithm.]
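One possible way to stop the noun swap from eating the possessive, as a rough sketch only. The helper name `swapNoun` and the suffix regex are assumptions made up for illustration; they are not from the bot's source.

```javascript
// Hypothetical helper (not from the bot's source): swap in a replacement
// noun while carrying over a trailing possessive from the original word.
function swapNoun(original, replacement) {
  // Matches 's or a bare apostrophe at the end of the word, written with
  // either a straight (') or curly (’) apostrophe.
  const match = original.match(/(['’]s?)$/);
  const suffix = match ? match[1] : "";
  return replacement + suffix;
}

console.log(swapNoun("Parsons’s", "Teenage Drug Courier"));
// prints "Teenage Drug Courier’s"
```

A word that merely ends in a closing single quote would also match, so a real fix would probably need to look at the token's part of speech as well.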

5 Likes

Those results are very promising! Obviously not everything can be gold, but I think these are pretty entertaining. I particularly like “Our William Gibson, a documentary about William Gibson by William Gibsonians” actually.

Does this output to an HBB twitter stream?

I’m excited!

It will output to an HBB twitter stream.

Once I’m a little more pleased with the results.

Which will be in the next couple of days, if not tonight.

1 Like

Hopefully tied up to der Twitters TONIGHT.

recent samples:

  • Furniture: What It Is, What It’s Not*

  • Eight Pope old’s incredible prize-winning God

  • Who costumes American Hallowe’en? Analysis of 316K tweets

  • Obamacare: The Original Trilogy**

  • Our SSN, a documentary about magic by license number

  • Painting with copyright bullies

  • Which evolution will stick up for you when the Big Bang knock?

  • Which Pope will stick up for you when the God knock?

* This has got to be one of the most banal headlines. But for me, it’s one of the most amusing; it’s like one of the dumbest eHow articles out there. All the more amusing for appearing on a compendium of wonderful things.

** Hillary shot first.

4 Likes

Account has been created @ https://twitter.com/BoingBoingHuffR

Auth is all set up (tried a new method, which required the installation of Ruby).

“All” that remains is to install it onto Heroku, and let 'er rip.

1 Like

Surprisingly funny results!

But now I feel like I’m being replaced by a computer. First I lose my job to a machine, now they’re even taking over my pastimes and hobbies!

2 Likes

The results are scattershot. Some are awesome! Some are crap.

I still have to wire up a url-minifier and shove in links to both originals. That could be optional, but is nicer for the ecosystem (and can explain the underlying joke).

The links also eat into the room left for the headline a bit. Haven’t tested against the 140-character mark yet.

If you want to compete with the machines, National Novel Generati[ng|on] Month 2014 - NaNoGenMo is nearly upon us.

Not being a coder myself, is there some way to understand the algorithm - how it decides, what the selection parameters are, etc? Just curious.

NaNoGenMo is awesome - I hadn’t seen that before. Before you take that on though, I think the next step on the BBHuffr is actually generating the articles. There are some that I’d be quite interested in reading!

The bot scrapes the old-fashioned page: http://boingboing.net/page/1 which has 15 headlines on it.

It selects 2 at random.
It then picks a transformation method: split on punctuation (if both have colons), split on a coordinating conjunction (if both have them), swap parts of speech (nouns only, so far), or split at a random location. There’s a 25% chance that two headlines with colons in them will use a different method, just for variety’s sake. If one headline has a lot of nouns, and the other only a few, it will loop through the noun list, which is why it sometimes gets those buffalo-buffalo sentences I like so much.
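For the non-coders, the steps above could be sketched roughly like this. Everything here is an assumption for illustration: the function names, the conjunction list, the uniform choice among the fallback methods, and the hand-supplied noun set (a real bot would get nouns from a part-of-speech tagger). It is not the bot's actual source.

```javascript
// Sketch only: all names here are hypothetical, not taken from the bot's source.
const CONJUNCTIONS = ["for", "and", "nor", "but", "or", "yet", "so"];

const hasColon = (headline) => headline.includes(":");

const hasConjunction = (headline) =>
  headline.toLowerCase().split(/\s+/).some((word) => CONJUNCTIONS.includes(word));

// Pick two distinct headlines at random (needs at least two headlines).
// `rand` is injectable so the behaviour can be checked deterministically.
function pickTwo(headlines, rand = Math.random) {
  const first = Math.floor(rand() * headlines.length);
  let second = Math.floor(rand() * (headlines.length - 1));
  if (second >= first) second += 1; // skip over the first pick
  return [headlines[first], headlines[second]];
}

// Choose a transformation. Two colon headlines use the colon split 75% of
// the time; otherwise one of the other applicable methods is chosen
// (uniformly here, which is an assumption).
function pickMethod(h1, h2, rand = Math.random) {
  if (hasColon(h1) && hasColon(h2) && rand() >= 0.25) return "colon";
  const options = ["nounSwap", "randomSplit"];
  if (hasConjunction(h1) && hasConjunction(h2)) options.push("conjunction");
  return options[Math.floor(rand() * options.length)];
}

// Colon split: front half of the first headline, back half of the second.
function splitOnColon(h1, h2) {
  const head = h1.slice(0, h1.indexOf(":"));
  const tail = h2.slice(h2.indexOf(":") + 1).trim();
  return `${head}: ${tail}`;
}

// Noun swap: each noun in `words` is replaced by the next donor noun,
// wrapping around when the donor list runs out (hence the buffalo effect).
function swapNouns(words, nounSet, donorNouns) {
  let i = 0;
  return words
    .map((word) => (nounSet.has(word) ? donorNouns[i++ % donorNouns.length] : word))
    .join(" ");
}

// Made-up inputs, just to show the shapes involved.
console.log(splitOnColon("Cats: A History", "Robots: The Sequel"));
console.log(
  swapNouns("Cats chase dogs in parks".split(" "), new Set(["Cats", "dogs", "parks"]), ["Things"])
);
```

With a one-noun donor list, the second call prints "Things chase Things in Things", which is the wrap-around behaviour behind the repetitive headlines.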

 

November is the perfect time to pick up coding if you want to try something like this. Python or Node.js are easy places to start, with lots of text-twiddling examples out there. Don’t let complexity get in the way – repeating “hello world” 25,000 times gets you to the 50K word goal pretty fast. Then you can start iterating.
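The "hello world" arithmetic, spelled out as a tiny Node.js sketch (2 words times 25,000 repetitions is 50,000 words). The variable names are just for illustration.

```javascript
// "hello world" is 2 words, so 25,000 repetitions reach the 50,000-word goal.
const novel = Array(25000).fill("hello world").join(" ");
const wordCount = novel.split(" ").length;

console.log(wordCount); // prints 50000
```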

Well I’ve tried writing a novel for a few Novembers now, and never “won”. Maybe this November is a good time to not be successful at coding a novel-generating engine instead!

No go tonight. Something screwed up with Heroku authentication. AAARGH

It should be working, now: https://twitter.com/BoingBoingHuffR

2 Likes

Wow, didn’t take long for it to get deeply philosophical

1 Like

I need to institute a char-limit check. This tweet just FAILED

WATCH: Things Fitting Perfectly Into Things has Things Fitting Perfectly Into Things with Things Fitting Perfectly Into Things for Things Fitting Perfectly Into Things in Things Fitting Perfectly Into Things on Things Fitting Perfectly Into Things

#notallbuffaloes

4 Likes

  1. If there is enough room, short URLs of the original posts will be added
  2. Now posts only every 30 minutes. Should it be every hour?

TODO:

  1. Check that the tweet is <= 140 chars (w/o URLs). If too big, generate again
  2. Pull some headlines from the past (so we’re not always digging into the most recent 15)

1 Like
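The first TODO item could look something like the sketch below. The names `generateWithinLimit` and `maxTries` are assumptions, and the retry cap is an addition so that a run of oversized headlines cannot loop forever; the thread only says "do it again".

```javascript
const MAX_TWEET_LENGTH = 140;

// `generate` is any function returning a candidate headline (hypothetical).
// Tries up to `maxTries` times and returns null if nothing fits.
function generateWithinLimit(generate, maxTries = 10) {
  for (let attempt = 0; attempt < maxTries; attempt++) {
    const candidate = generate();
    if (candidate.length <= MAX_TWEET_LENGTH) return candidate;
  }
  return null;
}

// Usage with a stand-in generator that is too long on the first try.
let tries = 0;
const standIn = () => (tries++ === 0 ? "Things ".repeat(40) : "Painting with copyright bullies");
console.log(generateWithinLimit(standIn));
// prints "Painting with copyright bullies"
```

Note that `.length` counts UTF-16 code units, which may not match how Twitter counts characters for headlines with emoji or unusual symbols.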

Short URLs added if there is room.

AND NOW THEY ARE CORRECT (the second URL always had the source-page prefix on it, so it never worked).
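A sketch of both ideas, under stated assumptions: `resolveLink` shows one way to avoid gluing the source-page URL onto a link that is already absolute (the thread does not say what the actual cause or fix was), and `appendLinks` only adds links while they fit. The function names, the example post path, and the `urlCost` default of 23 characters per shortened link are all placeholders, not values from the bot.

```javascript
// Resolve a scraped href against the page it came from. The URL constructor
// leaves an already-absolute href alone instead of prefixing the base.
function resolveLink(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

// Append links only while they fit. `urlCost` is the number of characters
// each link is assumed to take once shortened (placeholder value).
function appendLinks(headline, links, urlCost = 23, limit = 140) {
  let tweet = headline;
  let used = headline.length;
  for (const link of links) {
    const needed = 1 + urlCost; // one space plus the shortened link
    if (used + needed > limit) break;
    tweet += " " + link;
    used += needed;
  }
  return tweet;
}

const page = "http://boingboing.net/page/1";
// Hypothetical post paths, for illustration only.
console.log(resolveLink("http://boingboing.net/2014/10/30/some-post.html", page));
console.log(resolveLink("/2014/10/30/other-post.html", page));
```

Both calls print a clean absolute URL on boingboing.net, whether the scraped href was absolute or relative.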