HuffingBoingBoing bot (BETA)


#20

Short URLs added if there is room.

AND NOW THEY ARE CORRECT (the second URL always had the source-page prefix on it, so it never worked).


#21

Perhaps script in some Oxford commas, and you’re all set. But honestly, the bot’s freaking hilarious, man. Keep up the work!


#22

Thanks! I fixed up the undefined scenario last night. Maybe I should have left ~~well~~bad enough alone.


#23

Frequency seems a bit high for my feed but Damn!! if it ain’t hilarious


#24

What would be a better freq? Hourly? Two-hourly?

Every 15 minutes was driving me batty. And when there are no updates (i.e., overnight) the output gets very repetitive.

There were 47 posts on Monday, Nov 3 (without checking, I’m guessing that Monday is a higher day than the rest of the week, weekend backlog and all?).

Hourly would be 24/day, still less than the BB output for one day… but more manageable?


#25

Once an hour seems like a good place to start. This is all subjective and I am never a good judge of much :slight_smile:


#26

I’ll give hourly a shot.

Is there another way to grab headlines other than looking at http://boingboing.net/page/<n>, which only doles them out in 15-post servings?

I’m glad you’re enjoying it!


#27

Ingest via RSS? Might be easier?
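
For what it's worth, pulling headlines out of an RSS 2.0 document can be quite short. The sketch below is only an illustration, not the bot's code: `headlines_from_rss` is a made-up name, it works on XML text you have already downloaded, and which feed to fetch (and how many items it carries) is left as an open assumption.

```python
import xml.etree.ElementTree as ET


def headlines_from_rss(xml_text):
    """Return the <title> of every <item> in an RSS 2.0 document.

    Hypothetical helper: fetching the feed is deliberately left out.
    """
    root = ET.fromstring(xml_text)
    # RSS 2.0 items are un-namespaced <item> elements under <channel>.
    return [
        item.findtext("title", default="").strip()
        for item in root.iter("item")
    ]
```

Whether this solves the paging problem depends on how many items the feed returns, which the thread doesn't say.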


#28

Just look at this war on general purpose bananas


#29

Yesterday I added a method to grab older headlines. There are either encoding or retrieval issues (the pages end up as gibberish) that only seem to affect older pages.

I added a method to retry if there was nothing to post, but I neglected to trap for 140 characters, and some other scenarios, and the bot was pretty unstable last night.

I haven’t figured out the retrieval problem, but it should be more regular today (more due to a large dose of fiber than any code I added).
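
Roughly, the retry-plus-trap idea looks like the sketch below. This is an illustration in Python, not the bot's actual code (which isn't shown in this thread). `next_tweet`, `generate`, and `max_attempts` are made-up names, and `generate` stands in for whatever produces a candidate headline.

```python
def next_tweet(generate, limit=140, max_attempts=5):
    """Try a few times to get a non-empty candidate that fits the limit.

    `generate` is a hypothetical callable returning a string or None.
    Returns None if every attempt fails, so the caller can skip this run.
    """
    for _ in range(max_attempts):
        try:
            candidate = generate()
        except Exception:
            # Assumption: any failure while generating counts as "nothing to post".
            continue
        if candidate and len(candidate) <= limit:
            return candidate
    return None
```

Capping the attempts keeps a bad night (nothing parseable at all) from turning into an endless loop.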


#30

Continuing the discussion from Huffing Boing Boing:

I’ll look into it. I was trying for the shortest URL; the shortener library I’m using has several other shorteners it could use, so I’ll see. If the headline + both URLs is too long, it uses only one… If headline + one URL is too long, it skips them entirely.
I suppose it could post as a second tweet…
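
The fallback described above boils down to something like this. It's a minimal sketch rather than the real code: `compose_tweet` and its parameter names are invented for illustration, and it counts raw characters against a 140 limit, which is an assumption about how the length is measured.

```python
TWEET_LIMIT = 140  # assumed limit, counted in raw characters


def compose_tweet(headline, first_url, second_url, limit=TWEET_LIMIT):
    """Attach both short URLs if they fit, else one, else none."""
    candidates = [
        f"{headline} {first_url} {second_url}",  # headline + both URLs
        f"{headline} {first_url}",               # headline + one URL
    ]
    for text in candidates:
        if len(text) <= limit:
            return text
    return headline  # neither variant fits, so the URLs are skipped
```

Posting the URLs as a second tweet would replace that last `return` with a pair of texts instead of one.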


#31

There’s a lot of repetition in the bot; it’s supposed to be sampling from older pages, but for whatever reason, most non-page-1 retrievals end up with some bullshit encoding issue and can’t be parsed. In manual testing (yeah, yeah, yeah) this hardly ever happens. #$@#@#$@ So the bot pulls down the 15 posts on page 1 over and over again.

Sometimes, this can be interesting:


#32

Continuing the discussion from Huffing Boing Boing:

I’ve thought about this, and am not sure. Like you said – sometimes it works well.

The bot’s automated; it can’t judge quality – that’s up to us. So let it put out a bunch of crap, as long as there are unchewed nuggets of gold in there!

OTOH, I’m thinking of making “only use words once” one of the strategies, as well as “only replace sentences with small n with n from sentences with more n” – that is, if there are 4 nouns in one sentence, and 2 nouns in another, use the 4-noun sentence as parts; the other way lies repetition.
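
One possible reading of that noun rule, sketched in Python. Everything here is hypothetical: the `swap_nouns` name, the `(word, tag)` input shape, and the `"NOUN"` label are all stand-ins for whatever a part-of-speech tagger would hand back, and the real bot may do this quite differently.

```python
def noun_count(tagged):
    """Count the words tagged as nouns in a list of (word, tag) pairs."""
    return sum(1 for _, tag in tagged if tag == "NOUN")


def swap_nouns(first, second):
    """Fill the headline with fewer nouns using nouns from the other one.

    Each headline is a list of (word, tag) pairs. The donor always has at
    least as many nouns as the template, so no donor noun is used twice.
    """
    template, donor = sorted((first, second), key=noun_count)
    donor_nouns = iter(word for word, tag in donor if tag == "NOUN")
    return " ".join(
        next(donor_nouns) if tag == "NOUN" else word
        for word, tag in template
    )
```

Going the other way (a 2-noun donor feeding a 4-noun template) would force nouns to be reused, which is the repetition mentioned above.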


#33

I want to read that.


#34

And Then, On One Completely Mundane Monday Evening, The HuffingBoing Bot Became Aware of the Rogue AI That Hacked CENTCOM.

And All The Meatbags Thought It Was A Joke.


#35

Getting Minimal Wit’ It


#36

I like that it’s replacing the URL with undefined.


#37

Looks like some re-coding is in order THANKS TO THE SEMANTIC !@#@#@#$ REDESIGN


UPDATE: done


#38

Nah, I think that pretty much sums up BB headlines now. Leave it. :smile:


closed #39

This topic was automatically closed after 846 days. New replies are no longer allowed.