4.99999999999999 Gb of trolling.


5 gigabytes is just for one month, the full dataset is 250 gigabytes (over 1 terabyte uncompressed). So, you know, make sure you’ve got a stiff hard-drive…


Ohhh, better get my popcorn and fake outrage face ready.

On a slightly more serious note, I do wonder how well something like all of reddit could represent the population (or specific parts of it) at large.

I.e., we’re bound to get lots of data on how many racist, homophobic, etc. comments there are, or, at the completely opposite end, how many people help out complete strangers, and all the things in between, like how many bronies and never-nudes there actually are. How fair will it be to extrapolate those results to the real world, or to parts thereof, like North American young-adult males?


In fact, let me just note the following as well:
“Fuck This Shit I’m Out”


At first I was excited that I might be able to extract all my reddit comments, until I realized I joined in August of 2007.



How about Boing Boing’s? 😛


Actually, given the number of comment resets due to backend changes, I’d really like that. Here and on io9.


That would actually make an awesome free corpus for testing NLP.


Waiting for big data analysis of misogynistic terminology. I want charts.


“…from October of 2007 until May of 2015 (complete month).”

So, then, you say, “from October of 2007 through May of 2015.”

Here’s to concise writing. (Probably not Boing Boing’s fault, but whatevs.)


How could it be? This site is the very essence of precision and impartiality.


