I thought I’d start a thread to discuss the infrastructure at Boing Boing - current and future - and to answer any questions about running a pretty high-profile blog. I’ve been threatening to do this for the Boingers as posts, but I never expected to find much of an audience that way. This is a better way to do it, IMHO.
For some background: I took over the hosting of Boing Boing on Halloween, 2003. I had already been hosting Cory’s personal site for some time before this, and the Boingers were in a pinch, with the site down for several days. I took my personal desktop, a white-box P233MMX down to my friend’s cage at the carrier hotel at 151 Front Street West in Toronto, and the rest is history.
As for my (personal) background - I’ve been a sysadmin since 1994, and managing large-scale infrastructure since 2001, when I designed the infrastructure for First Data’s first internet-enabled payment terminals (on Linux, which was a bit of a maverick move at the time - my boss made me put Solaris systems in front of them until the Linux boxes were actually proven to be faster). In the years since I’ve worked as the Director of Technology Operations for what was once the fourth-largest online advertising network, as well as a stint as DTO for the Wikimedia Foundation. I’ve managed some big networks.
Anyway, ask me your questions! I’m going to add two more posts below, one about the current BB infrastructure, and one about the new setup, that I’ve dubbed *BB 3.0*, and keep this as a running journal of the transition.
The current Boing Boing runs on six servers, as follows:
- 3 web front-end servers handling dynamic calls to Wordpress
- 1 admin server handling the admin interface for Wordpress, and acting as origin for media.boingboing.net
- 2 master-master replicated MySQL 5.6 servers for the database (a config sketch follows below)
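For the curious, master-master mysql mostly comes down to a handful of settings mirrored on each box. A minimal sketch, assuming a stock RHEL-style layout - the values and path here are illustrative, not our actual config:

```bash
# Sketch of master-master replication settings; server-id and the
# auto_increment values are flipped on the second master.
cat > /etc/my.cnf.d/replication.cnf <<'EOF'
[mysqld]
server-id                = 1          # 2 on the other master
log-bin                  = mysql-bin
log-slave-updates        = 1
auto_increment_increment = 2          # step keys by 2...
auto_increment_offset    = 1          # ...so one box writes odd ids, the other even
EOF
```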
The servers are currently managed hosting boxes from Priority Colo in Toronto, who have been awesome partners of ours for years now. All the servers run Red Hat Enterprise Linux 6.
Sync between the frontends and the admin server is handled through a mess of rsync scripts and manual update scripts our devs use to keep things in lockstep. At the time I set this up, there wasn’t any good software-based sync setup that didn’t risk becoming a single point of failure.
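If you’re curious what that looks like in practice, think something along these lines - a sketch only, with hypothetical hostnames and paths rather than our actual scripts:

```bash
# Push wp-content from the admin box out to each frontend.
# Hostnames and paths are placeholders.
for host in web1 web2 web3; do
  rsync -az --delete /var/www/wordpress/wp-content/ \
        "${host}:/var/www/wordpress/wp-content/"
done
```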
Boing Boing is fronted by Fastly as our CDN. They are seriously awesome and offer control over content right down to the Varnish VCL level. I’m a huge fan of their setup and design.
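To give a taste of how scriptable they are, purging content is a single HTTP call (the URL, token, service ID, and surrogate key below are placeholders):

```bash
# Purge a single URL from Fastly's cache.
curl -X PURGE https://boingboing.net/some/path

# Or purge everything tagged with a surrogate key via their API.
curl -X POST -H "Fastly-Key: $FASTLY_TOKEN" \
     "https://api.fastly.com/service/$SERVICE_ID/purge/some-key"
```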
The current system uses Monit for daemon restarts and outage reporting, Munin for systems analytics and long-term trend analysis (including mysql), and a hodgepodge of shell scripts for backup infrastructure (as well as Vaultpress for an offsite solution, which I highly recommend).
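The backup scripts really are nothing fancy; picture something like this, with the paths and retention made up for illustration:

```bash
# Nightly dump, the "hodgepodge of shell scripts" way.
mysqldump --single-transaction --all-databases \
  | gzip > "/backup/bb-$(date +%F).sql.gz"

# Keep two weeks of dumps around.
find /backup -name 'bb-*.sql.gz' -mtime +14 -delete
```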
HTTPS and load balancing are handled using F5 LTMs.
The new Boing Boing infrastructure is going to implement several new advancements that have come along since 2008, which is when I built the existing setup. Most of the gear was left over from a migration contract I handled for Federated Media, where they paid me partially in equipment, which leads to some strange architecture decisions (like 15K drives instead of SSDs!)
This is still a work in progress!
Servers:
6 HP DL360p G7 servers, dual hexa-core Xeons; all but the DB pair have 64GB RAM and 6 15K drives in RAID10 with 1 hot spare. Configured as follows:
- 2 web front-ends
- 1 admin server (for SSH agent forwarding, as well as the Wordpress admin interface for the Boingers; see the SSH sketch after this list)
- 1 “tools” server for monitoring, backups, and misc admin functions
- 2 DB servers. Unlike the rest, they’re configured with 4 SSDs and 128GB RAM.
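On the SSH agent forwarding front, the client side is just a jump-host stanza like this (the hostname is a placeholder):

```bash
# Append an agent-forwarding stanza to your ssh config;
# the hostname is hypothetical.
cat >> ~/.ssh/config <<'EOF'
Host bb-admin
    HostName admin.example.net
    ForwardAgent yes    # carry your local agent on to the inner hosts
EOF
```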
Software:
We’ll be running Red Hat Enterprise Linux 7, including its pretty awesome support for docker-latest.
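Getting it running is about as simple as it gets, assuming the Extras channel is enabled:

```bash
# docker-latest ships in the RHEL 7 Extras channel.
sudo yum install -y docker-latest
sudo systemctl enable docker-latest
sudo systemctl start docker-latest
```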
Hey @orenwolf, quick question: I’ve been running 5.5 for a long time now (security patches are applied, of course). In your opinion, if performance is acceptable, are there any reasons to move to 5.6+, or lessons learned?
Also, just as your opinion (not necessarily the opinion of BB), how was the move to discourse? That couldn’t have been easy.
So, first of all, I heartily recommend running Percona’s mysql builds - they’re FLOSS, and the Percona team includes a good chunk of the folks who split off from mysql AB. They’ve incorporated patches from Google and Facebook as well as their own tweaks, and they provide a ton of really useful tools to the community (including Percona Toolkit, which used to be called Maatkit until they hired its author!)
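Installing their build on RHEL-flavoured boxes is painless; roughly this, with the 5.6 package picked just as an example:

```bash
# Add Percona's yum repo, then pull in their server build and the toolkit.
sudo yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
sudo yum install -y Percona-Server-server-56 percona-toolkit
```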
5.6 has some features that matter in big installations and don’t really matter for moderate-sized installs. Things like GTIDs, so that if you have a cascading chain of servers in master-slave-master loops (oh, the fun), you can track specific transactions down, as well as the PERFORMANCE_SCHEMA virtual database, which many tools make use of for statistical analysis of what’s going on in your database. Percona, on top of that, added NUMA handling that was a lifesaver if you have systems that are basically huge gobs of RAM just running mysql, but have two processors with their own RAM banks. The new BB mysql servers have 128GB of RAM each and were annoying the crap out of me with swapping (!) in that situation until Google’s backported NUMA patches were added. (Good news for you: Percona even backported them to 5.5!)
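To give a flavour of what those tools are pulling out of PERFORMANCE_SCHEMA, here’s the kind of thing you can also just run by hand on 5.6+:

```bash
# Top five statement digests by total time spent, straight out of P_S.
mysql -e "
  SELECT digest_text, count_star, sum_timer_wait
  FROM performance_schema.events_statements_summary_by_digest
  ORDER BY sum_timer_wait DESC
  LIMIT 5;"
```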
So, if 5.5 is working for you, stick with it. BB 3.0 is actually going with Percona’s build of mysql 5.7, mainly so I can play with TokuDB and its online backup tools. One of the awesome additions to mysql 5.7, though, is the JSON datatype, which addresses a big reason people were maintaining both mysql and mongodb instances in the past.
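If you haven’t seen the JSON type yet, here’s the two-minute version (hypothetical table and data, purely to show the syntax):

```bash
# Requires mysql 5.7+; the table and data are made up for illustration.
mysql test <<'EOF'
CREATE TABLE meta (id INT PRIMARY KEY, doc JSON);
INSERT INTO meta VALUES (1, '{"tags": ["maker", "video"]}');
SELECT doc->'$.tags[0]' AS first_tag FROM meta;
EOF
```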
Fastly, plus the HTTPS offloading to the Big-IP, takes a ton of load off. So much so, in fact, that for BB 3.0 I’m only building two web frontends instead of three (but more on that later!)
I’ve been thinking about going down the percona route. I just have this, I dunno, old school mentality: if your database can’t handle billions of records and reasonable joins with 4gb of RAM, something else is the problem.
Alternatively, 128GB of RAM costs as much as a pumpkin spice latte at Starbucks.
The latest master I set up was 32GB with eight cores, and it runs white hot. It’s the auditing, I think, that’s killing me, and Percona appears to have better tools for figuring out what exactly is eating everything (explain, top, slow query logs, hell, even strace aren’t optimal), so I think I’ll look at Percona this weekend.
I can tell you that there have been at least two instances in recent memory for me where installing the Percona release of 5.5 (5.6 wasn’t general release yet) literally saved the DB stack from crumbling under the transaction load. A big part of why is the Google and Facebook patches; 5.6 incorporates most of those changes into mainline, which is why the 5.6 Percona release focuses on other things. But they’re a group of super smart folks, and their performance blog is always stuffed with lots of amazing info, often from Peter Zaitsev himself. It’s great to see the high-performance-dev-turned-CEO still getting his hands dirty with code, which is another reason I think these guys are awesome.
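If slow query analysis is your pain point specifically, pt-query-digest alone is worth the trip. A typical run looks like this (the log path is an assumption - use whatever slow_query_log_file points at):

```bash
# Summarize the slow query log into ranked query fingerprints.
pt-query-digest /var/lib/mysql/slow-query.log > /tmp/digest.txt
```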
That being said, though, I don’t want to take away from the other mysql dev spinoff either, the MariaDB folk are also awesome. It’s a great time to be a mysql DBA, really.
Updated the third post with information on BB 3.0! Current status: I’ve installed the servers and base OS, and begun the Ansible configs for the servers themselves. My goal is to have everything moved over by the end of March - we’ll see how it goes.
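For those following along, the playbook runs themselves are the usual sort of thing (the inventory and playbook names here are placeholders, not the real repo layout):

```bash
# Run the work-in-progress playbooks against the new boxes.
ansible-playbook -i inventory/bb3 site.yml --limit webservers
```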