The internet's other archive

Because the Show Full Post button is still buggy after a decade, here’s the opening to the linked article that doesn’t show up here on the BBS.

Nuntakarn / Shutterstock

The website archive.today (also accessible at archive.is) has become a backbone of the internet, providing on-demand archives of specific pages and access to paywalled sites that play the SEO game of revealing the full text of articles to non-human viewers. It’s now a common Wikipedia citation! Jani Patokallio wrote up a history of the site and its unique position in the web’s ecology.

The archive runs Apache Hadoop and Apache Accumulo. All data is stored on HDFS, textual content is duplicated 3 times among servers in 2 datacenters and images are duplicated 2 times. Both datacenters are in Europe, with OVH hosting at least one of them.