Originally published at: http://boingboing.net/2016/11/29/the-internet-archive-is-puttin.html
…
If only we had millions of computing devices distributed around the world, each with the ability to share information via software that could withstand requests for a “torrent” of data, then we wouldn’t need such central repositories.
While wondrous in theory, in practice, unless you can guarantee a quorum of individuals dedicated to keeping the swarm alive, you risk losing the data entirely.
And if you're going to find a group of individuals around the world committed to keeping the data anyway, then why not have each of them mirror the whole thing? No need for torrents then.
Torrents are excellent at spreading the load of transmitting data around. Not so much for ensuring all of the data is available all the time.
If Trump exists in this new copy, I already consider it tainted.
What is the size of the archive? Why not spread mirrors all over the globe? At least one mirror per continent should help…
Or that the information being torrented is actually correct. A distributed system where different nodes store different parts of the data is still open to a systematic attack of replacing good with bad, just as a financial system involving paper and metal tokens held by many people is open to forgery without a strong central authority constantly monitoring.
Have they fixed the problem where the robots.txt rules of a domain's current site are retroactively applied to all previous sites that used that domain?
Blockchain!
Last number I saw was 15PB, a year or two ago, and it seems to grow by about 1PB per year. It ain't peanuts.
Of course it ain't peanuts, but we're only talking about 6, maybe 7 mirrors. Expensive, but doable.
Canada looks much more stable than the US will be starting next year… But it wasn’t that long ago, this happened: http://www.vice.com/en_ca/read/the-harper-government-has-trashed-and-burned-environmental-books-and-documents
Might be wise to throw another in Switzerland, just in case.
This is partially true for torrents. If you have a known-good hash of the completed fileset, then torrents aren’t subject to block injection, because you have the hash the completed data should match. However, same problem - someone has to maintain those hashes. And if you have a trusted group willing to maintain the hashes, well, might as well just give them the whole dataset to maintain, too.
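The check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Archive's or BitTorrent's actual scheme (BitTorrent v1, for instance, embeds SHA-1 piece hashes in the .torrent file); `verify_fileset` and the manifest format are invented for the example. The point stands either way: the hashes themselves are the trusted artifact someone has to maintain.

```python
import hashlib
import tempfile

def file_sha256(path, chunk_size=65536):
    """Stream a file through SHA-256 so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_fileset(manifest):
    """manifest maps file path -> known-good hex digest; returns per-file pass/fail."""
    return {path: file_sha256(path) == digest for path, digest in manifest.items()}

# Demo: an untampered copy passes; an injected/replaced copy fails.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"archived page contents")
    good_path = f.name

known_good = {good_path: file_sha256(good_path)}  # the trusted manifest
ok = verify_fileset(known_good)                    # all True

with open(good_path, "wb") as f:
    f.write(b"tampered contents")                  # simulate bad-block injection
tampered = verify_fileset(known_good)              # now False
```

Note that the scheme only detects tampering relative to the manifest; if the manifest itself is corrupted or maintained by an untrusted party, the verification is worthless, which is exactly the "who maintains the hashes" problem above.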
I don’t see why we trust the IA not to be editing the past.
Well, who’s the better arbiter of our past? Short of storing everything you’ve ever read or written, that is. It used to be books we trusted to this role, after all. And the publishers were either often wrong, or politically motivated to “sanitize” details.
Let’s start with the somewhat easier “verify facts” problem, then, when solved, figure out how to solve the “storing facts” problem.
Surveillance law is one thing; censoring the internet is a completely different thing. You seem to be confusing the two. I don't think Canada has ever been accused of censoring the internet. This kind of error decreases your credibility as a journalist.
Consider it a form of original, pre-post-truth reportage.
Do you think they're behind the Berenstain Bears misspelling/correction?
Setting up a fundraising effort as Trumpophobia is brilliant here.
Isn't that awful? I mean, I kind of get it, but on the other hand, all too often I try to view an old, long-gone web site that has since been taken over by squatters, and all the old content is gone forever thanks to a robots.txt.