Real computer users look at the keyboard-- not at the screen.
In the beginning was the script
But I never blamed the Hole Hawg; I blamed myself. The Hole Hawg is dangerous because it does exactly what you tell it to. It is not bound by the physical limitations that are inherent in a cheap drill, and neither is it limited by safety interlocks that might be built into a homeowner’s product by a liability-conscious manufacturer. The danger lies not in the machine itself but in the user’s failure to envision the full consequences of the instructions he gives to it.
rm three *
s3cmd del --recursive s3://*
‘Some people, when confronted with a problem, think “I know, I’ll use Amazon S3.” Now they have two problems.’
Could be worse. Could have been one of the many times a DNS typo brought down the entire Internet.
He’s probably a Chad.
For the last time, that was Jeff, not me!!
I worked for a company that made software to monitor funds and securities. We would run a series of tests on our servers after an upgrade before putting them back online.
Being new, I didn’t realize there were test settings that had to be swapped in on the server before a run. I started the tests with the server not pointed to our dummy data services; nope, it was still pointed at live sources like Bloomberg.
A few scaling and load tests later, I had run up over $15,000 in data transaction fees, and concerned emails started to come in from those sources, cluing me in to the huge mistake I had just made.
This resulted in a massive panic attack in which I pondered many past mistakes leading up to this point, future career paths outside the software industry and my own mortality.
A phone call or two and some emails from my manager later, the charges were reversed, fears were assuaged and I somehow still had a job.
So remember kids: get review before you commit.
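A minimal sketch of a pre-flight check that would catch this kind of mistake, assuming the test runner knows which endpoints the server is configured to call. The host names in `SANDBOX_HOSTS` and the functions `assert_sandbox` and `run_load_tests` are hypothetical, made up for illustration; they are not from any real system described above.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of dummy data services; these names are illustrative only.
SANDBOX_HOSTS = {"localhost", "127.0.0.1", "dummy-data.test.internal"}


def assert_sandbox(endpoints):
    """Raise before any test runs if an endpoint is not a known sandbox host."""
    # URLs without a scheme parse to hostname None, so they are rejected too.
    live = [url for url in endpoints if urlparse(url).hostname not in SANDBOX_HOSTS]
    if live:
        raise RuntimeError(f"refusing to run load tests against live sources: {live}")


def run_load_tests(endpoints, run):
    """Check every endpoint first, then hand off to the actual test callable."""
    assert_sandbox(endpoints)
    return run(endpoints)
```

The check fails closed: anything it cannot positively identify as a dummy service stops the run, which is cheaper than finding out from the invoice.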
Bezos is smart enough to know that giving people a huge incentive to be dishonest about what went wrong is a bad idea.
The Amazon root cause analysis process doesn’t even name the person who made the mistake. I doubt that person was fired because of one error. If it was part of a pattern, well…
Typo, yes it was a “typo”.
I have actually managed to recover systems where the root user accidentally typed “rm -rf /” instead of “rm -rf ./”.
One of the reasons I never underestimate the dangers of hitting the return key without first reading the command…
[edit] The trouble of recovering it after someone else screwed up being the reason, not that I did it. Just realized that could be read differently…
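For scripted cleanups, one hedge against the “/” versus “./” slip is to resolve the path and refuse anything outside a directory you explicitly name. This is only a sketch: `safe_rmtree` and its `allowed_root` parameter are hypothetical names, and it does nothing for commands typed directly at a root shell.

```python
import os
import shutil


def safe_rmtree(target, allowed_root):
    """Recursively delete target only if it lies strictly inside allowed_root."""
    root = os.path.realpath(allowed_root)
    path = os.path.realpath(target)
    if root == os.path.sep:
        raise ValueError("allowed_root must not be the filesystem root")
    # Reject the root directory itself and anything that resolves outside it.
    if path == root or os.path.commonpath([root, path]) != root:
        raise ValueError(f"refusing to delete {path!r}: not inside {root!r}")
    shutil.rmtree(path)
```

Resolving with `os.path.realpath` before comparing means symlinks and `..` segments cannot sneak the target outside the allowed directory.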
In light of this AWS S3 outage, we, CS researchers at Univ. of Chicago, recently published a paper about:
‘Why Does the Cloud Stop Computing’. You can find our paper and slides here:
Paper: http://ucare.cs.uchicago.edu/pdf/socc16-cos.pdf
Slide: http://ucare.cs.uchicago.edu/slides/socc16-cos.pptx
I have had a couple of users at my site destroy EC2 instances by running chown -R /, apparently trying to resolve permission problems. The system was okay, but I couldn’t get back in to make repairs.
I once corrupted a production db due to a typo, and then found out we had no backups.
I feel vindicated.
Every time a cloud provider approaches me pitching their service I ask for a copy of their DR plan, testing, and estimated downtime during a disaster - information which all of my clients demand from me before a contract is finalized. Oddly, not a single one has ever supplied one.
Did you know that an Oracle DB continues to process transactions just fine after you type “newfs” on its data partition? I didn’t know that, until I did it. Oops. Still, we moved the live traffic to another server ASAP. Sweatin’ bullets.
-jeff
You can tell he’s a serious computer operator by the turtleneck.
“Don’cha just know it?”