published at 11:08am on 08/10/11
And now on Paranoia Theatre, some thoughts that have been kicking around in my head after I was rudely interrupted the other night by the news that none of our Amazon EC2 servers were accessible on the Internet any more. Sure enough, I checked the site: down. I checked the Amazon status page, which confirmed “connectivity problems.” I sent an email to the team and told them to keep their pants on. And then I waited, because in a situation like this, with our infrastructure so reliant on Amazon, there really isn’t anything to do at that point other than wait for them to fix their routers and get our boxes back online.
Now, I’ve said in the past that while “the cloud” in general can be volatile, I still don’t think that outages like this will cause any major panic. One of two things will happen: either Amazon will shape up and put a stop to these kinds of problems, or those companies with the need and the means will do what they always did in the pre-cloud days and build geographically redundant backups into their core infrastructure.
But as I was sitting there, watching my terminal futilely trying to ping my servers, I started thinking about scenarios where this kind of outage, rather than being the result of poor planning or human error, was much more malicious, deliberate and permanent. With Amazon becoming a single point of failure for many companies, every time there is an outage, someone publishes a headline listing all of the better-known companies that are suffering because of it. But even beyond Amazon as a brittle piece of the Internet’s infrastructure, it seems to me that there haven’t been any large, public attacks on data centers meant to physically and irreparably damage a service, rather than just interrupt business (in the form of a DDoS or hacking). Hearing about hacking exploits has become almost commonplace, but the last time one of my servers went down because a physical system failed, it was because someone dug up all of the main and backup fiber outside of my data center with a backhoe while doing road construction. But if a single backhoe (or, for that matter, a woman scavenging spare copper wire) can cause that kind of damage, why haven’t we heard of more people intentionally driving a truck into a data center to take a company (or many) completely offline?
It’s possible that this kind of thing just is not practical. From an impact perspective, there are really two kinds of targets: large ones, where a successful attack would actually effect some kind of significant change in the world, and small ones, where the damage wouldn’t really be enough to make it worth the effort. Any company large enough to know better will already have those geographic redundancy plans in place to prevent the loss of a single data center from causing any sort of lasting damage. And since much cloud infrastructure is, by design, geographically redundant anyway (Amazon’s S3 storage for one), even the smaller companies can take advantage of this kind of protection without any additional work. All of this makes trying to physically interrupt the operations of a data center a fairly low-reward activity.
Additionally, it’s possible that there really is no motivation for this kind of thing in the first place. Most of the attacks we see on data these days involve corporate or political information and personal identity theft. It’s entirely possible that right now there is simply little benefit to be had from wiping out, say, all of Mastercard’s data centers in one fell swoop when you could, for a lot less effort and a lot lower risk, just steal credit card numbers.
But as more and more of our lives reside solely online, from the critical (banking, health) to the personal (photos, journals), we are going to have to give more thought to what it means for us individually, and for society as a whole, that it could all go away, and to what we can do to mitigate that risk.