Where did Facebook go yesterday?

Is it on topic for my blog? Maybe not, but it has come up in conversation enough times already that I think people are interested in it. So here's a quick bunch of links for you, in case you're curious and would like to learn more.

As you may or may not have noticed (I'm guessing that you did notice), Facebook went poof yesterday for a number of hours. Here's the overview from Brian Krebs.

The impact was broader than you might have realized. Sheera Frenkel of the New York Times tweeted, "Was just on phone with someone who works for FB who described employees unable to enter buildings this morning to begin to evaluate extent of outage because their badges weren’t working to access doors." Oof.

What went wrong? Mitchell Clark of The Verge presciently joked about the gut instinct that “it’s always DNS or BGP.”

And lo, it was indeed BGP. Celso Martinho and Tom Strickx from Cloudflare posted a detailed breakdown of what they saw.

And here's what Facebook has to say about it.

Errors in a BGP configuration can do nasty things. Remember when Pakistan accidentally knocked YouTube off the internet?

I honestly hope nobody gets fired over this. Improve systems, failsafes, checks and balances, when something like this happens. But don't lay the blame at the feet of an individual who might have typo'd a configuration entry. Instead, I hope they use it as a learning experience to improve processes so it takes more than one person's cut and paste to bring the whole thing down.
