Posted on 10/05/2021 5:52:37 AM PDT by ShadowAce
Yesterday’s Facebook outage – which took down Facebook Messenger, Instagram, and WhatsApp as well as the main service – resulted from a mistake by the company’s own network engineers.
The mistake led to all of Facebook’s services being inaccessible, with one analogy likening it to a failure in the “air traffic control” services for network traffic …
We reported yesterday on the massive failure.
It’s not just you: Facebook, Instagram, and WhatsApp are all currently down for users around the world. We’re seeing error messages on all three services across iOS applications as well as on the web. Users are being greeted with error messages such as: “Sorry, something went wrong,” “5xx Server Error,” and more.
The outage is affecting every Facebook-owned platform, according to data on Downdetector and Twitter. This includes Instagram, Facebook, WhatsApp, and Facebook Messenger […] While some Facebook, Instagram, and WhatsApp outages only affect certain geographic regions, the services are down worldwide today.
It gradually appeared that the problem might relate to DNS – the Domain Name System, whose servers tell devices which IP addresses to use to access services – but it was unclear what exactly had happened, and whether this was an external hack, malicious action by an insider, or a catastrophic mistake.
Facebook has now admitted in a blog post that it was a mistake.
Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.
It took a long time to resolve the problem because the inaccessible systems included the servers and tools engineers would normally use to solve the problem remotely. Reports suggest that lower-level employees had to gain physical access to the data centers, and then rely on step-by-step instructions from more senior engineers in order to undo the mistake. Complicating this, the networks being unavailable meant that Facebook’s door access systems were also offline, physically preventing access.
We’ll doubtless get the full story in time, but the consensus view emerging is that the problem was some mix of Domain Name System (DNS) and Border Gateway Protocol (BGP) configuration.
The best analogy I’ve seen is to think of network traffic as being like planes. Your device wants to fly to facebook.com. Your plane first needs to know the GPS coordinates of the destination airport, that is, the IP address it should connect to. It gets that information by asking a DNS server, which tells it that facebook.com is located at (for example) 66.220.144.0.
But getting to the final destination – the actual server that can perform the task you want to do – relies on a kind of air traffic control system for network traffic, and that’s BGP. BGP is how networks tell each other which address ranges they can deliver traffic to, so the routers along the way know which route your plane should fly through the various networks en route to your final destination.
It appears that Facebook’s BGP route announcements disappeared completely – so there was no way for the rest of the internet to work out how to reach Facebook’s servers. And that included Facebook’s own engineers reaching the systems they needed to undo the mistake.
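To make the analogy concrete, here is a minimal Python sketch of the two lookups described above. It is a toy model only, not how Facebook's network or any real router works. The `RouteTable` class, the `reach` function, the `peer-A` next hop, and the `66.220.144.0/20` prefix are all made up for illustration; only the example address 66.220.144.0 comes from the article.

```python
import ipaddress

# Toy DNS table: hostname -> IP address (the address is the article's example).
DNS_RECORDS = {"facebook.com": "66.220.144.0"}


class RouteTable:
    """Hypothetical stand-in for the routes a router has learned via BGP."""

    def __init__(self):
        self.routes = {}  # ip_network -> next-hop label

    def announce(self, prefix, next_hop):
        # A network advertises that it can deliver traffic for this prefix.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        # The advertisement is taken back; the prefix becomes unreachable.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        # Longest-prefix match: the most specific covering route wins.
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


def reach(hostname, dns, table):
    """Step 1: name -> address (DNS). Step 2: address -> route (BGP)."""
    address = dns.get(hostname)
    if address is None:
        return "DNS failure: no address for " + hostname
    next_hop = table.lookup(address)
    if next_hop is None:
        return "no route to " + address
    return "forward to " + address + " via " + next_hop


table = RouteTable()
table.announce("66.220.144.0/20", "peer-A")  # assumed prefix and peer name
print(reach("facebook.com", DNS_RECORDS, table))

table.withdraw("66.220.144.0/20")  # the route vanishes
print(reach("facebook.com", DNS_RECORDS, table))
```

In this sketch the second call fails even though the DNS answer is still known: having the coordinates is no use if nobody can tell the plane how to get there.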
If this were just people being unable to post cat videos for a few hours, that would be one thing (though, come on, what is life without cat videos?). But WhatsApp is effectively a critical piece of communications infrastructure in many countries, routinely used for communication between patients and doctors, for example, and used by many for payments.
The extended outage has drawn attention to how vulnerable the entire world is to failures of this nature.
For example, millions of people rely on Google DNS servers to look up the address of every server they connect to. Imagine those servers going down for an extended period. That wouldn’t just affect consumers, it would disrupt commerce and critical infrastructure. Factory production, fleet transport, retail… the works.
The whole world is critically dependent on a relatively small number of servers, all of which could be taken offline by a mistake of the kind that happened here. A lot of thought needs to be put into how we prevent a far more significant internet outage in the future.
So it WASN’T putin’s henchmen hackers. Dang!
The world was a better place for 6 hours.
Imagine them being destroyed and not replaced, ever, in order to save human-ness in a world gone mad.
What if it turns out that the prophet for our times wasn't George Orwell, but Frank Herbert?
‘Mistake.’
Yep. Like Biden in the WH is just a, ‘mistake.’
I hope it was Friendly Fire. I HOPE it costs Zuckernerd BILLIONS. I hope it happens again. :)
“So it WASN’T putin’s henchmen hackers. Dang!”
Of course it was...you’ll see. LOL.
Too bad they did not have their Two-Factor Authentication turned on 🤣🤣🤣🤣
I am hoping facebook becomes a thing of the past. Most people will grab hold of something new but then grow tired of it. They did it with Myspace and AOL and those seem to have faded away. Hopefully people are becoming bored with facebook.
YAWN, who cares. I find it amusing that the whole damn world came to a standstill because no one could access FB. My God, these people need to get a hobby or, better yet, a job.
Were it not for reading about this here on FR I would never have known it even happened.
If Xi Jinping's cyber warriors were not already aware of these vulnerabilities, they are now. How many of his agents are already employed in the IT departments of strategically critical industries? And I hear they work cheaper than Americans...
Facebook has been (we believe) complicit in collusion with the commie government to spy on and phish us all.
IF this explanation is true ... they DID have to go back to the beginning, unwinding a huge ball of yarn, to ... not find the glitch, but patch up all the glitch caused.
Too many additions to the house, imo.
A life
The fix causes the same disappointment as when a government shutdown ends.
“Facebook’s door access systems were also offline, physically preventing access.”
And we are going to let “advanced tech” shove self driving cars, trucks, airliners, and digital currency on us?
There is a NO GO line to be drawn with these concepts.
Awwww
For 6 hours I wasn’t able to see what my High School classmates had for breakfast. It was crushing.
“self driving cars, trucks”
Miss a payment and you are driving a brick.
Got accustomed to quickly pressing ENTER at “Are you sure?”
I’m sure protection will be baked into this sort of transaction. In the linux world, see “visudo.”
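For anyone who hasn't used it: visudo edits a temporary copy of the sudoers file, checks the syntax, and only installs the copy if the check passes. A rough Python sketch of that validate-before-install idea follows. This is not visudo's code, and certainly not whatever tooling Facebook uses. The `validate` rule (every line must be `key = value`) and the function names are invented just to show the pattern.

```python
import os
import tempfile


def validate(text):
    """Hypothetical check: each non-blank, non-comment line is 'key = value'."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition("=")
        if not sep or not key.strip() or not value.strip():
            errors.append(f"line {number}: expected 'key = value'")
    return errors


def install_if_valid(path, new_text):
    """Replace the live config only if the candidate passes validation."""
    errors = validate(new_text)
    if errors:
        return errors  # live file is never touched

    # Write the candidate next to the live file, then swap it in atomically.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(new_text)
        os.replace(tmp_path, path)
    except Exception:
        os.unlink(tmp_path)
        raise
    return []
```

A bad edit gets rejected with a list of errors and the running config stays as it was. Of course this only catches mistakes the checker knows how to look for; a change that is syntactically fine but does the wrong thing sails right through.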