Posted on 10/06/2021 10:17:16 AM PDT by Red Badger
FACEBOOK, Instagram and WhatsApp went down for seven hours in a major global outage yesterday – but why?
Here's a quick guide on what actually happened to take three of the world's biggest websites and apps offline.
https://www.thesun.co.uk/tech/16328655/facebook-instagram-whatsapp-crashed/
In 2021, it's easy to wonder how huge chunks of the internet can go offline.
But it's surprisingly easy for outages to happen – due to how the internet works.
World wide web
Every website – including Facebook – exists on a computer server somewhere.
So when you want to log on to Facebook.com, you have to connect to one of Facebook's computers.
Every website has an IP address, which you can type into your web browser – if you want.
But we prefer using domain names like Facebook.com because they're easier to remember.
That's where DNS (the Domain Name System) comes in.
When you type Facebook.com into your browser, DNS effectively matches that domain name to an IP address – allowing your computer to connect to Facebook.
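To make that concrete, here's a minimal sketch of a DNS lookup using Python's standard library resolver. It's an illustration only: real browsers go through the operating system and ISP resolvers, with caching at every layer, and the resolve function name and port are just for the example.

```python
# Minimal sketch of what your browser asks DNS to do: turn a domain
# name into IP addresses it can connect to.
import socket

def resolve(domain: str) -> list[str]:
    """Return the IP addresses DNS currently maps the domain to."""
    try:
        # Port 443 (HTTPS) and TCP are just placeholders for the lookup.
        results = socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in results})
    except socket.gaierror as err:
        # During the outage, lookups for facebook.com failed roughly like this.
        return [f"lookup failed: {err}"]

if __name__ == "__main__":
    print(resolve("facebook.com"))
```

Run today this would likely print a handful of Facebook edge addresses; run during the outage, the failure branch is what you'd have seen.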
Outages are often caused by DNS issues – but Facebook's problems go even deeper.
When you typed in Facebook.com, it was as if Facebook didn't exist.
There was nowhere for the DNS to send you.
Surfing the web
DNS servers are often thought of as being like a phonebook for the internet.
But there's another system called BGP (Border Gateway Protocol).
BGP is like a Sat Nav that guides your data across the internet – to any address, courtesy of the DNS "phonebook".
So when you send packets of data to Facebook, BGP will guide them along the most efficient route – or another route, if that one's unavailable.
It's like Google Maps, telling your data to take a left here and a right there.
This is how all the different networks that make up the internet can easily be navigated – with no effort from you.
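As a rough illustration of that idea, here's a toy Python sketch of a routing table: neighbouring networks advertise an AS path for a prefix, and the router prefers the shortest usable one. Real BGP has many more tie-breakers (local preference and so on), and the prefix, AS numbers and best_path helper here are purely illustrative, not Facebook's actual setup.

```python
# Toy routing table: each entry maps a prefix to the AS paths that
# neighbouring networks have advertised for it.
routes = {
    "157.240.0.0/16": [                   # illustrative prefix
        ["AS3356", "AS32934"],            # a short path via one neighbour
        ["AS1299", "AS174", "AS32934"],   # a longer alternative path
    ],
}

def best_path(prefix):
    """Return the shortest advertised AS path for the prefix, if any."""
    candidates = routes.get(prefix, [])
    return min(candidates, key=len) if candidates else None

print(best_path("157.240.0.0/16"))   # ['AS3356', 'AS32934']
print(best_path("203.0.113.0/24"))   # None: nobody advertises it, so there's no way in
```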
There are lots of BGP implementations – Facebook even built its own, to make life easier.
Facebook fail
So what went wrong?
Facebook effectively told BGP – the maps of the internet – to remove itself from those maps.
This was obviously a mistake. A big one.
No DNS or BGP – and no user, ultimately – could find Facebook.
Imagine you had the address for a house, but no maps listed it.
Even though the house exists, if it doesn't appear on a map, you can't find it.
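Continuing the toy routing-table sketch above (the prefix, path and best_path helper are still purely illustrative), this is roughly what a withdrawal looks like: the entry disappears, and a lookup that worked a moment ago now finds no route at all.

```python
# Before the withdrawal: one advertised path for the illustrative prefix.
routes = {"157.240.0.0/16": [["AS3356", "AS32934"]]}

def best_path(prefix):
    candidates = routes.get(prefix, [])
    return min(candidates, key=len) if candidates else None

print(best_path("157.240.0.0/16"))   # a path exists: the prefix is reachable

# The bad update effectively did this: withdrew the routes from the table.
routes.pop("157.240.0.0/16")

print(best_path("157.240.0.0/16"))   # None: the servers still exist, but nothing points to them
```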
And because Facebook hosts WhatsApp, Instagram, Workplace and more, these websites couldn't be "found" either.
Everything that works through Facebook – including "Log In With Facebook" – was effectively shut down.
We even saw reports of Facebook employees being unable to get through security doors – or even speak to each other online.
This is a nightmare when the internet has lost you, and you need it to find you again.
Facebook runs most of its own systems, and Facebook was down.
So Facebook bosses needed engineers to physically get down to sites to repair the issue – and let the internet find Facebook again.
What actually went wrong?
So we know that Facebook told the internet it didn't exist any more.
But why?
The details are still thin, but Facebook was doing an update – and got it wrong.
"Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication," Facebook's Santosh Janardhan said.
"This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt."
Basically, Facebook's own routers – which connect Facebook servers to the internet – were configured wrongly.
So rather than a cyber-attack, overloaded servers or physical damage, it was simply a dodgy update.
And in seconds, it took billions of users offline for hours.
Well....I've heard that it cost Zuck $6 billion. Kind of an expensive exercise, even if it was all on paper.
It very well may have been the middle of the night when this “update” happened.
Facebook is a global company, and you can bet they use as much cheap Asian labor as they can hire.
Absolutely.
No. This is too easy to fix.
I am a network administrator who is primarily concerned with DNS stuff these days. It is hard to have sympathy for anyone who works for a corporation as evil as fakebook, but I can pretty much tell you that the poor bastards who run their DNS servers were probably being blamed completely for this for about 2 hours of the outage before they were able to PROVE that it wasn’t DNS and had people looking seriously elsewhere.
What's an alternative reason they could have 'missed' this - you know - other than incompetence? Is there a rational reason?
I have never been involved with any situation that shut down the entire company, but I have been involved in numerous IT situations where the impact was quite large....
The actual engineers probably knew what the issue was and how to fix it, but the managers were busy trying to pin blame on someone besides themselves, or trying to come up with excuses for why the issue could not have been foreseen.
So Facebook had paralysis by analysis....instead of just fixing the issue and then figuring out what went wrong, they had meeting after meeting trying to shift blame and not taking responsibility...this is typical of most large corporations: when no one person wants to make a decision that might come back on them, they all want consensus on the final decision...