That’s the gist of it. System dependencies have dependencies of their own, and so on.
I’m on a site reliability engineering team at my company that ‘keeps the lights on’ for our web-based services. Monitoring, preparation, system efficiency improvements, escalation chains, etc. only work as well as they are designed and followed through. There are still surprises, and combinations that nobody had considered until they failed.
I hope they regroup and bulletproof their system soon.
Yeah, and folks forget what they’re really going through. I remember a few years ago when Google Analytics took a dump. Half the freaking internet became unreachable because everybody uses GA for their clickthrough billing. Whoops.