On a proper enterprise failover system, the moment the first system realizes it’s screwed or even acting strange, the load and connections get shifted seamlessly, or nigh seamlessly, to the next identical system, which runs off the same data from just before the corruption. If that one goes down, it goes to the next machine down the line, etc. They’re not even all running off the same file store, and if need be the architecture includes file system snapshots every (say) 15 minutes, mirrored across multiple machines.
It’s expensive but it’s the current standard for enterprise critical apps.
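For anyone who hasn't seen it, the "next machine down the line" idea can be sketched roughly like this. It's a toy illustration only, not how any particular product does it: the `Node` and `FailoverChain` names and the replica names are made up, and real systems add health checks, data replication and snapshot handling that are left out here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str             # hypothetical replica name
    healthy: bool = True  # flipped to False once the node misbehaves


class FailoverChain:
    """Routes traffic to the first healthy node in a fixed order."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def active(self) -> Optional[Node]:
        # Walk the chain in order and pick the first node still healthy.
        for node in self.nodes:
            if node.healthy:
                return node
        return None  # every node in the chain has failed

    def mark_failed(self, name: str) -> None:
        # Take a node out of rotation; traffic moves to the next one.
        for node in self.nodes:
            if node.name == name:
                node.healthy = False


chain = FailoverChain([Node("primary"), Node("standby-1"), Node("standby-2")])
chain.mark_failed("primary")
current = chain.active()  # now the first standby
```

Note the last case: once every node is marked failed, `active()` returns nothing at all, so the chain only helps while at least one replica is still good.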
I get how it works...
That’s fine for enterprise apps, but not for ‘safety critical’ things, which is, imo, the more relevant category for this application.