What is going on with the FR server? It is getting ridiculous.
I think someone's found an unpatched exploit on the platform, and is coming after it at random times to cause maximum disruption.
They couldn’t do a remote reset and had to actually drive to the site.
The zorch got stuck in the boaky line and the liftong got all whacked out in the Q-factor.
I’m not the most knowledgeable systems guru around, but I do know that when FR goes down, the first thing I do is a traceroute from MY server to find out where the problem lies along the chain.
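To make that concrete, here's a minimal sketch of reading a traceroute: the last hop that answers before the `* * *` rows start is roughly where the chain breaks. The hostnames and addresses in the sample output below are hypothetical, not real FR routing data.

```python
# Sketch: find the last responding hop in Unix-style traceroute output,
# where unreachable hops print "* * *". Sample data below is made up.

def last_responding_hop(traceroute_output: str) -> int:
    """Return the hop number of the last hop that replied (0 if none)."""
    last = 0
    for line in traceroute_output.strip().splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # skip the "traceroute to ..." header line
        if "*" not in parts[1:]:
            last = int(parts[0])
    return last

sample = """traceroute to example.net (209.157.64.2), 30 hops max
 1  gw.myisp.net (10.0.0.1)  1.2 ms  1.1 ms  1.0 ms
 2  core1.myisp.net (172.16.4.1)  8.9 ms  8.7 ms  9.1 ms
 3  * * *
 4  * * *"""

print(last_responding_hop(sample))  # → 2: the path dies after hop 2
```

If the trace dies at an upstream hop like this, the problem is in the routing chain, not the destination box.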
This morning the traceroute showed that a complete /23 block of IP addresses (209.157.63.0-209.157.64.255) was not accessible. Now, having had servers in server farms for a few years, I know that only a few IP addresses (typically 1 to 8) are allocated to any particular client... more depending upon size and needs, but generally you're not gonna get a /23 (512 discrete IP addresses) unless you're THE server farm.
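For anyone checking the math, Python's standard `ipaddress` module will do the CIDR arithmetic. One hedge: a /23 has to start on an even third octet, so the range quoted above actually straddles a /23 boundary; the prefix below is an aligned example, not necessarily the exact block in question.

```python
from ipaddress import ip_network

# An aligned /23 near the range discussed above; the exact block is a guess.
block = ip_network("209.157.62.0/23")

print(block.num_addresses)          # 512 addresses in any /23
print(block[0], "-", block[-1])     # 209.157.62.0 - 209.157.63.255
print(ip_network("209.157.64.0/24").num_addresses)  # a /24 is 256
```

Either way, 512 (or even 256) addresses is a whole chunk of a server farm, not one client's allocation.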
So everyone who is yelling and screaming about FR servers being down needs to shut the F### up unless they have more knowledge of the situation than the average server admin. When a complete /23 or /24 block is down, there are hundreds of servers that are inaccessible, not just FR’s. Did anyone think to run a traceroute or ping on the whole /23 or /24 CIDR block? Just ‘cuz the traceroute fails to one IP address doesn’t mean it’s the server at THAT IP address that’s the problem.
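If you actually wanted to sweep the whole block instead of testing one address, a rough sketch looks like this. The pinging part is off by default and should be treated as illustrative: probing 510 hosts is slow, many networks rate-limit or drop ICMP, and the `-c`/`-W` flags are Linux `ping` options.

```python
from ipaddress import ip_network
import subprocess

def sweep(cidr: str, do_ping: bool = False) -> list[str]:
    """List every usable host address in a CIDR block; optionally ping each.
    Pinging is disabled by default -- this is a sketch, not a scanner."""
    hosts = [str(h) for h in ip_network(cidr).hosts()]
    if do_ping:
        for host in hosts:
            # -c 1 -W 1: one probe, one-second timeout (Linux ping flags)
            up = subprocess.run(["ping", "-c", "1", "-W", "1", host],
                                capture_output=True).returncode == 0
            print(host, "up" if up else "down")
    return hosts

targets = sweep("209.157.62.0/23")  # example /23 from the discussion above
print(len(targets))                 # 510 usable hosts (512 minus network/broadcast)
```

If everything in the block comes back dead, you're looking at a routing or upstream problem, not one misbehaving server.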
Because of this, no matter what kind of high-end server box (or boxes), software, or spiffy routers you have, if a mainstream router (or routers) above the server-farm level is down, **nobody** is going to get access to the servers behind those routers. Sorry, that's just a fact of life.
The only solution would be to have multiple redundant servers at multiple geographic locations fed by multiple backbone connections from different bandwidth providers and all interconnected... not something that is economically feasible except for the largest of companies [viz. Google, Yahoo, Amazon, etc.]
I am continually amazed and humbled that the hardware *and* software of FR is up and running as much and as well as it is. For the smart@$$es who continually smear FR and say that they could do a better job, with less money, I can only say- “put your money where your alligator mouth is because it’s overloading your hummingbird @$$ bigtime!” If it’s so easy, why aren’t YOU doing it?
Again, I can't see how this current downtime was FR's, JR's or JohnRob's fault. Rebooting the server remotely can't be done if the whole /23 network is unreachable. Manually cycling the power switch won't do any good either, because the problem is network access, not database corruption or an OS hangup. Physically driving to the NOC won't do any good if the NOC has lost all of its connectivity - for whatever reason.
Mainline routers can go down for many reasons: physical/electrical failure (fried components), software corruption, or denial of service by overload. If today's and the previous downtimes were caused by the whole /23 network being DDoS'd, FR withdrawal is the least of our problems.
So people who think they know all about networks, servers, server admin, routing and the like just because they can operate a Windoze computer by turning on the power button and fiddling with a mouse should STFU until someone who actually DOES know something posts that information (this is not referring to you, jv & C2K, but to others I have read).
Thanks again, JimRob and JohnRob for what you have given and keep giving us!! Knowing what little I do about server admin still keeps me humbled.
#8^D
just my unasked for $0.02