Posted on 06/16/2011 11:45:11 AM PDT by ShadowAce
Uh... okay; now, in English, please. ;-)
Trust me—it’s pretty funny if you are a sysadmin. :)
Which reminds me of the time I strolled into work at 5:00 AM and found the system administrator with her head down in tears on the keyboard and a front office executive standing over her. A system admin might just, possibly, be in at 5:00 AM, but a front office type, never. Seems she installed an update to the Solaris operating system on one unit, did some “tests”, decided that everything worked OK, and proceeded to install it on the other. As rosy-fingered dawn broke over Ontario, the high bay became crowded with engineers and programmers who were “on the clock” with nothing to do but cheer good old Stella on.
My favorite story, from ten years ago, involves an application program that destroyed the operating system. It was a Solaris 8 environment. The first time we ran the program in production, it overwrote the root filesystem, making our powerful Sun box with 28 processors and 28 gigs of memory worthless.
We immediately went into disaster recovery mode, and brought up production on the UAT server. Of course, the first thing they did was run the same program, which wiped out that machine as well.
LOL!
Trader Joe’s has a completely mirrored datacenter in a different geographic location.
I don’t know how they handle data replication but they evidently understand the importance of redundancy.
That mirroring stuff is designed with the idea that one data center will lose power or be destroyed by terrorists.
However, if the database becomes corrupt, and you are using physical mirroring, you now have two copies of a corrupt database in two data centers. And we have found that bugs in the software are far more likely to happen than losing a data center.
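To make that concrete, here is a toy Python sketch of the difference between a physical mirror and a point-in-time snapshot. Everything in it (the `MirroredStore` class, its methods, the `balance` key) is made up for illustration and does not model any real replication product; it only shows that a mirror faithfully copies a bad write, while an older snapshot still holds the good data.

```python
import copy


class MirroredStore:
    """Hypothetical toy model: each write is copied to the mirror immediately."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.snapshots = []  # point-in-time copies, taken separately from mirroring

    def write(self, key, value):
        self.primary[key] = value
        # Physical mirroring: the mirror gets the same bytes, good or bad.
        self.mirror[key] = value

    def snapshot(self):
        # Keep an independent copy of the current state.
        self.snapshots.append(copy.deepcopy(self.primary))

    def restore_latest_snapshot(self):
        # Roll both sites back to the last snapshot.
        self.primary = copy.deepcopy(self.snapshots[-1])
        self.mirror = copy.deepcopy(self.snapshots[-1])


store = MirroredStore()
store.write("balance", 100)
store.snapshot()

# A buggy application writes garbage; the mirror copies it right along.
store.write("balance", "GARBAGE")
corrupted_everywhere = store.primary == store.mirror == {"balance": "GARBAGE"}

# Only the older snapshot still has the good value.
store.restore_latest_snapshot()
```

After the bad write, `corrupted_everywhere` is true: both "data centers" hold the same garbage. The restore only works because the snapshot was taken before the bug ran, which is the part the mirror cannot give you.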
I don’t know if I’ve told this story before, but at a very large and well known financial company, the testing lab signed off on a new image that was to be pushed out to company desktops.
For whatever reason, the company (which I won’t name) pushing out the image added a piece of software to the image that was pushed out.
Blew up 1/3 of the desktops. The only reason all of the desktops weren’t blown is that the image was only pushed out to 1/3 of the computers.
1,000 computers were put out of commission.
The company pushing out the image had added a virus protection program to an image that already had a virus protection program. The financial company got the computers back online by disabling all virus protection.
It creeps me out to even be near there. Bad karma, yo.
A company I worked at was expanding their datacenter--not the physical space, because they had/have plenty of room. No, they needed more clusters, so they bought 13 more 70-node clusters from their vendor.
Their cooling system couldn't handle it as I began turning them on. They couldn't get any more big chillers that quickly, so they ended up renting a "portable" chiller for several months, just to keep this (very large) data center semi-cool.
bflr
Is it just me or did you leave some sentences out of this story?
Don’t forget the “infinite troubleshooting” scenario: customer has a problem. System taken offline to troubleshoot and repair. 12 hrs later, having still not reached a fix ... the executive finally made the call to execute DR for that system. RTO and RPO were both less than 2 hrs.
bkmk
Not quite as bad as some of these, but one of my clients spent a great deal on “customized software”, when comparable (probably better) software was available from a major vendor. They neglected to force the developers to provide documentation of any sort. The software they chose always had problems, and less than a year after the project was completed, the company that developed it went out of business and the developers scattered to the four corners of the earth.
I understood completely .. so, I guess it's official ... I'm a geek.
One of my profs at Carnegie Mellon relayed a story from the space shuttle program (well, several stories, but this one is germane) — all systems on board had to be heavily redundant, so they installed four identical copies of the control software.
Of course, when the first one fails and rolls over to the second identical copy, what do you expect it’s going to do with the same bad data? And the third, and the fourth?
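A toy Python sketch of that failure mode, for anyone who wants to see it run. This is not how the actual shuttle software worked; `control_software`, `run_with_failover`, and the divide-by-zero bug are all invented for illustration. The point is only that failing over between identical copies does not help when the input itself triggers the bug.

```python
def control_software(reading):
    """Hypothetical control routine with a latent bug: a zero reading crashes it."""
    return 100.0 / reading


def run_with_failover(copies, reading):
    """Try each redundant copy in turn; return (result, number_of_failed_copies)."""
    failures = 0
    for unit in copies:
        try:
            return unit(reading), failures
        except ZeroDivisionError:
            # This copy died; roll over to the next one.
            failures += 1
    # Every copy failed on the same input.
    return None, failures


# Four identical copies of the same software.
copies = [control_software] * 4

good_result, good_failures = run_with_failover(copies, 4.0)
bad_result, bad_failures = run_with_failover(copies, 0.0)
```

With a sane reading the first copy answers and nothing fails. With the bad reading all four copies fail the same way, so `bad_result` is `None` and `bad_failures` is 4.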
An unrelated story had to do with the mechanical folks trying to figure out how much the software weighed...
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.