Free Republic
Browse · Search
News/Activism
Topics · Post Article

To: EarthResearcher333

You’re describing a situation where ANY chance of failure is completely and literally unacceptable.

However, it’s hard to imagine anything complicated with a literally zero probability of failure. A complex system with multiple redundancies can still fail.

In a complex system, the odds that at least one of its many points fails are always higher than the odds of failure at any one specific point. A cascade effect can occur if whatever causes one part to fail can also cause others to fail. Multiple related failures, even if each is minor, can lead to failure of the entire system.
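A minimal sketch of the "at least one of many" point, assuming every component fails independently with the same probability. The function name and the 0.1% figure are illustrative assumptions, not measured values for any real system, and a cascade breaks the independence assumption in a way that only makes things worse.

```python
def prob_any_failure(p_single: float, n_components: int) -> float:
    """Chance that at least one of n independent components fails.

    Assumes identical, independent failure odds per component (hypothetical).
    """
    return 1.0 - (1.0 - p_single) ** n_components


if __name__ == "__main__":
    # Hypothetical 0.1% failure chance per component.
    for n in (1, 10, 100, 1000):
        print(n, "components:", round(prob_any_failure(0.001, n), 4))
```

The more components there are, the closer the chance of "something, somewhere" failing gets to certainty, even when each individual component looks very reliable.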

The gate tendons at Oroville are a potential example of this. If one tendon fails due to corrosion, the odds that other tendons will also fail due to corrosion go up. If one tendon fails the load on the others will rise, increasing the chance that one of them will also fail. Tendons that have already been weakened by corrosion are especially vulnerable to this effect.
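The load-transfer effect can be sketched with a toy equal-load-sharing model. Everything here is hypothetical: the tendon count, the capacities, the load, and the assumption that the load splits evenly among intact tendons. It is not a model of the actual Oroville gates.

```python
def surviving_tendons(capacities, total_load):
    """Toy equal-load-sharing model (hypothetical units).

    The total load is split evenly among intact tendons. Any tendon whose
    share exceeds its capacity fails, and the load is re-split among the
    survivors until nothing more fails or nothing is left.
    """
    intact = list(capacities)
    while intact:
        share = total_load / len(intact)
        still_ok = [c for c in intact if c >= share]
        if len(still_ok) == len(intact):
            break  # stable: every remaining tendon carries its share
        intact = still_ok
    return len(intact)


if __name__ == "__main__":
    healthy = [100] * 10                   # ten sound tendons
    one_corroded = [100] * 9 + [70]        # one weakened by corrosion
    four_corroded = [100] * 6 + [70] * 4   # four weakened by corrosion
    for name, caps in [("healthy", healthy),
                       ("one corroded", one_corroded),
                       ("four corroded", four_corroded)]:
        print(name, "->", surviving_tendons(caps, 800), "tendons left")
```

In this sketch one weak tendon is survivable, but a handful of weak tendons failing together pushes the load on the sound ones past their capacity, and the whole set goes.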

Failure of enough tendons will cause gate failure. Failure of even a single gate at high flow rates might cause enough turbulence to cause the spillway to fail, potentially destroying the entire gate system. This could, in turn, lead to rapid “V” erosion and subsequent failure of the whole dam.

This is an example of how the failure of one thing can increase the odds of other similar things failing, leading eventually to failure of the entire system.


4,409 posted on 10/26/2017 11:56:51 PM PDT by EternalHope (Something wicked this way comes. Be ready.)


To: EternalHope
"The Blue Screen of Death"

HA! You would be good at chess... or as an expert witness, as you explain well... :-)

"You’re describing a situation where ANY chance of failure is completely and literally unacceptable."

Here is an example of an unacceptable "single point of failure": ever hear of "The Blue Screen of Death" (TBOD) in computers?

This is a perfect example of a single point of failure that kills the entire system operation: a hardware fault results in a Non-Maskable Interrupt (NMI) and a "Blue Screen" computer display.

I'm sure many people remember these... Great fear and anguish result from TBODs.

4 Gigabytes of system DRAM memory, with a parity bit for each 16 bits (2 bytes), comes to 34,359,738,368 individual data bits plus 2,147,483,648 more bits for parity check. So a single failure in any one of 36,507,222,016 (roughly thirty-six and a half billion) individual DRAM cells will cause a TBOD if the Central Processing Unit (via the memory controller) reads a memory location with a faulty single bit.
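A quick sketch of the cell count, assuming "4 Gigabytes" means 4 × 2^30 bytes of 8 bits each, with one parity bit per 16 data bits as described. The helper name is hypothetical.

```python
def dram_cells(data_bits: int, data_bits_per_parity_bit: int = 16):
    """Return (data_bits, parity_bits, total_cells) for a parity-protected memory."""
    parity_bits = data_bits // data_bits_per_parity_bit
    return data_bits, parity_bits, data_bits + parity_bits


if __name__ == "__main__":
    four_gigabytes_in_bits = 4 * 2**30 * 8  # assumption: binary gigabytes
    data, parity, total = dram_cells(four_gigabytes_in_bits)
    print(f"data bits:   {data:,}")
    print(f"parity bits: {parity:,}")
    print(f"total cells: {total:,}")
```

Each one of those cells is a place where a single lost charge can take the machine down when parity is the only protection.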

When the U.S. Courts found in favor of Micron Semiconductor (Idaho, USA) against Japanese and South Korean DRAM manufacturers (who were "dumping" DRAM memory priced below its actual R&D development cost, to corner a market originally dominated by U.S. manufacturers), the court's ruling forced these Asian companies to recoup R&D costs by limiting future R&D to DRAM product profits. The result was that some Asian companies (name withheld) released new, bigger and faster DRAM memory that was marginal and susceptible to single bit errors (loss of charge in the bit "cell" due to an imbalance in the substrate bias), just to get early profits coming in.

This Asian manufacturer took advantage of the "CTRL-ALT-DEL" mentality: PC systems lock up every once in a while, so the consumer resets and starts over. So they dumped marginal product on the market to get profits until they could "fix" it in the next spin or "stepping" of the DRAM design. But in Massively Parallel Supercomputer Systems (such as Intel's Paragon System in the 1990s) this was intolerable when the system is designed for parity-only data integrity checking on memory. I worked in R&D at another Massively Parallel Supercomputer company that used ECC (Error Check and Correct) instead of parity. I would get calls from development engineers inside Intel asking about these elusive and frustrating DRAM memory errors. After many months, and with very expensive Tek Logic Analyzers, I had already identified** the extremely complex interaction(s) that caused the loss of bit charge from the imbalanced substrate bias.

Computers eventually migrated to ECC memory because this failure mode was intolerable. Any minor defect in billions of cells could render the computer useless (loss of computation and/or results) at any moment. ECC memory is a backup "redundancy" that (1) allows a minor manufacturing (process QC) defect to exist without bringing the system down and (2) preserves a safe performance margin even as "wear out" approaches at the end-of-life side of the bathtub curve.
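The difference between parity and ECC can be sketched in a few lines. Parity can only say "something is wrong"; an error-correcting code can say which bit is wrong and flip it back. The tiny Hamming(7,4) code below is an illustrative stand-in chosen for brevity, and is an assumption on my part: real ECC memory typically uses wider codes, and nothing here reflects any particular vendor's design.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit for a 16-bit word: 1 if the word holds an odd number of 1s."""
    return bin(word & 0xFFFF).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """Parity can detect a single flipped bit but cannot locate or repair it."""
    return parity_bit(word) == stored_parity


def hamming_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_correct(codeword):
    """Repair at most one flipped bit; return (data_bits, error_position or 0)."""
    c = [0] + list(codeword)  # shift to 1-based positions
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    position = s1 + 2 * s2 + 4 * s4  # syndrome points at the bad bit
    if position:
        c[position] ^= 1
    return [c[3], c[5], c[6], c[7]], position


if __name__ == "__main__":
    word = 0b1011000011110000
    stored = parity_bit(word)
    print("parity check after a bit flip:", parity_ok(word ^ (1 << 5), stored))

    code = hamming_encode([1, 0, 1, 1])
    code[5] ^= 1  # simulate one DRAM cell losing its charge (position 6)
    print("ECC recovered:", hamming_correct(code))
```

With parity the only option on a mismatch is to halt; with the correcting code the system repairs the bit and keeps running, which is the "redundancy" described above.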

---- Oroville Dam is like Parity Error & "The Blue Screen Of Death" (i.e. single points of failure)

Historic design decisions eliminated the redundant "Delta Shaped" Headworks structure. DWR's choice to drop this increasingly costly approach forced the current design into a "single point of failure" Dam. How? DWR chose not to armor the Emergency Spillway hillside, even though it is rated at an enormous 350,000 cfs flow capacity. It failed at 3% of that rating. The Main Spillway was rated at 296,000 cfs. It failed at 18% of its rating, after many uses that progressively damaged it.
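For scale, the percentages can be turned into rough flow figures. This sketch assumes the percentages are fractions of each spillway's rated capacity, and the dictionary and function names are hypothetical; the results are back-of-envelope numbers, not measured flows.

```python
# Rated capacities and failure percentages as stated in the post.
RATED_CFS = {"Emergency Spillway": 350_000, "Main Spillway": 296_000}
FAILED_AT_PERCENT = {"Emergency Spillway": 3, "Main Spillway": 18}


def flow_at_failure_cfs(name: str) -> int:
    """Approximate flow at failure, assuming the percentage applies to rated capacity."""
    return RATED_CFS[name] * FAILED_AT_PERCENT[name] // 100


if __name__ == "__main__":
    for name in RATED_CFS:
        print(f"{name}: about {flow_at_failure_cfs(name):,} cfs "
              f"of {RATED_CFS[name]:,} cfs rated")
```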

When the "single point of failure" of the Main Spillway occurred, the Emergency Spillway should have operated as a "redundant" backup. It did not. IF DWR had done the proper research, they would have known that the Emergency Spillway would fail miserably, because blocky-type hydraulic turbulence swiftly erodes the highly weathered rock. They would then have armored the Emergency Spillway (similar to what is being done now, or more so back then: the full armoring of the hillside).

The Main Dam is a "single point of failure" in that there is no high volume flow path to empty the reservoir if an unexpected anomaly (leak) is detected in the dam. In essence, there is no way to defuse the "bomb". The Main Spillway can only release water down to ~830 ft MSL, so as not to turbulently erode the inlet apron. So in an emergency condition (leak in the dam), the max outflow of the power house plus the river outlet won't be able to draw down the lake fast enough. This issue has been noted in DSOD reports as a long term problem to address. So DWR recognizes this problem. Yet they are stuck in these "single point of failure" conditions.

That is why they must make D*A*M sure that there is no "through the dam" leak in the Dam. The recent "rainfall only" report was more for Public Relations. DWR (Joel Ledesma) stated at a Butte County meeting a few days ago that they will be looking at the green wet area at the first of the year. If this is the case, then why did they publish the "rainfall only" report? Joel (DWR/SWP director) stated that this will be re-visited.*

Your analysis of the ripple effect in stress transfer to adjacent anchor tendons is accurate. All of the components of the Dam and the complex can be analyzed appropriately. The term "single point of failure" has a dual meaning. Unacceptable composition by the steel manufacturer, and/or poor QC of the high strength steel used for the Anchor Tendons, can be considered an unacceptable condition akin to using poor quality bolts on a bridge (i.e. the bolts are a "single point" usage in the design that is a failure source mechanism for the overall bridge's Factor of Safety). The failure analysis simply follows the implications outward from this "single point". There are a myriad of "single points" in a design whose effects carry through in this way. My point was not to use "probability" as a method of partial acceptance of poor quality conditions. All components must be treated as KNOWN in their quality and KNOWN in their performance over time.

*Butte County Supervisor tells DWR it needs to restore trust with Oroville

http://www.krcrtv.com/news/butte-county-supervisor-tells-dwr-it-needs-to-restore-trust-with-oroville/644622156

--- Article clip: (emphasis mine)

In response, Butte County Chairman Bill Connelly made it known that Oroville has some major trust issues with the Department of Water Resources. He fired concerns at DWR's Deputy Director of the State Water Project, Joel Ledesma.

"I'm just personally leery of a situation where the same people who told us everything was okay are doing the inspections now," Connelly said to Ledesma in the meeting.

The vocal board member has been pushing for an independent team to oversee the repairs and asked Ledesma what will be done about allegations of cracks near the gates of the spillway.

"The gates are cracked, one crack is 15 feet long," said Connelly. "They leak, there are problems, they need to be addressed. [Ledesma] promised that it would be addressed in the near future, and that they would look at the river valve and the green spot on the dam in the near future. But again, that's somewhat hollow when it comes from DWR. I believe we need an outside, independent forensics analysis of the entire dam." --- end clip more at url link.

**Our Massively Parallel Supercomputer system was showing a soft error rate that was out of spec relative to the datasheet, even though the computer would safely continue as the ECC corrected the rare bit failure on the fly.

The feared "Blue Screen of Death" - single point of failure hardware malfunction in computers (Parity Error)


Intel iPSC/2 16-node Supercomputer - 1995



4,410 posted on 10/27/2017 3:47:11 AM PDT by EarthResearcher333



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson