Posted on 04/02/2020 9:41:53 AM PDT by dayglored
US air safety bods call it 'potentially catastrophic' if reboot directive not implemented
The US Federal Aviation Administration has ordered Boeing 787 operators to switch their aircraft off and on every 51 days to prevent what it called "several potentially catastrophic failure scenarios" including the crashing of onboard network switches.
The airworthiness directive, due to be enforced from later this month, orders airlines to power-cycle their B787s before the aircraft reaches the specified days of continuous power-on operation.
The power cycling is needed to prevent stale data from populating the aircraft's systems, a problem that has occurred on different 787 systems in the past.
According to the directive itself, if the aircraft is powered on for more than 51 days this can lead to "display of misleading data" to the pilots, with that data including airspeed, attitude, altitude and engine operating indications. On top of all that, the stall warning horn and overspeed horn also stop working.
This alarming-sounding situation comes about because, for reasons the directive did not go into, the 787's common core system (CCS), at heart an Intel Wind River VxWorks realtime OS product, stops filtering out stale data from key flight control displays. The loss of that stale-data monitoring function in turn "could lead to undetected or unannunciated loss of common data network (CDN) message age validation, combined with a CDN switch failure".
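For readers wondering what "message age validation" means in practice, here is a minimal sketch of the general idea. It is not Boeing's implementation, which the directive does not describe. The names `Message`, `is_fresh`, `latest_valid` and the 200 ms limit are all hypothetical, chosen only for illustration.

```python
from typing import List, NamedTuple, Optional

# Hypothetical freshness limit; the real CDN limits are not given in the directive.
MAX_AGE_MS = 200


class Message(NamedTuple):
    label: str       # e.g. "airspeed"
    value: float     # the reading itself
    sent_at_ms: int  # sender's timestamp in milliseconds


def is_fresh(msg: Message, now_ms: int, max_age_ms: int = MAX_AGE_MS) -> bool:
    """Accept a message only if its age is between 0 and max_age_ms."""
    age_ms = now_ms - msg.sent_at_ms
    # A negative age (timestamp in the future) is treated as invalid too.
    return 0 <= age_ms <= max_age_ms


def latest_valid(messages: List[Message], now_ms: int) -> Optional[Message]:
    """Return the newest fresh message, or None if everything is stale."""
    fresh = [m for m in messages if is_fresh(m, now_ms)]
    return max(fresh, key=lambda m: m.sent_at_ms, default=None)


readings = [Message("airspeed", 250.0, 1000), Message("airspeed", 252.0, 1900)]
print(latest_valid(readings, now_ms=2000))  # the reading sent at 1900 ms
print(latest_valid(readings, now_ms=5000))  # None: both readings are stale
```

The point of the check is the second case: when nothing fresh is available, a display should show a failure flag rather than the last value it happened to receive. If the check itself silently stops running, old values look like live ones.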
Solving the problem is simple: power the aircraft down completely before reaching 51 days. It is usual for commercial airliners to spend weeks or more continuously powered on as crews change at airports, or ground power is plugged in overnight while cleaners and maintainers do their thing.
CDN is a Boeing avionics term for the 787's internal Ethernet-based network. It is built to a slightly more stringent aviation-specific standard than common-or-garden Ethernet, that standard being called ARINC 664.
Airline pilots were sanguine about the implications of the failures when El Reg asked a handful about the directive. One told us: "Loss of airspeed data combined with engine instrument malfunctions isn't unheard of," adding that there wasn't really enough information in the doc to decide whether or not the described failure would be truly catastrophic. Besides, he said, the backup speed and attitude instruments are for obvious reasons completely separate from the main displays.
Another mused that loss of engine indications would make it harder to adopt the fallback drill of setting a known pitch and engine power setting that guarantees safe straight-and-level flight while the pilots consult checklists and manuals to find a fix.
A third commented, tongue firmly in cheek: "Anything like that with the aircraft is unhealthy!"
A previous software bug forced airlines to power down their 787s every 248 days for fear that electrical generators could shut down in flight.
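The FAA's directive for that earlier bug attributed it to a software counter in the generator control units overflowing after 248 days of continuous power. The arithmetic below is a sketch showing how such a figure can arise. It assumes a signed 32-bit counter ticking 100 times a second, which is a common inference rather than anything Boeing has confirmed, and `days_until_overflow` is a hypothetical helper written for this example.

```python
SECONDS_PER_DAY = 86_400


def days_until_overflow(bits: int, ticks_per_second: int, signed: bool = True) -> float:
    """Days of continuous running before a tick counter of the given width wraps."""
    # A signed counter loses one bit to the sign, halving its positive range.
    max_count = 2 ** (bits - 1) if signed else 2 ** bits
    return max_count / ticks_per_second / SECONDS_PER_DAY


# Assumed: signed 32-bit counter incremented every 10 ms (100 ticks per second).
print(round(days_until_overflow(32, 100), 2))  # 248.55
```

No comparable explanation has been published for the 51-day figure, so the same arithmetic should not be read as the cause of the current directive.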
Airbus suffers from similar issues with its A350, with a relatively recent but since-patched bug forcing power cycles every 149 hours.
Persistent or unfiltered stale data is a known 787 problem. In 2014 a Japan Airlines 787 caught fire because of the (entirely separate, and since fixed) lithium-ion battery problem. Investigators realised the black boxes had been recording false information, hampering their task, because they were falsely accepting stale old data as up-to-the-second real inputs.
More seriously, another 787 stale data problem in years gone by saw superseded backup flight plans persisting in standby navigation computers, and activating occasionally. Activation caused the autopilot to wrongly decide it was halfway through flying a previous journey and manoeuvre to regain the "correct" flight path. Another symptom was for the flight management system to simply go blank and freeze, triggered by selection of a standard terminal arrival route (STAR) with exactly 14 waypoints such as the BIMPA 4U approach to Poland's rather busy Warsaw Airport. The Polish air safety regulator published this mildly alarming finding in 2016 [2-page PDF, in Polish].
This was fixed through a software update, as the US Federal Aviation Administration reiterated last year. In addition, Warsaw's BIMPA 4U approach has since been superseded.
The Register asked Boeing to comment. ®
Although I doubt the affected systems are Windows based, I'm gonna ping the WindowsPingList because... why not?
Just figured it was interesting.
“Although I doubt the affected systems are Windows based, I’m gonna ping the WindowsPingList because... why not?”
BSOD. Blue Skies of Death
Can’t they just give it a good kick on the side?
Ok, so Boeing is pushing out crap. But I bet their engineering staff is multicultural and diverse.
51 days is pretty low up-time for networking equipment. They must really be paranoid.
BSOD: Boeing Skies of Death?
It’s like they never even tested their planes.
Are you kidding me?? These software jockeys can’t purge the extraneous data automatically via maintenance cycles that are self initiated by the software/firmware? This is beyond belief.
sounds similar to the Y2K problem except it happens after every 51 days ... i.e., counter overflows and similar extremely bad coding ...
probably because the coding was done in India, China, Pakistan and the like, or because they brought Indians, Pakistanis and Chinese to the U.S. to do it ...
assuming they continue to refuse to use U.S. coders, they’d have been WAY better off to bring in some Russian or other Eastern European coders instead, who are some of the best coders in the world ...
“power cycling is needed to prevent stale data from populating the aircraft’s systems”
Wonder who did the programming. Sounds like memory leaks caused by orphaned pointers.
The program was most likely written in C++ for speed.
This is to occur no matter where the planes are.
“51 days is pretty low up-time for networking equipment. They must really be paranoid.”
from reading the article, it’s not paranoia but a known problem, in other words, REALLY REALLY REALLY bad programming ....
Yeah but this is ‘on aircraft’ network equipment. 51 days is a long time to keep an aircraft powered on. What this is saying is they can’t leave the same plane powered for more than 51 days at a stretch.
No kidding. I have datacenter network gear that stays up without a reboot for years. The only time it gets restarted is after applying a required security patch.
I’ve said this before.
Intel was a super company and run very efficiently and their chips were rigorously tested.... until the H1b’s hired around 2000 moved up the ranks to upper management.
There are some true morons now running Intel and Microsoft.
At least they forced out BK and Rene James as co-CEO.
It’s not a ‘bug’, it’s a FEATURE!.....................
On a plus note, the onboard entertainment system is unaffected.
Isn't that what -really- matters? Diversity Is Our Strength, right?
And I'll bet that when one of these planes hits the ground, the resulting pieces are, indeed, quite diverse.
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.