Posted on 04/02/2020 9:41:53 AM PDT by dayglored
US air safety bods call it 'potentially catastrophic' if reboot directive not implemented
The US Federal Aviation Administration has ordered Boeing 787 operators to switch their aircraft off and on every 51 days to prevent what it called "several potentially catastrophic failure scenarios" including the crashing of onboard network switches.
The airworthiness directive, due to be enforced from later this month, orders airlines to power-cycle their B787s before the aircraft reaches the specified days of continuous power-on operation.
The power cycling is needed to prevent stale data from populating the aircraft's systems, a problem that has occurred on different 787 systems in the past.
According to the directive itself, if the aircraft is powered on for more than 51 days this can lead to "display of misleading data" to the pilots, with that data including airspeed, attitude, altitude and engine operating indications. On top of all that, the stall warning horn and overspeed horn also stop working.
This alarming-sounding situation comes about because, for reasons the directive did not go into, the 787's common core system (CCS), at heart an Intel Wind River VxWorks realtime OS product, stops filtering out stale data from key flight control displays. That stale data-monitoring function going down in turn "could lead to undetected or unannunciated loss of common data network (CDN) message age validation, combined with a CDN switch failure".
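For readers wondering what "message age validation" might look like in principle, here is a minimal sketch in Python. It is not Boeing's code and not how the CCS is actually written; the `Message` fields, the `MAX_AGE_S` limit and the function names are all assumptions made up for illustration. The idea is simply that a consumer rejects any sample older than an allowed age, and reports "no valid data" instead of showing an old value.

```python
from dataclasses import dataclass


@dataclass
class Message:
    value: float    # e.g. an airspeed sample
    sent_at: float  # sender timestamp in seconds (hypothetical field)


MAX_AGE_S = 0.5  # assumed freshness limit, not a real 787 figure


def is_fresh(msg, now, max_age=MAX_AGE_S):
    """True if the message is at most max_age seconds old and not from the future."""
    age = now - msg.sent_at
    return 0.0 <= age <= max_age


def latest_valid(messages, now):
    """Value of the newest fresh message, or None so a display can flag the data as invalid."""
    fresh = [m for m in messages if is_fresh(m, now)]
    if not fresh:
        return None
    return max(fresh, key=lambda m: m.sent_at).value
```

If a check like `is_fresh` silently stops running, old values keep flowing to the display as if they were current, which is the kind of "misleading data" the directive describes.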
Solving the problem is simple: power the aircraft down completely before reaching 51 days. It is usual for commercial airliners to spend weeks or more continuously powered on as crews change at airports, or ground power is plugged in overnight while cleaners and maintainers do their thing.
The CDN is a Boeing avionics term for the 787's internal Ethernet-based network. It is built to a slightly more stringent aviation-specific standard than common-or-garden Ethernet, that standard being called ARINC 664. More about ARINC 664 can be read here.
Airline pilots were sanguine about the implications of the failures when El Reg asked a handful about the directive. One told us: "Loss of airspeed data combined with engine instrument malfunctions isn't unheard of," adding that there wasn't really enough information in the doc to decide whether or not the described failure would be truly catastrophic. Besides, he said, the backup speed and attitude instruments are for obvious reasons completely separate from the main displays.
Another mused that loss of engine indications would make it harder to adopt the fallback drill of setting a known pitch and engine power setting that guarantees safe straight-and-level flight while the pilots consult checklists and manuals to find a fix.
A third commented, tongue firmly in cheek: "Anything like that with the aircraft is unhealthy!"
A previous software bug forced airlines to power down their 787s every 248 days for fear that electrical generators could shut down in flight.
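The 248-day figure is commonly explained as a signed 32-bit counter of hundredths of a second overflowing. The article above does not confirm that, so treat it as an assumption. The arithmetic is easy to check, and the same calculation shows that the familiar 32-bit millisecond rollover does not land on 51 days, so the 51-day limit is not explained by it. The function name below is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def days_until_overflow(max_count, ticks_per_second):
    """Days of continuous uptime before a counter reaches max_count ticks."""
    return max_count / ticks_per_second / SECONDS_PER_DAY


# Assumed: signed 32-bit counter ticking 100 times per second.
signed_32_centiseconds = days_until_overflow(2**31, 100)

# For comparison: unsigned 32-bit counter ticking 1000 times per second.
unsigned_32_milliseconds = days_until_overflow(2**32, 1000)

print(round(signed_32_centiseconds, 2))    # 248.55
print(round(unsigned_32_milliseconds, 2))  # 49.71
```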
Airbus suffers from similar issues with its A350, with a relatively recent but since-patched bug forcing power cycles every 149 hours.
Persistent or unfiltered stale data is a known 787 problem. In 2014 a Japan Airlines 787 caught fire because of the (entirely separate, and since fixed) lithium-ion battery problem. Investigators realised the black boxes had been recording false information, hampering their task, because they were falsely accepting stale old data as up-to-the-second real inputs.
More seriously, another 787 stale data problem in years gone by saw superseded backup flight plans persisting in standby navigation computers, and activating occasionally. Activation caused the autopilot to wrongly decide it was halfway through flying a previous journey and manoeuvre to regain the "correct" flight path. Another symptom was for the flight management system to simply go blank and freeze, triggered by selection of a standard arrival path (STAR) with exactly 14 waypoints such as the BIMPA 4U approach to Poland's rather busy Warsaw Airport. The Polish air safety regulator published this mildly alarming finding in 2016 [2-page PDF, in Polish].
This was fixed through a software update, as the US Federal Aviation Administration reiterated last year. In addition, Warsaw's BIMPA 4U approach has since been superseded.
The Register asked Boeing to comment. ®
No I can’t. It has been too long. Some of the places I read have news that never hits the MSM. My guess now is that maybe it was an internal leak. Obviously a problem like this does not go from being discovered to being in the news the next day. And once a probable problem has been identified then some time would be taken to verify it before issuing any directive.
If it’s Boeing, I ain’t going.
Not all the computers in your car power down. I know, I wrote automotive S/W for transfer case, differential control... Knew a guy that did the same on another platform. He wrote his code to perform a detailed BIT (built-in test) on power cycles. No one told him the module was always powered. Long story short, the module never failed until the owner replaced the battery. This was typically after the 3 year warranty had expired. Lol. Working with idiots led by other idiots with no time for requirements or system knowledge. Module did have a very low warranty rate so that was a plus to management.
What if you cross the International Date Line?..................
wind river for the real time OS.
imaginably, boeing for the cockpit instrumentation applications, since the cockpit instruments are boeing specific.
both imaginably would have to run 51+ day stress tests to find and fix the problems before the systems are released into the field. 51+ day stress tests might be very expensive in money and time costs.
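One common way to avoid waiting 51+ real days is to make the clock injectable and jump it forward in a test. The sketch below is only an illustration of that technique. `FakeClock`, the 32-bit tick counter and the 500-tick limit are all assumptions, and nothing here claims this is the actual 787 fault. It shows a naive age check that starts accepting stale data once the counter wraps, next to a wrap-safe version.

```python
MASK32 = 0xFFFFFFFF  # emulate an unsigned 32-bit tick counter


class FakeClock:
    """Test clock that can be advanced instantly (hypothetical helper)."""

    def __init__(self, ticks=0):
        self.ticks = ticks & MASK32

    def advance(self, ticks):
        self.ticks = (self.ticks + ticks) & MASK32


def age_naive(now, stamp):
    # Breaks after wraparound: now is small, stamp is large, age goes negative.
    return now - stamp


def age_wrap_safe(now, stamp):
    # Modular subtraction is correct as long as the true age is below 2**32 ticks.
    return (now - stamp) & MASK32


def is_stale(age, max_age=500):
    return age > max_age


# Start just before the counter wraps, stamp a message, then jump 1000 ticks.
clock = FakeClock(MASK32 - 100)
stamp = clock.ticks
clock.advance(1000)

print(is_stale(age_naive(clock.ticks, stamp)))      # False: stale data accepted
print(is_stale(age_wrap_safe(clock.ticks, stamp)))  # True: stale data rejected
```

A test like this runs in milliseconds, although it only finds the bugs you thought to simulate. It does not replace a long soak test on real hardware.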
Your thoughts on this?
“wind river for the real time OS.”
Ok it is Linux. I will have to install a copy in Hyper-v and take a look. You can run on PowerPC and ARM.
At wind river site now. Looks interesting. Systems have come a long way since I last worked on avionics.
> That's not a very good operating system. An op system must be robust and not do stupid things.
I am no vxworks fan but vxworks is used in dozens if not hundreds of similar products, and apparently the 51 day failures are boeing 787 specific. If your conjecture is correct, wouldn't the problem be noticeable in other products in which vxworks is used?
> I was making a software presentation to Boeing engineering once and one of their uh, multicultural software people asked me, “What’s an opcode?”
OMG
That's the best time to do it. :)
> Ok it is Linux.
I am not claiming that. I have no inside knowledge of current vxworks, but my impression received over time has been that it is a proprietary RTOS which is not linux based and contains differences from linux, internal and external. According to wind river, it implements a POSIX PSE52 interface.
I drive a bus so I wouldn’t know - however, if a Boeing plane needs updates and the electronics isn’t turned on every so often then there’d be a problem; just like those upgrades that automatically download to your PC.
Every time battery gets replaced or disconnected, have to reprogram all the radio stations!
There you go again. Thinking outside the box. Going to get you in trouble. Make the bean counters mad.
https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)
https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
> I drive a bus so I wouldn’t know - however, if a Boeing plane needs updates and the electronics isn’t turned on every so often then there’d be a problem; just like those upgrades that automatically download to your PC.
Actually it would probably be IMHO a security violation to attach such a system to the open internet, since it would allow access to the system from outside the intended means of communication. More likely, a software upgrade would be a controlled manual process with explicit checklist style instructions that must be manually checked off in sequence. Automation in this situation might mean less control over the system since at any given moment in time, the system is in an unknown (either updated, out of date, or in between) state for each software subsystem in the system. There might be dozens, hundreds or thousands of software subsystems, each verified to work with each other (or not), but always accompanied by version designations of the associated software and hardware environment in which it is certified to operate.
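As a rough illustration of the "version designations" point, a checklist step could compare what is installed against the configuration that was certified together. This is only a sketch: the subsystem names and version strings below are invented, and real avionics configuration control is far more involved.

```python
CERTIFIED = {  # hypothetical certified configuration, names and versions invented
    "display_app": "2.1.0",
    "rtos": "6.9",
    "cdn_switch_fw": "1.4.2",
}


def config_mismatches(installed, certified=CERTIFIED):
    """Map each differing subsystem to (installed version or None, certified version)."""
    return {
        name: (installed.get(name), want)
        for name, want in certified.items()
        if installed.get(name) != want
    }


# One subsystem was left at an older version during a manual upgrade.
installed = dict(CERTIFIED, rtos="6.8")
print(config_mismatches(installed))  # {'rtos': ('6.8', '6.9')}
```

An empty result means every listed subsystem matches the certified set; anything else is a reason to stop and recheck before sign-off.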
Within each hardware and software subsystem, there normally is a process that governs how changes are made to a subsystem to produce a future version. Usually that process is some form of ISO 9000. It has a test component which regulates how subsystem tests are created and changed (generally, a procedure analogous to the manner in which the subsystems themselves are changed).
All of this is usually very time consuming and expensive. It becomes very tempting to take short cuts. Perhaps the problems with the boeing 737 max are the result of some short cuts that were taken to avoid massive testing requirements. These short cuts were apparently taken during a recent period at which boeing stock was at (then) all time high prices, which possibly makes it so much the more embarrassing for boeing, especially at a time when they are looking for government handouts to prevent massive layoffs due to covid-19 problems and this problem. The presence of so many avionics software related problems on so many different products may indicate a high level organizational failure to recognize and properly manage avionics system software. Software management is IMHO often not unlike dodge citXXXXXXXXX an art form.
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.