The Airbus 350 needs a hard reboot every 149 hours

doctorow · July 26, 2019, 4:18pm

Originally published at: https://boingboing.net/2019/07/26/common-remote-data-concentra.html

…

Gutierrez · July 26, 2019, 4:21pm

apalatn · July 26, 2019, 4:23pm

Someone didn’t check for memory leaks…

philbin1 · July 26, 2019, 4:33pm

At least there is a way to keep it in the air, unlike the 737 Max, which assumes the pilot wants to dive at the ground and then does so.

RickMycroft · July 26, 2019, 4:38pm

Some flight duration function that was scope-creeped into a power-on runtime one? (Long after the person that baked in the “150 hours is far longer than any possible flight” assumption had moved on.)

Mister44 · July 26, 2019, 4:50pm

“Welcome to Airbus Technical Support, how may I help you. Uh-huh. Uh-huh. You’ve lost power and are nose diving from 40,000 feet?”

ikeOnABike · July 26, 2019, 4:52pm

RickMycroft · July 26, 2019, 5:02pm

It’s scary that it randomly fails. They must have had a number of brown-trouser incidents before someone noticed the key 150 hours runtime.

Oh well, ship the debug version with bounds-checking and exception handling turned on. Click okay through all the warning alert boxes and it’s all good.

aidtopia · July 26, 2019, 5:29pm

The linked Guardian article says the command protocol is ARINC 429. Wikipedia has some info on that protocol, including that the data field is 19 bits.

So if you have a clock that starts counting seconds from the time it’s turned on, after 149 hours, it’ll be 536,400. In binary, that’s 1000 0010 1111 0101 0000. In other words, you recently started needing a 20th bit.
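The arithmetic above is easy to check. A minimal sketch, assuming a counter that ticks once per second from power-on; the function names are made up for illustration and are not from any avionics code:

```python
SECONDS_PER_HOUR = 3600
DATA_FIELD_BITS = 19  # data field width quoted in the post above


def uptime_seconds(hours):
    """Seconds elapsed after the given number of whole hours of uptime."""
    return hours * SECONDS_PER_HOUR


def bits_needed(value):
    """Number of binary digits needed to represent a non-negative integer."""
    return value.bit_length()


seconds = uptime_seconds(149)
print(seconds)               # 536400
print(bin(seconds))          # 0b10000010111101010000
print(bits_needed(seconds))  # 20

# Point at which a 19-bit field runs out, expressed in hours.
overflow_hours = (1 << DATA_FIELD_BITS) / SECONDS_PER_HOUR
print(round(overflow_hours, 1))  # 145.6
```

So a 19-bit seconds counter would run out a little before the 146-hour mark, a few hours short of the 149-hour figure.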

Some systems might truncate the top bit, leaving you with a sudden backwards leap through time. That can easily confuse all sorts of otherwise simple calculations. Other systems might end up with that top bit being interpreted as part of an adjacent field, which could have surprising consequences as well. Referring to the Wikipedia article, it might clobber part of the value of the sign/status matrix, which is how systems know whether the data is correct, unavailable, or simulated (for testing). I could see some software that’s not prepared for that as well.
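Both failure modes can be mocked up in a few lines. This is only a sketch under stated assumptions: the 2-bit status field sitting directly above the 19-bit data field follows the reading of the Wikipedia article above, the status value 0 is a placeholder rather than a real sign/status code, and `pack_unchecked` is a hypothetical packer, not anything from the actual aircraft software.

```python
DATA_BITS = 19
DATA_MASK = (1 << DATA_BITS) - 1  # 0x7FFFF, the largest 19-bit value


def truncate(counter):
    """Failure mode 1: keep the low 19 bits and silently drop the rest."""
    return counter & DATA_MASK


def pack_unchecked(counter, status):
    """Failure mode 2: no range check, so bit 19 of the counter lands
    in the (assumed) 2-bit status field that sits just above the data."""
    return (status << DATA_BITS) | counter


def unpack(word):
    """Split a packed word back into (data, status)."""
    return word & DATA_MASK, (word >> DATA_BITS) & 0b11


# Truncation: the clock leaps backwards.
print(truncate(524287))  # 524287, last value that still fits
print(truncate(524288))  # 0, one second later
print(truncate(536400))  # 12112, i.e. 149 hours reads as about 3.4 hours

# Spill: the status field changes although nobody wrote to it.
print(unpack(pack_unchecked(1000, 0)))    # (1000, 0)
print(unpack(pack_unchecked(536400, 0)))  # (12112, 1)
```

Either way the receiver sees a perfectly well-formed word, which is what makes this kind of bug hard to notice until something downstream acts on it.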

jandrese · July 26, 2019, 5:34pm

I was just doing the math on that and coming to the same conclusion. A signed 20-bit seconds counter would just about fit the symptoms. 20-bit counters are rare in full-blown computers, but microprocessors often cut down on silicon to save cost. Probably someone in the chain thought the counter would be reset after every flight by something running above their system and didn’t successfully communicate that requirement to the system integrators.
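For the signed variant, here is a rough sketch of what a 20-bit two's-complement seconds counter would do at the limit. The register width and the helper name `to_signed` are assumptions for illustration only:

```python
BITS = 20
SECONDS_PER_HOUR = 3600


def to_signed(raw, bits=BITS):
    """Interpret the low `bits` bits of raw as a two's-complement integer."""
    raw &= (1 << bits) - 1
    if raw & (1 << (bits - 1)):  # top bit set means negative
        return raw - (1 << bits)
    return raw


max_positive = (1 << (BITS - 1)) - 1
print(max_positive)                                # 524287
print(round(max_positive / SECONDS_PER_HOUR, 1))   # 145.6 hours
print(to_signed(max_positive))                     # 524287
print(to_signed(max_positive + 1))                 # -524288
```

One tick past roughly 145.6 hours, the uptime would read as about minus six days, which any code assuming a non-negative elapsed time would handle badly.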

Papasan · July 26, 2019, 5:40pm

hard reboot every 149 hours

I’m an every-40-hours-of-work hard re-boot kind’a guy…

philbin1 · July 26, 2019, 5:41pm

So, you’re a pilot?

Sqyntz · July 26, 2019, 5:51pm

This sort of begs the question, can it be rebooted in midair?

Markus_Baur · July 26, 2019, 5:51pm

hmm …

the maximum flight time of an a350 is on the order of 16 hours (or a bit more for those ultra-ultra-long flights)

at the end of such a flight the engines and most systems are shut down … taking the shut down a little bit deeper is not a real problem and adds only a few minutes to the shutdown / start up time

so the airline simply adds a line to its a350 manual to do a complete shutdown of the aircraft at least every 24h or at the end of such a long distance flight - even if it is forgotten, the critical period of 149 hours is so much longer than 24 hours that there is still 5-fold redundancy to ensure a reset (compare that to the 0-fold redundancy of the 737max MCAS system)

it's not pretty, but should work sufficiently well … and the aircraft software will be updated at the latest during the c-check of the aircraft (probably much earlier)

Jorpho · July 26, 2019, 5:51pm

It is common for airliners to be left powered on while parked at airport gates so maintainers can carry out routine systems checks between flights, especially if the aircraft is plugged into ground power.

I would have thought “powering it off and turning it back on again” would be a perfectly reasonable step in “routine systems checks”. I suppose power cycling might put a strain on some of the components, but if they’re going to fail, that would be a convenient time for them to do so.

Snork · July 26, 2019, 5:56pm

Three cheers for the Agile design philosophy! Iterate that sucker!

anon73443820 · July 26, 2019, 5:58pm

When I was a small child I was afraid of drowning.

As a young adult my irrational thoughts about death focused on traffic accidents.

Now I tend to wonder what the odds are that a software bug in an airplane or a medical device or something will take me out before the health consequences of sitting all day mashing a keyboard get me. Technology is a hell of a thing.

ax11 · July 26, 2019, 6:15pm

Consider yourself lucky living at a time when you can sit around and mash some keyboard without something with sabre-teeth sneaking up to you from behind.

RickMycroft · July 26, 2019, 6:21pm

No, a software developer. The fact that it fails literally like clockwork after 150 hours, and that they have such a stupid workaround, suggests this isn’t a simple coding bug but mismatched assumptions across different parts of the project, where 150 hours isn’t handled or trapped because it’s an impossible value to some of the code.

Alternatively, it is a simple coding error, but management is too cheap to budget the fix and the long QA/certification cycle.

philbin1 · July 26, 2019, 6:26pm

Yeah, I was going to add, which you did, that they must have run the cost/benefit analysis and found that just opening a window works and costs a lot less than fixing the code once and for all.
