In the past six months, two of Boeing’s flagship new 737 MAX aircraft have crashed in violent and eerily similar manners. As the investigations into the circumstances of the crashes continue, it looks increasingly likely that a combination of poor programming and limited oversight resulted in the tragic loss of 346 lives. Safety standards in the aviation industry are supposed to be among the highest of any industry, yet it seems that business pressures in the rush to market have had deadly consequences. Let’s take a closer look at what these crashes have in common, what role the MCAS may have played, and how a combination of limited oversight and faulty logic in software could have killed all of those people.
The Boeing 737 has a long history spanning more than 50 years. It’s been a mainstay of short- to medium-haul routes, with over 10,000 of them having been built for airlines in all corners of the world. Why then, is it suddenly an aircraft so demonised, and now grounded globally? Well, it’s important to understand that the 737s flying today have little in common with the first 737s to roll off the production line in 1968. Indeed, about the only things they do have in common are manufacturer, number of wings, and number of engines. In the more than 50 years since then, Boeing has been continually updating and modernising the design, adding more efficient engines, more efficient wing designs, new computerised all-glass cockpits, and constantly revising the way the aircraft functions. This continuous improvement within a consistent design has perhaps been one of the main reasons for the type’s popularity.
So for an aircraft with decades of development and improvement, why has it suddenly fallen from favour? The answer, at least initially, seems to be the Manoeuvring Characteristics Augmentation System, or MCAS for short. It’s a safety system first included in the 737 MAX, which is the latest incarnation of the 737. In simple terms, it’s designed to prevent the aircraft from stalling by automatically applying nose-down pressure when the aircraft approaches a stall. Ordinarily this is a good thing. Stalling an aircraft is not like stalling a car. When you stall a car, the engine stops, but when you stall an aircraft, what you’re actually doing is taking the wing to the point where it no longer provides lift. Without lift, an aircraft is just a large metal object falling out of the sky. Surely though, a system which prevents stalling is a good thing? How can a system designed to prevent a stall cause a crash? Well, the answer here lies in how that system was implemented. An important concept to understand in aviation is that of Angle of Attack (AoA). Angle of attack is the relative angle between the wing and the oncoming airflow. As this increases, it becomes harder and harder for a clean separation of airflow over and under the wing to take place, until it reaches a point where the separation breaks down altogether. When this takes place, the pressure differential between the upper and lower parts of the wing that normally produces lift disappears. Now the aircraft is falling.
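The relationship between angle of attack and lift can be captured in a toy model. This is purely illustrative (the critical angle and slope are invented numbers, not real aerodynamic data for any aircraft): lift grows roughly linearly with AoA up to a critical angle, beyond which the airflow separates and lift collapses.

```python
# Toy model of lift vs. angle of attack. The numbers here are
# invented for illustration only; real aerodynamic behaviour is far
# more complex and aircraft-specific.

CRITICAL_AOA_DEG = 15.0  # hypothetical critical angle of attack


def lift_coefficient(aoa_deg: float) -> float:
    """Lift grows roughly linearly with AoA below the critical angle.

    Past the critical angle, airflow separation breaks down the
    pressure differential and lift collapses: this is the stall.
    """
    if aoa_deg < CRITICAL_AOA_DEG:
        return 0.1 * aoa_deg  # approximately linear pre-stall region
    # Post-stall: lift drops sharply rather than continuing to grow.
    return 0.1 * CRITICAL_AOA_DEG * 0.4
```

Pulling the nose up past the critical angle doesn’t produce more lift, it produces dramatically less, which is exactly the situation a stall-prevention system like MCAS exists to avoid.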
MCAS is designed to prevent the aircraft ever reaching that point. By constantly monitoring the angle of attack, the flight computer would be able to anticipate a stall before it happened, and gently push the nose down in order to prevent it. The aircraft would continue in flight, and the flight crew could respond, recover, and resume normal flight. That, at least, is the idea. However, with the crashes of Lion Air Flight 610 and Ethiopian Airlines Flight 302, it appears that the MCAS has an unexpected dark side. How this came to be allowed on commercially operated aircraft appears to be the result of a number of errors, no one of which alone would have caused the crashes, but which when combined could only lead to tragedies like these. Sadly, this is often the way in aviation accidents, with the “Swiss Cheese” model applying. Let us then examine the factors in play here.
The process of certifying a new airliner for safe public use is a long and arduous one. The certifying regulatory body, typically the national body for regulating aviation in the country of the aircraft’s manufacture, must submit the aircraft to a vast multitude of tests. These tests are designed to push the aircraft beyond anything it would likely experience in normal commercial operations, in order to give confidence that the aircraft is safe to operate. In the case of the 737 MAX, it appears that a number of the tests that would normally have been carried out by the regulatory authority (in this case the United States Federal Aviation Administration, the FAA) were actually delegated downward to Boeing themselves. This occurred despite there being a clear conflict of interest in Boeing certifying themselves as safe (in certain aspects of the aircraft’s operation) and therefore ready to move to market. Moreover, this meant that when the FAA were told that MCAS had a given amount of authority, they didn’t insist on testing it themselves. What actually turned out to be true is that the MCAS system had considerably more authority, and worse than that, its authority increased with every activation, even if pilots tried to disable it. So the aircraft was certified as safe, when it was taking off with a system that could quickly obtain ultimate authority over pitch, pushing the nose ever further downward. Now the Swiss Cheese model really comes into effect:
The MCAS acts based on information from a single sensor. It does not include anything to check whether that sensor is functioning correctly. It does not verify what that sensor is telling it by referencing other sensor data. What this means is that if the sensor fails, the faulty data it sends is taken as gospel by MCAS and acted on accordingly. So while the AoA sensor may say that the aircraft is in a deep stall, the fact that the altimeter and airspeed indicator indicate a normal flight profile is not taken into consideration. The flight computer as a whole has all the information it needs to know that the AoA sensor is faulty and that MCAS should not be activated, but the programming logic in the system doesn’t include those checks. What’s even worse: each 737 has more than one AoA sensor, but MCAS is configured to consult only one of them. A simple piece of logic in the code that says “if sensor one and sensor two disagree, do not apply MCAS input and instead warn the pilots” would have prevented hundreds of deaths.
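That missing cross-check is simple enough to sketch. The following is a hypothetical illustration of the idea, not Boeing’s actual flight software: the sensor names, the disagreement threshold, and the stall angle are all invented for the example.

```python
# Hypothetical sketch of a dual-sensor cross-check for a stall-
# prevention system. All names and thresholds here are invented for
# illustration; this is not real avionics code.

DISAGREE_THRESHOLD_DEG = 5.5  # hypothetical allowable vane disagreement
STALL_AOA_DEG = 14.0          # hypothetical near-stall threshold


def mcas_should_activate(aoa_left_deg: float, aoa_right_deg: float) -> bool:
    """Activate only if BOTH angle-of-attack vanes agree we are near a stall.

    If the two vanes disagree beyond a threshold, assume a sensor
    fault, inhibit the automatic nose-down input, and leave the
    decision to the pilots (who should also be warned).
    """
    if abs(aoa_left_deg - aoa_right_deg) > DISAGREE_THRESHOLD_DEG:
        return False  # sensors disagree: inhibit, warn the crew instead
    # Sensors agree; act only if even the lower reading is near the stall.
    return min(aoa_left_deg, aoa_right_deg) >= STALL_AOA_DEG
```

With a check like this, a single stuck vane reading a deep stall while its twin reports normal flight would inhibit the system rather than drive the nose down.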
The final nail in the coffin: the poor logic in deciding how much authority to give MCAS. Each time the pilots dismiss MCAS with a control on their yokes, the baseline for how much additional authority to give it is reset, meaning that with a faulty sensor, each dismissal by the pilots actually gave the malfunctioning system more authority. The normal behaviour of dismissing an errant system actually made that system more of a threat. The current logic that runs the MCAS is fundamentally flawed. It is not fault tolerant, and it gains more authority the more the pilots dismiss it, until such time as the pilots cannot override it. The plane, with the flawed but good intentions of Boeing’s software engineers, flies everyone into the ground in order to save them.
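The effect of that reset behaviour can be modelled in a few lines. Again, this is an invented illustration rather than actual flight code, and the trim numbers are made up: it contrasts a system whose total authority is capped with one where each dismissal resets the baseline, so repeated activations accumulate without limit.

```python
# Illustrative model of how resetting the authority baseline on each
# pilot dismissal lets nose-down trim accumulate. Not real avionics
# code; the increment and cap values are invented.
from typing import Optional

MCAS_INCREMENT = 2.5  # hypothetical units of nose-down trim per activation


def trim_after_dismissals(activations: int, cap: Optional[float]) -> float:
    """Total nose-down trim after repeated activations of a faulty system.

    cap=None models the reset flaw: every dismissal resets the
    baseline, so each activation applies its full increment and
    authority accumulates without bound. A numeric cap models a
    system that can never command more than a fixed total.
    """
    applied = 0.0
    for _ in range(activations):
        command = MCAS_INCREMENT
        if cap is not None:
            # Respect a total-authority cap across all activations.
            command = max(0.0, min(command, cap - applied))
        applied += command
    return applied
```

In this toy model, four activations of the uncapped system command 10.0 units of nose-down trim, while the capped system never exceeds 2.5: the difference between a nuisance the crew can trim out and a system they eventually cannot overpower.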