In the aftermath of the Space Shuttle Challenger disaster of 1986, a Presidential Commission was established to determine what went wrong. The most unusual member of the panel was almost certainly the physicist Richard Feynman, some of whose books I have reviewed. Ultimately, his contribution proved to be controversial and was shifted into an annex of the official report. To me, it seems like a remarkably clear-sighted piece of analysis, with wide-ranging relevance for complex organizations in which critical things might go wrong.
The full text is available online: Appendix F – Personal observations on the reliability of the Shuttle
He makes some important points about dealing with models and statistics, as well as about the bureaucratic pressures that exist in large organizations. For instance, he repeatedly stresses that the fact that something didn’t fail last time isn’t necessarily good evidence that it won’t fail next time. He makes the point specifically with reference to the eroded O-ring that was determined to be the cause of the fatal accident:
But erosion and blow-by are not what the design expected. They are warnings that something is wrong. The equipment is not operating as expected, and therefore there is a danger that it can operate with even wider deviations in this unexpected and not thoroughly understood way. The fact that this danger did not lead to a catastrophe before is no guarantee that it will not the next time, unless it is completely understood. When playing Russian roulette the fact that the first shot got off safely is little comfort for the next. The origin and consequences of the erosion and blow-by were not understood. They did not occur equally on all flights and all joints; sometimes more, and sometimes less. Why not sometime, when whatever conditions determined it were right, still more leading to catastrophe?
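The Russian roulette point can be made concrete with a little arithmetic. The sketch below is only an illustration, not anything taken from Feynman's appendix: it assumes each flight fails independently with the same probability, and both the failure rates and the count of 24 earlier flights are hypothetical inputs chosen for the example.

```python
def prob_no_failure(p_fail: float, n_flights: int) -> float:
    """Chance of n_flights consecutive successes, assuming each flight
    fails independently with probability p_fail (a simplifying assumption)."""
    return (1.0 - p_fail) ** n_flights


# Hypothetical per-flight failure probabilities, from pessimistic to optimistic.
# 24 is an illustrative number of earlier flights, not a figure from the text.
for p_fail in (1 / 100, 1 / 1_000, 1 / 100_000):
    streak = prob_no_failure(p_fail, 24)
    print(f"failure rate {p_fail:.5f}: chance of 24 clean flights = {streak:.3f}")
```

Under these assumptions, even a failure rate of 1 in 100 produces an unbroken run of 24 safe flights roughly 79% of the time. A clean record that short is nearly as likely for a dangerous vehicle as for a very safe one, so on its own it says little about which one you have.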
In his overall analysis, Feynman certainly doesn’t pull his punches, saying:
Since 1 part in 100,000 would imply that one could put a Shuttle up each day for 300 years expecting to lose only one, we could properly ask “What is the cause of management’s fantastic faith in the machinery?”
and:
It would appear that, for whatever purpose, be it for internal or external consumption, the management of NASA exaggerates the reliability of its product, to the point of fantasy.
It certainly seems plausible that managers in charge of other complex systems have made similar exaggerations, on the basis of similarly dubious analysis.
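The arithmetic behind the 1-in-100,000 remark quoted above is easy to check. This is a minimal sketch, assuming one launch per day and a constant per-flight loss probability; the function name and the 365.25-day year are my own choices for illustration.

```python
DAYS_PER_YEAR = 365.25  # average calendar year; an assumption for this sketch


def years_per_expected_loss(p_fail: float, launches_per_day: float = 1.0) -> float:
    """Years of flying, at the given launch rate, over which one loss is
    expected if each flight is lost with probability p_fail."""
    flights_per_loss = 1.0 / p_fail
    return flights_per_loss / (launches_per_day * DAYS_PER_YEAR)


# Management's figure versus the rough figure attributed to working engineers.
print(f"1 in 100,000: {years_per_expected_loss(1 / 100_000):.0f} years of daily launches")
print(f"1 in 100:     {years_per_expected_loss(1 / 100) * DAYS_PER_YEAR:.0f} days of daily launches")
```

At one launch a day, 1 in 100,000 works out to about 274 years per expected loss, which Feynman rounds to 300. The 1 in 100 figure, by contrast, implies a loss about every 100 days at the same launch rate.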
Feynman also singles out one thing NASA was doing especially well – computer hardware and software design and testing – to highlight the differences between a cautious approach where objectives are set within capabilities and a reckless one where capabilities are stretched to try to reach over-ambitious cost or time goals.
Of course, the fact that the Space Shuttle was more dangerous than advertised doesn’t mean it wasn’t worth the risk to launch it. Surely, astronauts were especially well equipped to understand and accept the risks they were facing. Still, if NASA had had a few people like Feynman in positions of influence in the organization, the Shuttle and the program surrounding it would probably have included fewer major risks.
The Space Shuttle definitely seems to have been over-promised to Congress, both in terms of safety and economics.
Columbia Accident Investigation Board
From Wikipedia, the free encyclopedia
Echoes of Challenger
One board member, Dr. Sally Ride, served on both the CAIB panel and Rogers Commission and noted remarkable similarities between the two tragedies. She questioned why the shuttle was allowed to continue flying with known problems that were, eventually, catastrophic.
Since no machine is perfect, the problem comes down to identifying which known problems are an acceptable risk and which are not. In these two examples, shedding foam and failing o-rings, the organization failed to react correctly to the seriousness of the problem: in both cases, whereas engineers recognized the seriousness of the problem, NASA management dismissed both the evidence and the engineers’ expertise and ultimately decided to continue with the mission, with catastrophic results.
To illustrate the organizational problems of safety awareness, Richard Feynman attached a personal appendix to the Rogers Commission Report. It is equally relevant to the CAIB report. In it, he wrote: “It appears that there are enormous differences of opinion as to the probability of a failure with loss of vehicle and of human life. The estimates range from roughly 1 in 100 to 1 in 100,000. The higher figures come from the working engineers, and the very low figures from management. What are the causes and consequences of this lack of agreement? … we could properly ask, ‘What is the cause of management’s fantastic faith in the machinery?'”
The CAIB report found these same misperceptions by management and concluded that they contributed to the accident. Both reports also examined the ability of schedule pressures to influence safety-related design decisions.
The ultimate responsibility for the failure of Challenger as well as Columbia must reside with the decisionmakers – in this case, NASA executives who decided to ignore, dismiss, or minimize the testimony of their engineering experts.
“With Challenger, an O-ring that should not have eroded at all did erode on earlier shuttle launches. Yet managers felt that because it had not previously eroded by more than 30%, this was not a hazard as there was ‘a factor of three safety margin’. Morton-Thiokol designed and manufactured the SRBs, and during a pre-launch conference call with NASA, Roger Boisjoly, the Thiokol engineer most experienced with the O-rings, pleaded with management repeatedly to cancel or reschedule the launch. He raised concerns that the unusually cold temperatures would stiffen the O-rings, preventing a complete seal, which was exactly what happened on the fatal flight. However, Thiokol’s senior managers overruled him, dismissing his safety concerns, and allowed the launch to proceed. Challenger’s O-rings eroded completely through as predicted, resulting in the complete destruction of the spacecraft and the loss of all seven astronauts on board.
Columbia was destroyed because of damaged thermal protection from foam debris that broke off the external tank during ascent. The foam had not been designed or expected to break off, but had been observed in the past to do so without incident. The original shuttle operational specification said the orbiter thermal protection tiles were designed to withstand virtually no debris hits at all. Over time NASA managers gradually accepted more tile damage, similar to how O-ring damage was accepted. The Columbia Accident Investigation Board called this tendency the “normalization of deviance” — a gradual acceptance of events outside the design tolerances of the craft simply because they had not been catastrophic to date.”
25 years after Challenger—inspiring the future of space science
Maggie Koerth-Baker at 8:00 AM Friday, Jan 28, 2011
You know the story of the Challenger space shuttle disaster—the engineers’ warnings that were ignored, the lives lost, and the lives forever altered. But, on the 25th anniversary of the tragedy, the families those astronauts left behind are trying to make it clear that Challenger was more than just a single, traumatic day. Instead, for people like June Scobee Rodgers—widow of Challenger Commander Dick Scobee—the years since the Challenger space shuttle broke apart have proven that good things can come out of terrible events.
Roger Boisjoly dies at 73; engineer tried to halt Challenger launch
The night before the 1986 explosion, Boisjoly and four others argued that joints in the shuttle’s boosters couldn’t withstand a cold-weather launch.
By Ralph Vartabedian, Los Angeles Times
February 7, 2012
The 1986 explosion that destroyed the space shuttle Challenger and killed seven astronauts shocked the nation, but for one rocket engineer the tragedy became a personal burden and created a lifelong quest to challenge the bureaucratic ethics that had caused the tragedy.
Roger Boisjoly was an engineer at solid rocket booster manufacturer Morton Thiokol and had begun warning as early as 1985 that the joints in the boosters could fail in cold weather, leading to a catastrophic failure of the casing. Then on the eve of the Jan. 28, 1986, launch, Boisjoly and four other space shuttle engineers argued late into the night against the launch.
In cold temperatures, o-rings in the joints might not seal, they said, and could allow flames to reach the rocket’s metal casing. Their pleas and technical theories were rejected by senior managers at the company and NASA, who told them they had failed to prove their case and that the shuttle would be launched in freezing temperatures the next morning. It was among the great engineering miscalculations in history.