Apollo Guidance Computer History Project
Second conference
September 14, 2001
Alarm on the Lunar Landing

FRED MARTIN: I wanted to touch upon what I would categorize as almost a religious war between the synchronous executive folks and the interrupt priority-driven folks. The issue really came to a head in the shuttle program, where there were forces that believed in asynchronous, priority-driven, interrupt-driven executives, and other forces that believed in absolutely synchronous executives, where you planned out all the software and you executed the software by tables. You could thereby tell what part of the software was operating at every instant of time, because you had planned it out so that it all operated very rigidly. The people who really used those kinds of executives were mostly people in the aircraft industry. They had come from various airplane programs where they had these computers and used synchronous executives. It was very important for them to know exactly what was happening at every instant of time, either from a testing standpoint or whatever. That was their mindset. The people that had worked on Apollo, for the most part, believed in this asynchronous executive where you have priorities and you didn't have everything structured completely, but you allowed higher priority jobs to interrupt lower priority jobs and so on. There were these incredible discussions in the early shuttle time in deciding what kind of an executive and operating system should be used in the shuttle. The shuttle had its own hardware issues, with multi-reliability strings and an IO processor and other things that had to be synced. So there were a lot of reasons why this was a big issue.
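[Editor's illustration] The two kinds of executive described above can be contrasted in a small toy model. This is only a sketch under stated assumptions, not a reconstruction of the Apollo or shuttle software: the job names, priorities, costs, table order, and time budgets are all hypothetical, and a real synchronous executive would handle a frame overrun in its own way. The sketch shows one job table run two ways, first with a full time budget and then with a reduced one.

```python
import heapq

# Hypothetical job table: (name, priority, cost in time units).
# A larger priority number means a more urgent job. The order below is
# the fixed table order that the synchronous sketch follows.
TABLE = [
    ("housekeeping", 1, 3),
    ("display", 2, 3),
    ("guidance", 3, 4),
]


def run_synchronous(table, budget):
    """Table-driven sketch: jobs run in fixed table order, priority ignored."""
    done = []
    for name, _priority, cost in table:
        if cost > budget:
            break  # out of time: the rest of the table is not reached
        budget -= cost
        done.append(name)
    return done


def run_priority(table, budget):
    """Priority-driven sketch: the most urgent waiting job always runs first."""
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    queue = [(-priority, name, cost) for name, priority, cost in table]
    heapq.heapify(queue)
    done = []
    while queue:
        _neg_priority, name, cost = heapq.heappop(queue)
        if cost > budget:
            break  # out of time: only lower priority work is left undone
        budget -= cost
        done.append(name)
    return done


if __name__ == "__main__":
    for budget in (10, 8):  # 10 = full budget, 8 = some time lost elsewhere
        print(budget, run_synchronous(TABLE, budget), run_priority(TABLE, budget))
```

With the full budget of 10 units both versions finish every job. With the budget cut to 8, the table-driven version in this toy model spends its time on the first two table entries and never reaches the guidance job, while the priority-driven version completes guidance and display and leaves only housekeeping undone. The table order was chosen to make that difference visible; a different order would change which jobs the synchronous version misses.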
FRED MARTIN: But the roots of the Apollo viewpoint really stemmed back to the alarms that occurred on the lunar landing. As you recall, in that flight, when the LM was coming down, all of a sudden you had these alarms, and the astronaut kept getting these alarms. He said they had alarms. Finally there was a decision in Houston to just push on and that he should land. Whoever made that decision, perhaps Steve Bales, understood what the problem was or he felt that he--

Eventually, at the suggestion of somebody who worked at the Cape a lot, we tracked down the fact that they had a switch in the wrong position and it was stealing cycles; it was the rendezvous radar switch. And that made the AGC run slower. Because it was running slower, it was dropping low priority jobs and not getting to these low priority jobs and only doing the highest priority jobs, so the job queue filled up. When the job queue filled up, it caused this alarm to go off and so on. At any rate, MIT got a lot of criticism for a software error causing this to happen. So now you had a sort of dichotomy, where some people believed that what had happened was actually an error. Other people believed, no, the software actually saved the program because, in the face of this mistake in the switch, the software, which was written as a priority executive, was able to go on with the highest priority jobs and not tank the mission, because it didn't have to do this box car structured synchronous system where it would give time to everything whether it was important or not.

What Margaret is pointing out is that, when we finally got and found-- I remember the instant that we ran upstairs to look at the telemetry to see where the bits were set. And sure enough, the radar bit, which was picked up by telemetry and downloaded, that was in, let's say, the 15th word, or however many bits were in the telemetry. And bit 9 or whatever it was showed that the rendezvous radar switch was on. Eventually, the ground told the astronaut, when he was about to take off from the moon, to put the switch in the right position. He said it in a very low key fashion. But when the issue was run down to the end, it was found that the crew procedures said to put the switch in the RR position. So the next question was, well, how come, if you put the switch in this position, this wasn't picked up in the simulators at Grumman, which these guys had been training on for a couple of years? And in that crew simulator, they had always done exactly what was in the document, put the switch in that position. However, that switch wasn't connected to anything at Grumman. In other words, that switch was not connected in the simulator to the AGC, which would slow them down in that simulator. So they trained exactly to what those crew procedures said, each time. And they did it on the landing too.

This brings up all these questions when you have events like this take place. And we've
still not solved such issues today, in the industry at large.

The Shuttle people, for the most part, didn't want to use those features that we had put in the language. We were very influenced in the language by experiencing it in Apollo. And very influenced by the executive that Hal Laning had designed, and the manner in which he had designed that whole thing. We were very influenced in the language design to take advantage of those things. They were used to a minor extent in the manner in which the shuttle was put together.

Hugh Blair-Smith adds: Fred's account is accurate except that the percentage of wasted time was exactly 15%, just precisely using up, and microscopically overusing, the planned slack time. Another factor that I remember was Grumman's insistence that their 400-cycle power supply, operating the radar, be independently phased relative to the GN&C system's 400-cycle power supply. Had the two supplies been phase-locked, as we had urgently proposed, the switch error would not have created bogus angle differences to run the shaft and trunnion axis angle counters at full speed. Who would ever have thought, in a simulator, to randomize the phase between two AC power supplies?

site last updated 12-08-2002 by Alexander Brown