Apollo Guidance Computer History Project
Second conference
September 14, 2001
Innovations in Software
MARGARET HAMILTON: Also an advantage, from a software, computer science or
software engineering point of view of having different missions is that we learned all
about the advantages of things that people don't think too much about today from the same
perspective, like re-use. And how that could cut down on testing. We learned all the
things about what it means to manage these different missions. What's re-used between the
LM and the command module. Between this mission and the one that's going next. How do you
evolve? How do you do it in a reliable way? In other words, it was a great place to learn
about how to do things the right way.
FRED MARTIN: The whole thing had to be structured too. All that software had to
be-- We structured it into programs with numbers. We had names for them. The astronauts
eventually picked up some of those same names. They would get into P40. Not that those
names were so great. But the fact of the matter is that that software all had to be
structured and organized and there had to be a nomenclature on it. There had to be
interfaces built.
MARGARET HAMILTON: They took on a life of their own.
RAMON ALONSO: And it had to be ready, what was it, eight to ten weeks before flight
time. You couldn't change it after that.
MARGARET HAMILTON: And also we learned about how important it was to prioritize as
to what was the most important to put into the software. Because there was so little room
to put the programs into that you would learn to delete what was less important. But
prioritizing became something that was always a consideration. And back-up systems also
became something that was always a consideration-- You lived your life later, your software
life, by doing things with more prioritizing. And doing things always with back-up
systems. And in your daily life. Even going to the grocery store. Later on, it was such an
influence.
Also, we had Black Friday. I can't remember what Black Friday was. I just remember we
were deleting all kinds of things because something more important was coming in. What was
it? Maybe it was backups or-- I don't remember now.
FRED MARTIN: Ray, were you the person who inflicted banks on us?
RAMON ALONSO: That was a solution to a very bad problem.
FRED MARTIN: It was a very bad solution to a very bad problem.
RAMON ALONSO: That solution exists today in Windows. Right? In the x86 architecture.
Because you still have to do banking in order to get extended address fields and things
like--
HERB THALER: It's just invisible.
FRED MARTIN: It was almost like area codes I would say in the telephone system, I
mean. How many bits you had for addressing. And in order to get unique addressing, you had
to mark it with another field that would let you address uniquely. So you had to manage
that other field. That got very confusing in the software. It was a difficult issue.
DAN LICKLY: And the erasable memory was so small. Was it 2k? A certain amount of it was
permanent. But a lot of it you would use during a burn. And when you were done with burns,
that same area was used for something in a later phase. So these phases were sequential.
And all of the previous information was gone and not used anymore. So you didn't have
unique--
Hugh Blair-Smith adds:
As to the re-use of erasable memory in this section, push-down stacks were unheard-of
when we designed the instruction set, and we might wonder whether that highly structured
and controlled way of re-using "erasable" would have helped. My guess is that we
would have suffered through a lot of the history we've seen over the past two
decades, of using corrupted stack pointers and stepping on other people's stack
frames, and all the powerful mistakes you can make with powerful tools. Given that these
problems had not been met and overcome then, I'll bet we were better off the way we
were.
MARGARET HAMILTON: You can see there were a lot of things to do with interfacing
correctly. And therefore, we had the opportunity to make a lot of interface errors.
Because we had to squeeze everything into such small space. But also, it's amazing what
one could do with so little space. And how little one can do with so much space today.
ALEX KOSMALA: One of the things that was remarkable -- I have remarked to myself,
over the 30 years since then -- was that we wrote software into a machine that, by today's
standards, would be a very advanced architecture. Later I worked with people who said,
"What? You programmed into a real-time control computer with an event-driven,
asynchronous executive?" Such machines are not commonplace today.
DAN LICKLY: Yeah. With autopilots and jets sending signals left and right.
ALEX KOSMALA: We invented it for this job. And it hasn't re-emerged as a standard
of computer design today. Surely, there are interrupts and stuff like-- But they were not
used in this coordinated fashion that we had then.
Something else that we invented, and it saved those guys' butts on the moon, was the
concept of software restarts, which allowed, in reaction to any kind of failure, for the
computer to retrieve back to its last known properly coordinated state and then to attempt
a restart of the flight program from that point.
DAN LICKLY: You don't have to do CTRL-ALT-Delete.
ALEX KOSMALA: This thing set-- What did we call those things? Restart points as it
went along.
MARGARET HAMILTON: Dan was the father of that.
ALEX KOSMALA: I was going to say. This guy [Dan Lickly] and Woody Vandever. Dan's
the father. Woody was the guy who put the scheme into practice -- I advised that Woody
should be here. But not only did the software restart concept, I think, rescue the LM from
failure, it was also a terrific help in debugging this complex piece of software. We
triggered restarts because, if this thing was able to go back over the same piece, we could
change things and re-run chunks using the characteristics of the software itself.
I think Margaret probably remembers more of that than I do. If we had not had the
inherent software restart capability, the testing of this real-time software would have
been a hell of a lot more difficult.
MARGARET HAMILTON: You brought up something that was really interesting because
people today, when they use computers, they try something, and when it doesn't work, they
try something else, and when that doesn't work, something else, etc. And they just have
the computer always at their disposal. We could only put in runs for processing overnight.
So you would have to think very carefully because if it didn't work, then you'd have to
wait until the next day. (See Hugh Blair-Smith's annotations)
ALEX KOSMALA: You wasted a day.
MARGARET HAMILTON: A whole day. So then what we learned to do was to find a way to
run like a hundred runs at once, thinking about it before the fact. And the restarts
remind me of that. So you'd try all these different things all at once. And in a way, you
got more done because you had to think it out so carefully. You got done maybe more in
that 24 hour period than if you were sitting here ad hoc-ly just trying something. So you
were basically doing things more in parallel, I think, in your thinking process and your
design of the test. So that was totally different.
ALEX KOSMALA: I don't know what lessons can be learned from that. But we did
this with cards, offline. There were no such things as terminals or interactive
facilities. As Margaret said, you got your stack of 2000 cards, you gave it to the guys in
the computer room, who, by then, had cordoned themselves off. We had been used to going in
there and running the runs ourselves. At some stage when Jim's double IBM 360 came in,
that was the end of that.
MARGARET HAMILTON: The thing is, if you dropped the cards, you had a real problem.
But it made you realize-- That was part of the reason we worked on doing something in our
research in later years where, if you "dropped" the instruction sets, it
wouldn't matter because it was smart enough to get it back together again, since in our
new systems language it is now single language, single reference. That was later. But we
used to take part of the cards and say, "Here, put this in your deck." And that
would be the reusable. Remember that?
ALEX KOSMALA: These are like horror stories or war stories. There's probably
something to be learned from this experience. That the few, under arduous circumstances,
were able to turn out something as complex as this piece of software was and prove that it
would work. That seems to be missing today. I don't know how to put that together. But,
the fewer people and the more crude the development environment was, the better seemed to
have been the progress on the end product. There's a conundrum there. I'm not sure what it
is.
FRED MARTIN: We did something that I know Margaret was intimately involved with.
I think it's written up in Microsoft's method of doing their software today. We had a
nightly configuration control group that looked at all changes that were created by
anybody during the day and had to pass through this little group of people, which might
have been four or five people. We had this "advantage" of doing an assembly
every night so that these changes were brought in to one listing, you might say. Then by
the next day, you'd have an update of that program. So you had an extremely tight
configuration control system.
Today, if everybody's programming on monitors and they're all distributed and so on,
this is a much more difficult environment than we had in this one building. Everybody
producing changes and cards and write-ups of what their changes were. All being funneled
to one small configuration control group who would then pass on everything that went into
that system. That was done continuously. I don't know if it started right at day one. But
it was done on the flights that I remember. And it was a very effective way of keeping
errors out of the program.
MARGARET HAMILTON: But there was more to add to that in that every single day, the
person who reviewed the changes would put out a memo that would go to everyone that was
involved in that mission so that they were aware of what the changes were. So then if they
had problems, they'd know that a change had been made in that area. So it was that
communication, just hard copy memos that went out alongside each daily revision.
JIM MILLER: It shouldn't be forgotten that this was an assembly language program.
If somebody had set up a simulation with what we called 'special requests' addressed to
certain locations, and somebody changed the underlying code, the locations could move. And
that simulation run would just be incorrectly formulated. But that would take a day to
find out. So you had to be careful not to kill somebody's run by shooting down something
that expected a fixed configuration. Which never was 'fixed,' because somebody had to
change something every day.
Towards the end of the development of a mission program - and this is a fascinating
thing - it's been mentioned that NASA let us do just about anything we wanted to in the
code, though we did have some help from a famous guy at TRW. Basically, we were given
complete free rein until we had delivered the software to Raytheon to build the core rope.
And then we would go into a series of tests with an absolutely fixed configuration, the
purpose being to find any remaining bugs. Sometimes you would find bugs that you felt just
had to be fixed.
But once the program was delivered, this group of people, who were the only people in
the world who NASA thought could write the software, suddenly were so stupid that no
change that they proposed could be plausible at all. So anything you suggested, the answer
was no. You had to beg, in the interest of the project, to be allowed to repair this
desperately bad thing.
I remember one night, in the LM-1 software development, one of these showed up,
probably in the digital auto-pilot, which was new. It must have been nine or ten at night.
And I'm not going to mention any names, for reasons that you'll see. I called the NASA guy
who was monitoring us at home, because I had to get his permission to put this change in
and to let him know it was coming. His wife answered the phone and said he was at work. So
I called him at his office. Nobody there. Well I knew this guy wasn't always telling his
wife what he was up to. Here I was with this problem. It was late at night. We wanted to
put this change in. It was, in my mind, whatever it was, really important to put this
change in. I knew the guy wasn't where his wife thought he was. And I didn't know what to
do. I couldn't leave him a message at work because they didn't have voicemail. And I
didn't want to call his wife back and say he wasn't at work for fear of what would happen
to him when he got home. What do you do in a case like that? We were absolutely at the
mercy of getting approval on those things. (I don't remember what finally happened.)
There were some funny situations like that. Many times the software was changed after
the rope had actually been built and they would sometimes have already potted it. And
they'd have to de-pot it and change the wires and so forth. It was an incredibly arduous
process to go through.
One of the things that I think deserves mention to illustrate the frugality that went
into the design and the software was that the guys that invented the interpreters, and we
had a second generation of them after the one that Hal Laning had developed, realized that
single-precision was too small to do Earth-to-moon navigation, but double-precision was
adequate. So the interpreter could do arithmetic in double-precision. It was a
ones-complement machine, which was funny enough by itself: there were 32,767 states, not
32,768 states. Unfortunately, there were some optical encoders on rotating shafts that
needed 2^15 exactly and not 2^15 - 1. So the computer was also capable
of doing two's-complement arithmetic, which caused some other problems. But here we had a
ones-complement machine with two forms of zero. And double-precision in which the
interpreter was perfectly happy to let the two parts of the double word have different
signs. You could have a +1/4 in the upper word, and a -1/2 in the lower. That was clever,
because the software to do sign-correction all the time took time and space.
Unfortunately, however, there were a lot of cases where that just turned out to be a
real pain in the backside. There were lots and lots of bugs that showed up when there was
sign disagreement in a way that somehow got through all the checks. So there were things
that we did to ourselves sometimes in the interest of frugality. It was a different day.
One of the fascinating problems was the reuse of erasable memory. Like every memory
problem, a piece of software that would use somebody else's memory would likely get away
with it because if it started writing into an area and then completed what it had to do,
it would go away. And then the routine which had left something there earlier, for its use
when it ran again later, would hit a value that somebody else's code had written in there,
and go completely wrong. And by then, the real violator was long gone and you had to
figure out who did it.
There were also problems of using erasable memory that hadn't been initialized the way
you thought it had. And we came up with a technique to put a kind of random number into
all erasable when we started.
MARGARET HAMILTON: In background.
JIM MILLER: And if that value was ever used and caused trouble, you could see it.
We had a way to put that same random number back in so you could do diagnosis to find the
problem. These were just things that nobody does anymore.
DAN LICKLY: They still haven't been solved. You don't know how many students of
mine, like now, their programs only work when there's zero in the location when they
start. That's the worst thing, I've decided, that you could put in an unused--
MARGARET HAMILTON: We should be asking why aren't people in general doing these
things today? There are so many things that would really help debug systems today that
never were brought forward in some way by most. Many things are being used today but that
was one that was, I think, more advanced than most debugging tools we have commonly
available, at least in the more traditional tools. Why do you think that is? Incidentally,
this is an example of the kinds of things that have inherently been incorporated into our
systems language and its associated automation as a result of our experiences on the
Apollo effort.
Hugh Blair-Smith adds:
Margaret's observation about assemblies being so long that they were performed as
overnight batch jobs, brings me to a confession of one thing I wish I had done better, and
I really think I could have at the time. When it became clear, early in the project, that
the number of words in AGC memory was going to be greater than the number of words in the
assembling machine's (i.e., the Honeywell 800/1800's) memory, I gave up any
attempt to retain the object program in memory and just wrote each patch of object code on
tape. Then there was a "third pass" of the assembly process which sorted the
object code by what is perhaps the dumbest sort algorithm possible: running that tape back
and forth and writing the object code in its proper order on another tape. As long as
programs were just a few thousand instructions, this went like the wind, but the full-size
programs of 36000 or so instructions made Pass 3 pretty boring and frustrating to watch.
We did have a hard disk drive from 1965 or 1966, so I could and should have written my
object code onto that and read off the sorted code in a flash. There wasn't anything
like DOS to keep disk files out of each other's hair, so this effort would have
involved learning and overcoming some risks of data getting stepped on. And there were
always other little enhancements that had to be done, so it wouldn't have been
easy--but I still wish I'd found a way to get it done. I guess one thing that
decreased the motivation was that each assembly had to be printed, in about 3 or 4 hours
on the noisy 600 line-per-minute printer, so there wasn't anything the disk could do
about assembly being an overnight batch process. I had no notion, at the time, of
assembling modules separately and then linking the object modules, and the long and
difficult history of PC linkers suggests that it wouldn't have been a good idea to
try it even if I had thought of it.