Apollo Guidance Computer History Project
Second conference
September 14, 2001
Innovations in Software
MARGARET HAMILTON: Also an advantage, from a software, computer science or
software engineering point of view of having different missions is that we learned all
about the advantages of things that people don't think too much about today from the same
perspective, like re-use. And how that could cut down on testing. We learned all the
things about what it means to manage these different missions. What's re-used between the
LM and the command module. Between this mission and the one that's going next. How do you
evolve? How do you do it in a reliable way? In other words, it was a great place to learn
about how to do things the right way.
FRED MARTIN: The whole thing had to be structured too. All that software had to
be-- We structured it into programs with numbers. We had names for them. The astronauts
eventually picked up some of those same names. They would get into P40. Not that those
names were so great. But the fact of the matter is that that software all had to be
structured and organized and there had to be a nomenclature on it. There had to be
interfaces built.
MARGARET HAMILTON: They took on a life of their own.
RAMON ALONSO: And it had to be ready, what was it, eight to ten weeks before flight
time. You couldn't change it after that.
MARGARET HAMILTON: And also we learned about how important it was to prioritize as
to what was the most important to put into the software. Because there was so little room
to put the programs into that you would learn to delete what was less important. But
prioritizing became something that was always a consideration. And back-up systems also
became something that was always a consideration-- You lived your life later, your software
life, by doing things with more prioritizing. And doing things always with back-up
systems. And in your daily life. Even going to the grocery store. Later on, it was such an
influence.
Also, we had Black Friday. I can't remember what Black Friday was. I just remember we
were deleting all kinds of things because something more important was coming in. What was
it? Maybe it was backups or-- I don't remember now.
FRED MARTIN: Ray, were you the person who inflicted banks on us?
RAMON ALONSO: That was a solution to a very bad problem.
FRED MARTIN: It was a very bad solution to a very bad problem.
RAMON ALONSO: That solution exists today in Windows. Right? In the x86 architecture.
Because you still have to do banking in order to get extended address fields and things
like--
HERB THALER: It's just invisible.
FRED MARTIN: It was almost like area codes I would say in the telephone system, I
mean. How many bits you had for addressing. And in order to get unique addressing, you had
to mark it with another field that would let you address uniquely. So you had to manage
that other field. That got very confusing in the software. It was a difficult issue.
DAN LICKLY: And the erasable memory was so small. Was it 2k? A certain amount of it was
permanent. But a lot of it you would use during a burn. And when you were done with burns,
that same area was used for something in a later phase. So these phases were sequential.
And all of the previous information was gone and not used anymore. So you didn't have
unique--
Hugh Blair-Smith adds:
As to the re-use of erasable memory in this section, push-down stacks were unheard-of
when we designed the instruction set, and we might wonder whether that highly structured
and controlled way of re-using "erasable" would have helped. My guess is that we
would have suffered through a lot of the history we've seen over the past two
decades, of using corrupted stack pointers and stepping on other people's stack
frames, and all the powerful mistakes you can make with powerful tools. Given that these
problems had not been met and overcome then, I'll bet we were better off the way we
were.
MARGARET HAMILTON: You can see there were a lot of things to do with interfacing
correctly. And therefore, we had the opportunity to make a lot of interface errors.
Because we had to squeeze everything into such small space. But also, it's amazing what
one could do with so little space. And how little one can do with so much space today.
ALEX KOSMALA: One of the things that was remarkable -- I have remarked to myself,
over the 30 years since then -- was that we wrote software into a machine that, by today's
standards, would be a very advanced architecture. Later I worked with people who said,
"What? You programmed into a real-time control computer with an event-driven,
asynchronous executive?" Such machines are not commonplace today.
DAN LICKLY: Yeah. With autopilots and jets sending signals left and right.
ALEX KOSMALA: We invented it for this job. And it hasn't re-emerged as a standard
of computer design today. Surely, there are interrupts and stuff like-- But they were not
used in this coordinated fashion that we had then.
Something else that we invented, and it saved those guys' butts on the moon, was the
concept of software restarts, which allowed, in reaction to any kind of failure, for the
computer to retrieve back to its last known properly coordinated state and then to attempt
a restart of the flight program from that point.
DAN LICKLY: You don't have to do CTRL-ALT-Delete.
ALEX KOSMALA: This thing set-- What did we call those things? Restart points as it
went along.
MARGARET HAMILTON: Dan was the father of that.
ALEX KOSMALA: I was going to say. This guy [Dan Lickly] and Woody Vandever. Dan's
the father. Woody was the guy who put the scheme into practice -- I advised that Woody
should be here. But not only did the software restart concept, I think, rescue the LM from
failure, it was also a terrific help in debugging this complex piece of software. We
triggered restarts because, if this thing was able to go back over the same piece, we could
change things and re-run chunks using the characteristics of the software itself.
I think Margaret probably remembers more of that than I do. If we had not had the
inherent software restart capability, the testing of this real-time software would have
been a hell of a lot more difficult.
MARGARET HAMILTON: You brought up something that was really interesting because
people today, when they use computers, they try something, and when it doesn't work, they
try something else, and when that doesn't work, something else, etc. And they just have
the computer always at their disposal. We could only put in runs for processing overnight.
So you would have to think very carefully because if it didn't work, then you'd have to
wait until the next day. (See Hugh Blair-Smith's annotations)
ALEX KOSMALA: You wasted a day.
MARGARET HAMILTON: A whole day. So then what we learned to do was to find a way to
run like a hundred runs at once, thinking about it before the fact. And the restarts
remind me of that. So you'd try all these different things all at once. And in a way, you
got more done because you had to think it out so carefully. You got done maybe more in
that 24 hour period than if you were sitting here ad hoc-ly just trying something. So you
were basically doing things more in parallel, I think, in your thinking process and your
design of the test. So that was totally different.
ALEX KOSMALA: I don't know what lessons can be learned from that. But we did
this with cards, offline. There were no such things as terminals or interactive
facilities. As Margaret said, you got your stack of 2000 cards, you gave it to the guys in
the computer room, who, by then, had cordoned themselves off. We had been used to going in
there and running the runs ourselves. At some stage when Jim's double IBM 360 came in,
that was the end of that.
MARGARET HAMILTON: The thing is, if you dropped the cards, you had a real problem.
But it made you realize-- That was part of the reason we worked on doing something in our
research in later years where, if you "dropped" the instruction sets, it
wouldn't matter because it was smart enough to get it back together again, since in our
new systems language it is now single language, single reference. That was later. But we
used to take part of the cards and say, "Here, put this in your deck." And that
would be the reusable. Remember that?
ALEX KOSMALA: These are like horror stories or war stories. There's probably
something to be learned from this experience. That the few, under arduous circumstances,
were able to turn out something as complex as this piece of software was and prove that it
would work. That seems to be missing today. I don't know how to put that together. But,
the fewer people and the more crude the development environment was, the better seemed to
have been the progress on the end product. There's a conundrum there. I'm not sure what it
is.
FRED MARTIN: We did something that I know Margaret was intimately involved with.
I think it's written up in Microsoft's method of doing their software today. We had a
nightly configuration control group that looked at all changes that were created by
anybody during the day and had to pass through this little group of people, which might
have been four or five people. We had this "advantage" of doing an assembly
every night so that these changes were brought in to one listing, you might say. Then by
the next day, you'd have an update of that program. So you had an extremely tight
configuration control system.
Today, if everybody's programming on monitors and they're all distributed and so on,
this is a much more difficult environment than we had in this one building. Everybody
producing changes and cards and write-ups of what their changes were. All being funneled
to one small configuration control group who would then pass on everything that went into
that system. That was done continuously. I don't know if it started right at day one. But
it was done on the flights that I remember. And it was a very effective way of keeping
errors out of the program.
MARGARET HAMILTON: But there was more to add to that in that every single day, the
person who reviewed the changes would put out a memo that would go to everyone that was
involved in that mission so that they were aware of what the changes were. So then if they
had problems, they'd know that a change had been made in that area. So it was that
communication, just hard copy memos that went out alongside each daily revision.
JIM MILLER: It shouldn't be forgotten that this was an assembly language program.
If somebody had set up a simulation with what we called 'special requests' addressed to
certain locations, and somebody changed the underlying code, the locations could move. And
that simulation run would just be incorrectly formulated. But that would take a day to
find out. So you had to be careful not to kill somebody's run by shooting down something
that expected a fixed configuration. Which never was 'fixed,' because somebody had to
change something every day.
Towards the end of the development of a mission program - and this is a fascinating
thing - it's been mentioned that NASA let us do just about anything we wanted to in the
code, though we did have some help from a famous guy at TRW. Basically, we were given
complete free rein until we had delivered the software to Raytheon to build the core rope.
And then we would go into a series of tests with an absolutely fixed configuration, the
purpose being to find any remaining bugs. Sometimes you would find bugs that you felt just
had to be fixed.
But once the program was delivered, this group of people, who were the only people in
the world who NASA thought could write the software, suddenly were so stupid that no
change that they proposed could be plausible at all. So anything you suggested, the answer
was no. You had to beg, in the interest of the project, to be allowed to repair this
desperately bad thing.
I remember one night, in the LM-1 software development, one of these showed up,
probably in the digital auto-pilot, which was new. It must have been nine or ten at night.
And I'm not going to mention any names, for reasons that you'll see. I called the NASA guy
who was monitoring us at home, because I had to get his permission to put this change in
and to let him know it was coming. His wife answered the phone and said he was at work. So
I called him at his office. Nobody there. Well I knew this guy wasn't always telling his
wife what he was up to. Here I was with this problem. It was late at night. We wanted to
put this change in. It was, in my mind, whatever it was, really important to put this
change in. I knew the guy wasn't where his wife thought he was. And I didn't know what to
do. I couldn't leave him a message at work because they didn't have voicemail. And I
didn't want to call his wife back and say he wasn't at work for fear of what would happen
to him when he got home. What do you do in a case like that? We were absolutely at the
mercy of getting approval on those things. (I don't remember what finally happened.)
There were some funny situations like that. Many times the software was changed after
the rope had actually been built and they would sometimes have already potted it. And
they'd have to de-pot it and change the wires and so forth. It was an incredibly arduous
process to go through.
One of the things that I think deserves mention to illustrate the frugality that went
into the design and the software was that the guys that invented the interpreters, and we
had a second generation of them after the one that Hal Laning had developed, realized that
single-precision was too small to do Earth-to-moon navigation, but double-precision was
adequate. So the interpreter could do arithmetic in double-precision. It was a
ones-complement machine, which was funny enough by itself: there were 32,767 states, not
32,768 states. Unfortunately, there were some optical encoders on rotating shafts that
needed 2^15 exactly and not 2^15 - 1. So the computer was also capable
of doing two's-complement arithmetic, which caused some other problems. But here we had a
ones-complement machine with two forms of zero. And double-precision in which the
interpreter was perfectly happy to let the two parts of the double word have different
signs. You could have a +1/4 in the upper word, and a -1/2 in the lower. That was clever,
because the software to do sign-correction all the time took time and space.
Unfortunately, however, there were a lot of cases where that just turned out to be a
real pain in the backside. There were lots and lots of bugs that showed up when there was
sign disagreement in a way that somehow got through all the checks. So there were things
that we did to ourselves sometimes in the interest of frugality. It was a different day.
One of the fascinating problems was the reuse of erasable memory. Like every memory
problem, a piece of software that would use somebody else's memory would likely get away
with it because if it started writing into an area and then completed what it had to do,
it would go away. And then the routine which had left something there earlier, for its use
when it ran again later, would hit a value that somebody else's code had written in there,
and go completely wrong. And by then, the real violator was long gone and you had to
figure out who did it.
There were also problems of using erasable memory that hadn't been initialized the way
you thought it had. And we came up with a technique to put a kind of random number into
all erasable when we started.
MARGARET HAMILTON: In background.
JIM MILLER: And if that value was ever used and caused trouble, you could see it.
We had a way to put that same random number back in so you could do diagnosis to find the
problem. These were just things that nobody does anymore.
DAN LICKLY: They still haven't been solved. You don't know how many students of
mine, like now, their programs only work when there's zero in the location when they
start. That's the worst thing, I've decided, that you could put in an unused--
MARGARET HAMILTON: We should be asking why aren't people in general doing these
things today? There are so many things that would really help debug systems today that
never were brought forward in some way by most. Many things are being used today but that
was one that was, I think, more advanced than most debugging tools we have commonly
available, at least in the more traditional tools. Why do you think that is? Incidentally,
this is an example of the kinds of things that have inherently been incorporated into our
systems language and its associated automation as a result of our experiences on the
Apollo effort.
Hugh Blair-Smith adds:
Margaret's observation about assemblies being so long that they were performed as
overnight batch jobs, brings me to a confession of one thing I wish I had done better, and
I really think I could have at the time. When it became clear, early in the project, that
the number of words in AGC memory was going to be greater than the number of words in the
assembling machine's (i.e., the Honeywell 800/1800's) memory, I gave up any
attempt to retain the object program in memory and just wrote each patch of object code on
tape. Then there was a "third pass" of the assembly process which sorted the
object code by what is perhaps the dumbest sort algorithm possible: running that tape back
and forth and writing the object code in its proper order on another tape. As long as
programs were just a few thousand instructions, this went like the wind, but the full-size
programs of 36000 or so instructions made Pass 3 pretty boring and frustrating to watch.
We did have a hard disk drive from 1965 or 1966, so I could and should have written my
object code onto that and read off the sorted code in a flash. There wasn't anything
like DOS to keep disk files out of each other's hair, so this effort would have
involved learning and overcoming some risks of data getting stepped on. And there were
always other little enhancements that had to be done, so it wouldn't have been
easy--but I still wish I'd found a way to get it done. I guess one thing that
decreased the motivation was that each assembly had to be printed, in about 3 or 4 hours
on the noisy 600 line-per-minute printer, so there wasn't anything the disk could do
about assembly being an overnight batch process. I had no notion, at the time, of
assembling modules separately and then linking the object modules, and the long and
difficult history of PC linkers suggests that it wouldn't have been a good idea to
try it even if I had thought of it.