May 2025 -- work in progress (~25%)
Last month, Bill Gates celebrated the 50th anniversary of Microsoft by releasing its earliest known source code: the Altair BASIC interpreter 3.0. The significance of Microsoft's first endeavor has been discussed ad nauseam by the industry and insiders for decades, and I have nothing to add.
As for the source code, I hope to help future code readers enjoy exploring minds from an earlier era. The first hurdle? Converting the 314 pages of fanfold impact printouts into something easier to analyze. Here you go:
Altair BASIC 3.0 source on GitHub:
- WYSIWYG from the paper feed (repo / raw)
- Assembly only (with and without line numbers)
- ...and more view slices discussed in the repo
Armed with accessible source, we're ready to start the journey to appreciate what Gates called "the coolest code I've ever written".
My Goal: To break down the knowledge barriers that keep modern programmers from jumping into the code by collecting the foundational knowledge in one place (right here!)
Solid Foundations - Prerequisite knowledge for Altair BASIC
Computing in 1975 was simpler than today. It was possible for a young person to grasp the entire computation chain: from hardware and machine instructions, through the single running program, to the display / teleprinter. That is exactly what three talented 20-somethings did when they shoehorned this BASIC interpreter into 4K of memory. Simple ... but obscure by today's standards.
We'll walk through the tools of the era roughly in order of importance. Knowing them lets us understand exactly what we're seeing in the source code without constantly cross-referencing to ask "What is this?". As we move down the list, the resources shift from "What" and "How" toward the broader implications of "Why".
The Intel 8080 CPU
Key Resource: The 8080 Assembly Language Programming Manual (1975)
All assembly for Altair BASIC targets the Intel 8080. It's on the programmer to bootstrap this chip to support their application. If you're coming from an x86 background (like me), you'll have to retrain your thinking, since this processor predates the 8086 by four years. The architecture at the time was known as MCS-80, and it had several quirks that were eventually ironed out. The most important points for our purposes are:
The Registers
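The 8080's general-purpose registers are 8 bits wide: the accumulator A plus B, C, D, E, H, and L. The stack pointer (SP) and program counter (PC) are 16 bits. These register names do not follow later x86 conventions, which complicated software compatibility with later x86 CPUs. Fortunately, the x86 convention isn't too far from what the 8080 provides.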
Register Pairs
The 8080 implements 'register pairs' (B,C; D,E; H,L) for performing 16-bit computations using two 8-bit registers. A pair is referenced by its first (high-order) register: B means B,C, D means D,E, and H means H,L. This idea falls away with the more generalized x86 architecture after 1978. For the Altair BASIC source, the key points are knowing the pairs, how to reference them, and the handful of special instructions that treat them as 16-bit values. In general, an instruction operates on the single named register unless it is one of these special pair instructions.
For example, INX B increments the combined 16-bit B,C value by one. In most cases this only changes C. Only when C is all 1s (0o377) does it wrap to 0 and carry into B. The special instructions to watch for are:
*The 1978 MACRO Assembler Reference Manual (listed in the MACRO-10 section) didn't exist when Altair BASIC was developed, but it is higher quality than the 1973 manual.
Instruction | Valid Regs | Purpose |
---|---|---|
PUSH | B,D,H,PSW | Adds the two-byte contents of the register pair to the stack |
POP | B,D,H,PSW | Fills the register pair with two bytes from the stack |
DAD | B,D,H,SP | 16-bit add of the register pair into H,L (only the carry flag is affected). DAD H is equivalent to a left shift of H,L |
INX | B,D,H,SP | 16-bit increment of the target register pair |
DCX | B,D,H,SP | 16-bit decrement of the target register pair |
XCHG | None | Swaps H,L with D,E |
XTHL | None | Swaps L with the byte at (SP) and H with the byte at (SP+1) |
SPHL | None | Copies H,L into SP directly (no memory changes) |
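To make the pair semantics concrete, here is a small Python model of the B,C, D,E, and H,L pairs. It is a sketch for reading the source, not an emulator. The class and method names are made up for illustration. SP and PSW are left out, and the only flag modeled is DAD's carry.

```python
# Toy model of 8080 register pairs (illustration only, not an emulator).
# A pair is named by its high register: "B" means B,C; "D" means D,E; "H" means H,L.

PAIRS = {"B": ("B", "C"), "D": ("D", "E"), "H": ("H", "L")}


class Regs8080:
    def __init__(self):
        self.r = dict.fromkeys("ABCDEHL", 0)  # 8-bit registers
        self.carry = 0                        # only DAD's carry is modeled

    def get_pair(self, p):
        hi, lo = PAIRS[p]
        return (self.r[hi] << 8) | self.r[lo]

    def set_pair(self, p, value):
        hi, lo = PAIRS[p]
        value &= 0xFFFF                       # wrap to 16 bits
        self.r[hi], self.r[lo] = value >> 8, value & 0xFF

    def inx(self, p):
        # INX: 16-bit increment; the real 8080 leaves all flags unchanged
        self.set_pair(p, self.get_pair(p) + 1)

    def dcx(self, p):
        # DCX: 16-bit decrement; also leaves flags unchanged
        self.set_pair(p, self.get_pair(p) - 1)

    def dad(self, p):
        # DAD: H,L <- H,L + pair; carry is set on 16-bit overflow
        total = self.get_pair("H") + self.get_pair(p)
        self.carry = total >> 16
        self.set_pair("H", total)

    def xchg(self):
        # XCHG: swap H,L with D,E
        hl, de = self.get_pair("H"), self.get_pair("D")
        self.set_pair("H", de)
        self.set_pair("D", hl)


cpu = Regs8080()
cpu.set_pair("B", 0o376)   # B=0, C=0o376
cpu.inx("B")               # only C changes: C=0o377
cpu.inx("B")               # C wraps to 0 and carries into B
print(cpu.r["B"], cpu.r["C"])             # 1 0

cpu.set_pair("H", 0x1234)
cpu.dad("H")               # DAD H doubles H,L: a 16-bit left shift
print(hex(cpu.get_pair("H")), cpu.carry)  # 0x2468 0
```

The same model shows why DAD H is a cheap multiply-by-two: shifting H,L left is just adding the pair to itself.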
Interrupt Vectors
The first 64 bytes of memory (0o0000-0o0077) are dedicated to handling interrupts. An external device invokes an interrupt by placing an RST instruction on the data bus. The number n in RST n (0-7) selects the handler at address 8 × n. The processor expects the application programmer to lay out eight routines of 8 bytes each at the start of memory. If the application does not use hardware interrupts, they should be disabled on initialization with a 'DI' instruction. Refer to Chapter 5, Page 60 of the 8080 guide for the details.
Altair BASIC disables all hardware interrupts and instead uses the handlers to dispatch common functions. RST instructions executed by the program itself still work, and each behaves like a one-byte CALL to its vector. With interrupts disabled, however, the CPU ignores RST requests from hardware. Discussion about the Altair BASIC handlers continues below.
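The vector arithmetic is easy to check by hand, but a short Python sketch makes the pattern visible. RST n is a single byte (binary 11NNN111) that jumps to address 8 × n, while a CALL takes three bytes. The helper names are hypothetical, and which routines Altair BASIC places in each slot is left for the source discussion.

```python
# Sketch of 8080 RST encoding and vector addresses (helper names are hypothetical).

CALL_SIZE = 3  # CALL opcode + 16-bit address
RST_SIZE = 1   # RST n is a single opcode byte


def rst_opcode(n):
    """Encode RST n (0-7): bit pattern 11NNN111."""
    if not 0 <= n <= 7:
        raise ValueError("RST number must be 0-7")
    return 0b11000111 | (n << 3)


def rst_vector(n):
    """Handler address for RST n: eight bytes per slot, starting at 0."""
    if not 0 <= n <= 7:
        raise ValueError("RST number must be 0-7")
    return 8 * n


def bytes_saved(call_sites):
    """Bytes saved by invoking a routine with RST instead of CALL at each site."""
    return call_sites * (CALL_SIZE - RST_SIZE)


for n in range(8):
    # Octal, to match the source listing: RST 1 -> opcode 317, vector 0010
    print(f"RST {n}: opcode {rst_opcode(n):03o}, vector {rst_vector(n):04o}")
```

As a purely illustrative figure, calling a routine through RST from 100 sites would save 200 bytes (`bytes_saved(100)`). That is a meaningful chunk of a 4K budget.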
The MACRO-10 Assembler
Key Resource: The MACRO Assembler Reference Manual (1973 and 1978*)
Altair BASIC was written on Harvard's PDP-10 and tested on an 8080 emulator written by Paul Allen. Consequently, Gates and Allen used the MACRO-10 assembler and its extended features. You'll need to recognize these features in order to quickly read through the code.
Pseudo-Ops
The pseudo-ops provided by MACRO-10 range from crafting specific bytes and sequences to purely visual styling of the printed source listing. For Altair BASIC, know these operations:
Pseudo-Op | Count (uses in source) | Purpose |
---|---|---|
BLOCK | 49 | Reserves memory space. No code generated. Next instruction address skips by BLOCK size |
END | 2 | Mandatory last statement in a MACRO program |
EXTERNAL | 16 | Reserves symbols for resolution at load time. Cannot be defined in current source |
IFE | 205 | Conditional branching assembly - see below for further discussion |
IFN | 351 | Conditional branching assembly - see below for further discussion |
INTERNAL | 27 | Tags symbol as global to all other loaded programs. Must be defined in this source |
LIST | 2 | Resumes listing the program following an XLIST statement |
PAGE | 48 | Ends the logical page of source. Only relevant for source display |
PRINTX | 21 | Outputs text during assembly |
RADIX | 3 | Sets the default radix (base) applied to the numbers that follow |
RELOC | 8 | Directly sets the address of the next (and subsequent) instructions |
SALL | 2 | Suppresses listing of all macro expansions. Helpful when printing source listings |
SEARCH | 2 | Declares a source for MACRO-10 to search for unresolved symbols |
SUBTTL | 50 | Sets the subtitle seen at the top of source listing pages |
TITLE | 2 | Sets the title of the program |
XLIST | 2 | Suspends output of the program listing during Pass 2 |
XWD | 53 | Assembles a word from two halves (XWD left,right). Useful for crafting exact instructions. See below |
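BLOCK and RELOC are easiest to understand through the assembler's location counter, which is the address where the next instruction will land. The Python sketch below is a hypothetical mini-assembler that follows the descriptions in the table above. Its statement format is invented for illustration and is not MACRO-10 syntax. It also assumes BLOCK sizes count 8080 bytes.

```python
# Hypothetical mini-assembler pass that tracks the location counter.
# Statements are (op, arg) tuples, NOT real MACRO-10 syntax:
#   ("RELOC", addr)  -> set the location counter to addr
#   ("BLOCK", n)     -> reserve n bytes (assumed unit), emit nothing
#   (mnemonic, size) -> place an instruction of `size` bytes


def assign_addresses(statements, start=0):
    loc = start
    placed = []
    for op, arg in statements:
        if op == "RELOC":
            loc = arg                 # jump straight to an absolute address
        elif op == "BLOCK":
            loc += arg                # skip ahead; no code generated
        else:
            placed.append((loc, op))  # instruction lands at the current address
            loc += arg
    return placed


program = [
    ("RELOC", 0o100),
    ("LXI H", 3),
    ("BLOCK", 4),    # four bytes skipped, nothing generated
    ("MOV A,M", 1),
]
for addr, op in assign_addresses(program):
    print(f"{addr:04o}  {op}")   # 0100  LXI H, then 0107  MOV A,M
```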
Conditional Assembly
IFE and IFN evaluate an expression and assemble the enclosed block only if the result is 0 (IFE) or not 0 (IFN). The source makes extensive use of these pseudo-ops to generate only the code necessary for the build target. The target is chosen by setting constant values in a 'Common File' prior to assembly. The assembler evaluates each expression, treats the untaken blocks as dead code, and generates output only for the desired path. The source therefore includes all possible code paths, but only one configuration is built at a time.
Here is a sneak peek of the 'Common File' from the first page of source, showing IFE in action. We will discuss how to read this properly in coming sections. Comments in /* */ are mine:
```
BASIC MCS 8080 GATES/ALLEN/DAVIDOFF  MACRO 47(113) 03:12 10-SEP-75 PAGE 1
C  6-SEP-64 03:11  COMMON FILE

  1                        00100   SEARCH MCS808
  2                        00200   SUBTTL COMMON FILE
  3                        00300   SALL
  4  000002                00400   LENGTH==2       /* LENGTH of 2 implies 12K (Extended) BASIC build target */
  5  000001                00500   REALIO==1       /* REALIO of 1 means real machine, not simulator */
  6  000000                00600   CASSW==0        /* The 000000 to the left shows generated constant */
  7  000000                00700   PURE==0
  8  000000                00800   LPTSW==0
  9  000000                00900   DSKFUN==0
 10  000000                01000   CONSSW==0
 11
 12  000016                01200   CLMWID==^D14
 13  000000                01300   RAMBOT=^O20000
 14  000001                01400   CONTRW==1
 15                        01500   IFE REALIO,<    /* IFE expects a 0 but gets a 1 -- branch not taken */
 16  /* Nothing generated  01600   LPTSW==0
 17   *                    01700   CASSW==0
 18   *                    01800   CONSSW==0
 19   *                    01900   DSKFUN==0
 20   *                    02000   CONTRW==0>
 21   *
 22   *                    02200   IFE LENGTH,<    /* IFE expects a 0 but gets a 2 -- branch not taken */
 23   *                    02300   EXTFNC==0       /* These are the settings for 4K BASIC */
 24   *                    02400   MULDIM==0
 25   *                    02500   STRING==0
 26   *                    02600   CASSW==0
 27   *                    02700   LPTSW==0
 28   *                    02800   DSKFUN==0
 29   *                    02900   CONSSW==0
 30   *                    03000   CONTRW==0>
 31   *
 32   *                    03200   IFE LENGTH-1,<  /* IFE expects a 0 but gets a 1 -- branch not taken */
 33   *                    03300   EXTFUN==1       /* These are the settings for 8K BASIC */
 34   *                    03400   MULDIM==1
 35   */                   03500   STRING==1>
 36
 37                        03700   IFE LENGTH-2,<  /* IFE gets a 0 -- branch taken and constants output */
 38  000001                03800   EXTFUN==1       /* These are the settings for Extended BASIC */
 39  000001                03900   MULDIM==1
 40  000001                04000   STRING==1>
```
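The selection logic in this listing boils down to "exactly one of LENGTH, LENGTH-1, LENGTH-2 is zero." Here is a Python sketch that replays the four IFE tests with the values from the listing. The `ife`/`ifn` helpers are hypothetical stand-ins for what the assembler does, not MACRO-10 code.

```python
# Replay of the Common File's IFE tests (helpers are hypothetical stand-ins).

COMMON = {"LENGTH": 2, "REALIO": 1}  # values from the listing above


def ife(value):
    """IFE: assemble the block only when the expression is zero."""
    return value == 0


def ifn(value):
    """IFN: assemble the block only when the expression is nonzero."""
    return value != 0


tests = [
    ("IFE REALIO", ife(COMMON["REALIO"]), "REALIO=0 (simulator) overrides"),
    ("IFE LENGTH", ife(COMMON["LENGTH"]), "4K BASIC settings"),
    ("IFE LENGTH-1", ife(COMMON["LENGTH"] - 1), "8K BASIC settings"),
    ("IFE LENGTH-2", ife(COMMON["LENGTH"] - 2), "Extended BASIC settings"),
]

for cond, taken, label in tests:
    print(f"{cond:<13} {'assembled' if taken else 'skipped':<9} {label}")

# An IFN REALIO block, by contrast, would be assembled on a real machine.
assert ifn(COMMON["REALIO"])

# For any LENGTH in {0, 1, 2}, exactly one size block is assembled.
for length in range(3):
    taken = [ife(length - k) for k in range(3)]
    assert taken.count(True) == 1
```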
Other Small Details
- Octal is used for nearly all addresses throughout the source. Counting goes 0-7, 10-17, ..., 77 + 1 = 100, etc.
- Symbols have a maximum of 6 characters. This includes jump labels and variables.
- Some symbols and functions, such as ADR and DC, are not defined within the source. These are likely defined in MCS808, the file named by SEARCH on line 1 of the listing.
- Many pseudo-ops, like TITLE, PAGE, and SALL, are dedicated to clean display of the listing rather than code generation.
- The XWD pseudo-op is used in unexpected ways to shorten code generation. More on that later.
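Python's `0o` literals and `int(text, 8)` are handy for checking octal values while reading. The sketch below uses numbers from the Common File listing. The helper name is made up for illustration.

```python
# Checking octal values from the listing (helper name is hypothetical).


def from_listing(octal_text):
    """Convert a listing value such as '000016' to an integer."""
    return int(octal_text, 8)


print(from_listing("000016"))  # 14: CLMWID==^D14 is listed in octal as 000016
print(0o20000)                 # 8192: RAMBOT=^O20000 sits at the 8K mark
print(oct(0o77 + 1))           # 0o100: octal 77 + 1 rolls over to 100
print(f"{14:06o}")             # 000016: back to the listing's six-digit form
```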
The basics of BASIC
Key Resource: MITS BASIC Manual (1975)
The design and operation of BASIC interpreters were well-known in 1975. Dartmouth BASIC had been widely shared for over 10 years and universities included interpreter implementation in computer science curricula. This allowed Gates and Allen to focus on the immediate challenge of implementing BASIC on the resource-constrained Altair.
For us, it's helpful to understand what a generic BASIC interpreter does beforehand, so we can focus on the Altair implementation and appreciate the optimizations that made it feasible on limited hardware. The BASIC manual linked above tells us how the software should behave and provides many hints about the implementation.
A simple view of the user runtime experience:
Even in 1975, the interactions and outputs didn't feel much different from a modern Python interpreter. Users enter either single immediate (direct) commands or numbered program lines (indirect commands), and they expect a meaningful response. An interpreter facilitates this experience by reading input, checking it for validity, evaluating it, providing output, and preparing for the next command. Altair BASIC compresses this process as much as possible in order to operate in minimal memory.
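A toy version of that loop fits in a few lines of Python. It only illustrates the direct/indirect split. The function names are hypothetical, and only `RUN` and `PRINT` of a quoted string are understood. Nothing here reflects how Altair BASIC actually tokenizes or stores lines.

```python
# Toy BASIC front end: numbered lines are stored (indirect), others run now (direct).
# Hypothetical sketch; only RUN and PRINT "text" are supported.


def execute(stmt, program):
    if stmt == "RUN":
        out = []
        for number in sorted(program):      # stored lines run in line-number order
            out += execute(program[number], program)
        return out
    if stmt.startswith("PRINT "):
        return [stmt[len("PRINT "):].strip('"')]
    return ["ERROR"]                        # placeholder, not Altair BASIC's message


def handle_line(line, program):
    head, _, rest = line.strip().partition(" ")
    if head.isdigit():                      # indirect: store, replace, or delete
        if rest:
            program[int(head)] = rest
        else:
            program.pop(int(head), None)    # a bare line number deletes the line, as in most BASICs
        return []
    return execute(line.strip(), program)   # direct: execute immediately


program = {}
handle_line('20 PRINT "WORLD"', program)
handle_line('10 PRINT "HELLO"', program)
print(handle_line('PRINT "NOW"', program))  # ['NOW']
print(handle_line("RUN", program))          # ['HELLO', 'WORLD']
```

Lines can be typed in any order because the stored program is kept keyed by line number and sorted at run time.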
Under the hood, the Altair BASIC solution looks like this:
The Altair 8800
Key Resource: Altair 8800 Operator's Manual
Optional Resource: Altair 8800 Theory of Operations and Schematics
Ironically, the Altair itself is last on the list of things to know for the BASIC source code. We could talk all day about the box itself -- it's the entire reason we're here! But for our purposes, BASIC talks to the 8080 while living in the memory supplied by the Altair. The challenges of manipulating the front panel and configuring peripherals (teletypes, tape readers, etc.) are assumed to be overcome by the time BASIC loads. Since legend says Gates and Allen didn't even have a physical Altair to build and test BASIC on, we can also get by without digging in too far. Feel free to dig into the manuals, but the most important points for us are:
- Three memory configurations: 4K, 8K, and Extended (12K or more)
- Altair uses a two-stage loading process to start BASIC (details on page 46 of the BASIC manual)
  - The first-stage loader begins reading from an external source (tape, cassette, etc.)
  - The second-stage loader verifies that BASIC was loaded correctly and begins execution
  - Legend says Allen wrote the loader on the plane, just before landing for the product demo
- Altair begins executing at memory location 0, which is where we begin the journey into the BASIC source
The Source Code
Source Layout
Idioms
Analysis in progress...
FAQ
More questions added as they roll in