Decoded: Altair BASIC

May 2025 -- work in progress (~25%)

Microsoft logo at the time Altair BASIC was released

Last month, Bill Gates celebrated the 50th anniversary of Microsoft by releasing its earliest known source code: the Altair BASIC interpreter 3.0. The significance of Microsoft's first endeavor has been discussed ad nauseam by the industry and insiders for decades, and I have nothing to add to that.

As for the source code, I hope to help future code readers enjoy exploring minds from an earlier era. The first hurdle? Converting the 314 pages of fanfold impact printouts into something easier to analyze. Here you go:

Altair BASIC 3.0 source on GitHub:

  • WYSIWYG from the paper feed (repo / raw)
  • Assembly only (with and without line numbers)
  • ...and more view slices discussed in the repo

Armed with accessible source, we're ready to start the journey to appreciate what Gates called "the coolest code I've ever written".

My Goal: To break down the knowledge barriers that keep modern programmers from jumping into the code, by collecting the foundational knowledge in one place (right here!)


Solid Foundations - Prerequisite knowledge for Altair BASIC

Computing in 1975 was simpler than today. It was possible for a young person to grasp the entire computation chain: from hardware and machine instructions, through the single running program, to the display / teleprinter. The three talented 20-somethings who shoehorned this BASIC interpreter into 4K of memory did exactly that. Simple ... but obscure by today's standards.

We'll walk through the tools of the era roughly in order of importance. These tools are essential to understand exactly what we're seeing in the source code without constant cross-referencing to ask "What is this?". As we move down the list, the resources will address the broad implications of "Why" rather than "What or How".

The Intel 8080 CPU

Key Resource: The 8080 Assembly Language Programming Manual (1975)

All assembly for Altair BASIC targets the Intel 8080. It's on the programmer to bootstrap this chip to support their application. If you're coming from an x86 background (like me), then you have to retrain your thinking, since this processor predates the 8086 by four years. The architecture at the time was known as MCS-80, and had several quirks that were eventually ironed out. The most important points for our purposes are:

The Registers

Registers of the Intel 8080 CPU

General purpose registers are 8 bits wide. The register names do not follow the later x86 conventions, which complicated software compatibility with later x86 CPUs. Fortunately, the x86 convention isn't too far from what the 8080 provides.

Register Pairs
The 8080 implements 'register pairs' for performing 16-bit computations using two 8-bit registers. This idea falls away with the more generalized x86 architecture after 1978. The key point for the Altair BASIC source is to know the pairs, how to reference them, and the 8 specific instructions involving pairs. A pair is referenced by its first register: B means B,C; D means D,E; H means H,L; and PSW means the accumulator plus the flags. In general, an instruction operates on the single named register unless it is one of the special pair instructions.

For example, the INX B instruction increments the combined B,C value by one. In most cases this means 'INX B' only changes the value in C. When C is all 1s, it wraps to 0 and the carry goes into B. The special instructions to watch for are:

*Footnote for the MACRO Assembler Reference Manual (1973 and 1978) cited in the MACRO-10 section: the 1978 reference didn't exist when Altair BASIC was developed, but it is higher quality than the 1973 manual.

Instruction  Valid Regs  Purpose
PUSH         B,D,H,PSW   Adds the two-byte contents of the register pair to the stack
POP          B,D,H,PSW   Fills the register pair with two bytes from the stack
DAD          B,D,H,SP    16-bit add of the register pair and H,L contents. DAD H is equivalent to left shift of H,L
INX          B,D,H,SP    16-bit increment of the target register pair
DCX          B,D,H,SP    16-bit decrement of the target register pair
XCHG         None        Swaps H,L with D,E
XTHL         None        Swaps H,L with *SP+1,*SP respectively
SPHL         None        Moves the contents of H,L in to SP directly (no memory changes)
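
To make the pair behaviour concrete, here is a minimal Python sketch of INX and DAD. It is only a model for building intuition: the function names are my own and nothing here comes from the BASIC source or an emulator.

```python
# Minimal model of 8080 register pairs: two 8-bit registers treated as one
# 16-bit value. Function names are illustrative, not taken from the source.

def pair_value(high: int, low: int) -> int:
    """Combine two 8-bit registers into one 16-bit value (high byte first)."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

def split_pair(value: int) -> tuple[int, int]:
    """Split a 16-bit value back into (high, low) 8-bit registers."""
    value &= 0xFFFF  # the pair wraps around at 16 bits
    return value >> 8, value & 0xFF

def inx(high: int, low: int) -> tuple[int, int]:
    """INX: 16-bit increment of a register pair."""
    return split_pair(pair_value(high, low) + 1)

def dad(h: int, l: int, high: int, low: int) -> tuple[int, int]:
    """DAD: add a register pair to H,L; the result is the new H,L."""
    return split_pair(pair_value(h, l) + pair_value(high, low))

# INX B with C below all 1s only changes C
print(inx(0x12, 0x34))              # (18, 53), i.e. B=0x12, C=0x35
# INX B with C all 1s carries into B
print(inx(0x12, 0xFF))              # (19, 0), i.e. B=0x13, C=0x00
# DAD H adds H,L to itself, which is a one-bit left shift
print(dad(0x01, 0x80, 0x01, 0x80))  # (3, 0), i.e. H=0x03, L=0x00
```

The real DAD also sets the carry flag on overflow, which this sketch ignores.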

Interrupt Vectors
The first 64 bytes of memory (0o0000-0o0077) are dedicated to handling interrupts. External devices invoke interrupts by supplying an RST instruction, which selects one of eight fixed handler addresses. The processor expects the application programmer to lay out eight routines of 8 bytes each, starting at the beginning of memory. If the application does not use hardware interrupts, they should be disabled on initialization with a 'DI' instruction. Refer to Chapter 5, Page 60 of the 8080 guide for the details.

Altair BASIC disables all hardware interrupts and instead uses the handlers to dispatch common functions. The CPU still responds to RST invoked from within the application, but will not listen for hardware RSTs. Discussion about the Altair BASIC handlers continues below.
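
As a quick way to see where each handler lives, the Python sketch below computes the opcode and target address for each RST. It relies on two facts about the 8080: RST n is the one-byte opcode 0b11nnn111, and it calls address 8 * n. The function names are my own.

```python
# Sketch of how the 8080 maps RST n to a fixed 8-byte handler slot.
# Function names are illustrative.

def rst_opcode(n: int) -> int:
    """Return the one-byte opcode for RST n (bit pattern 11nnn111)."""
    if not 0 <= n <= 7:
        raise ValueError("RST takes a vector number from 0 to 7")
    return 0b11000111 | (n << 3)

def rst_address(n: int) -> int:
    """Return the address that RST n calls: eight bytes per slot."""
    if not 0 <= n <= 7:
        raise ValueError("RST takes a vector number from 0 to 7")
    return 8 * n

# Print the table in octal to match the addresses used in the source listing
for n in range(8):
    print(f"RST {n}: opcode {rst_opcode(n):03o}, handler at {rst_address(n):04o}")
```

The last slot starts at octal 0070 and ends at 0077, so the first byte free for ordinary code is 0100.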

The MACRO-10 Assembler

Key Resource: The MACRO Assembler Reference Manual (1973 and 1978*)

Altair BASIC was written on Harvard's PDP-10 and tested on an 8080 emulator written by Paul Allen. Consequently, Gates and Allen used the MACRO-10 assembler and its extended features. You'll need to recognize these features in order to quickly read through the code.

Pseudo-Ops
Pseudo-ops provided by MACRO-10 range from crafting specific bytes and sequences to purely visual styling of the printed source listing. For Altair BASIC, know these operations:

Pseudo-Op  Count  Purpose
BLOCK      49     Reserves memory space. No code generated. Next instruction address skips by BLOCK size
END        2      Mandatory last statement in a MACRO program
EXTERNAL   16     Reserves symbols for resolution at load time. Cannot be defined in current source
IFE        205    Conditional branching assembly - see below for further discussion
IFN        351    Conditional branching assembly - see below for further discussion
INTERNAL   27     Tags symbol as global to all other loaded programs. Must be defined in this source
LIST       2      Resumes listing the program following an XLIST statement
PAGE       48     Ends the logical page of source. Only relevant for source display
PRINTX     21     Outputs text during assembly
RADIX      3      Sets the radix (base) in the program and is applied to all numbers
RELOC      8      Directly sets the address of the next (and subsequent) instructions
SALL       2      Suppresses all macros and expansions. Helpful when listing sources
SEARCH     2      Declares a source for MACRO-10 to search for unresolved symbols
SUBTTL     50     Sets the subtitle seen at the top of source listing pages
TITLE      2      Sets the title of the program
XLIST      2      Suspends output of the program listing during Pass 2
XWD        53     Generates code specified. Useful for crafting exact instructions. See below

Conditional Assembly
IFE and IFN each evaluate an expression and assemble the enclosed code only when the result is 0 or not 0, respectively. The source makes extensive use of IFE and IFN pseudo-ops to generate only the code necessary for the build target. The target is chosen by pre-configuring constant values in a 'Common File' prior to assembly. The assembler determines which code paths are dead and only generates output on the desired path based on evaluation of the given expression. The source includes all possible code paths, but only one specific path is built based on the configuration.

Here is a sneak peek of the 'Common File' from the first page of source showing IFE in action. We will discuss how to read this properly in coming sections. The /* ... */ comments are mine:

BASIC MCS 8080  GATES/ALLEN/DAVIDOFF    MACRO 47(113) 03:12 10-SEP-75 PAGE 1
C        6-SEP-64 03:11         COMMON FILE

     1                                  00100   SEARCH  MCS808
     2                                  00200   SUBTTL  COMMON FILE
     3                                  00300   SALL
     4                  000002          00400   LENGTH==2            /* LENGTH of 2 implies 12K (Extended) BASIC build target */
     5                  000001          00500   REALIO==1            /* REALIO of 1 means real machine, not simulator */
     6                  000000          00600   CASSW==0             /* The 000000 to the left shows generated constant */
     7                  000000          00700   PURE==0
     8                  000000          00800   LPTSW==0
     9                  000000          00900   DSKFUN==0
    10                  000000          01000   CONSSW==0
    11
    12                  000016          01200   CLMWID==^D14
    13                  000000          01300   RAMBOT=^O20000
    14                  000001          01400   CONTRW==1
    15                                  01500   IFE     REALIO,<     /* IFE expects a 0 but gets a 1 -- branch not taken */
    16    /* Nothing generated          01600           LPTSW==0
    17     *                            01700           CASSW==0
    18     *                            01800           CONSSW==0
    19     *                            01900           DSKFUN==0
    20     *                            02000           CONTRW==0>
    21     *
    22     *                            02200   IFE     LENGTH,<     /* IFE expects a 0 but gets a 2 -- branch not taken */
    23     *                            02300           EXTFNC==0    /* These are the settings for 4K BASIC */
    24     *                            02400           MULDIM==0
    25     *                            02500           STRING==0
    26     *                            02600           CASSW==0
    27     *                            02700           LPTSW==0
    28     *                            02800           DSKFUN==0
    29     *                            02900           CONSSW==0
    30     *                            03000           CONTRW==0>
    31     *
    32     *                            03200   IFE     LENGTH-1,<   /* IFE expects a 0 but gets a 1 -- branch not taken */
    33     *                            03300           EXTFUN==1    /* These are the settings for 8K BASIC */
    34     *                            03400           MULDIM==1
    35     */                           03500           STRING==1>
    36    
    37                                  03700   IFE     LENGTH-2,<   /* IFE gets a 0 -- branch taken and constants output */
    38                  000001          03800           EXTFUN==1    /* These are the settings for Extended BASIC */
    39                  000001          03900           MULDIM==1
    40                  000001          04000           STRING==1>
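
To double-check the reading of the listing above, here is a toy Python model of the four IFE blocks. It is an illustration of the rule only, not of how MACRO-10 is implemented: the helpers `ife` and `ifn` are hypothetical, and the symbol values are copied from the listing (including the EXTFNC spelling in the 4K block).

```python
# Toy model of MACRO-10 conditional assembly (helper names are hypothetical).
# IFE assembles its body when the expression is 0; IFN when it is not 0.

def ife(value: int, body: dict[str, int], symbols: dict[str, int]) -> bool:
    """Define the symbols in `body` only if `value` is zero. True if taken."""
    if value == 0:
        symbols.update(body)
        return True
    return False

def ifn(value: int, body: dict[str, int], symbols: dict[str, int]) -> bool:
    """Define the symbols in `body` only if `value` is non-zero. True if taken."""
    if value != 0:
        symbols.update(body)
        return True
    return False

# Constants from the top of the Common File listing
symbols = {"LENGTH": 2, "REALIO": 1, "CASSW": 0, "PURE": 0, "LPTSW": 0,
           "DSKFUN": 0, "CONSSW": 0, "CONTRW": 1}

taken = [
    # Simulator settings: skipped because REALIO is 1
    ife(symbols["REALIO"],
        {"LPTSW": 0, "CASSW": 0, "CONSSW": 0, "DSKFUN": 0, "CONTRW": 0}, symbols),
    # 4K settings: skipped because LENGTH is 2
    ife(symbols["LENGTH"],
        {"EXTFNC": 0, "MULDIM": 0, "STRING": 0, "CASSW": 0, "LPTSW": 0,
         "DSKFUN": 0, "CONSSW": 0, "CONTRW": 0}, symbols),
    # 8K settings: skipped because LENGTH-1 is 1
    ife(symbols["LENGTH"] - 1, {"EXTFUN": 1, "MULDIM": 1, "STRING": 1}, symbols),
    # Extended settings: taken because LENGTH-2 is 0
    ife(symbols["LENGTH"] - 2, {"EXTFUN": 1, "MULDIM": 1, "STRING": 1}, symbols),
]

print(taken)                # [False, False, False, True]
print(symbols["CONTRW"])    # 1, untouched because the first two blocks were skipped
```

Swapping `ife` for `ifn` on the same expression inverts every decision, which is why the source can pair them to cover both sides of a build switch.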

Other Small Details

  • OCTAL format is used for nearly all addresses throughout the source. Counting runs 0-7, 10-17, and so on, so 77 + 1 = 100.
  • Symbols have a maximum of 6 characters. This includes jump labels and variables
  • Some symbols and functions, such as ADR and DC, are not resolved within the source. These are likely in MCS808 shown on line 1
  • Many pseudo-ops are dedicated to clean code display and not code generation, like TITLE, PAGE, SALL, etc
  • The XWD pseudo-op is used in unexpected ways to shorten code generation. More on that later.
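
If octal is rusty, Python's built-ins make it easy to check values from the listing. The sketch below uses a small helper of my own, `octal`, that pads to six digits like the generated-value column in the Common File listing.

```python
# Octal arithmetic as it appears in the listing, checked with Python built-ins.

def octal(value: int, width: int = 6) -> str:
    """Format a number as zero-padded octal, like the listing's value column."""
    return format(value, f"0{width}o")

# 77 + 1 = 100 in octal
print(octal(int("77", 8) + 1))   # 000100
# CLMWID==^D14 is decimal 14, which the listing shows as 000016
print(octal(14))                 # 000016
# ^O20000 in RAMBOT is an octal literal; in decimal it is 8192
print(int("20000", 8))           # 8192
```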

The basics of BASIC

Key Resource: MITS BASIC Manual (1975)

The design and operation of BASIC interpreters were well-known in 1975. Dartmouth BASIC had been widely shared for over 10 years and universities included interpreter implementation in computer science curricula. This allowed Gates and Allen to focus on the immediate challenge of implementing BASIC on the resource-constrained Altair.

For us, it's helpful to understand what a generic BASIC interpreter does beforehand so we can focus on the Altair implementation and appreciate the optimizations that made it feasible on limited hardware. The BASIC manual linked above tells us how the software should behave and provides many hints about the implementation.

A simple view of the user runtime experience:

The user view of the Altair BASIC interpreter

Even in 1975, the interactions and outputs don't feel much different from a modern Python interpreter. Users enter either single immediate (direct) commands or a longer program of stored (indirect) commands, and expect a meaningful response. An interpreter facilitates this experience by reading input, checking for validity, evaluating responses, providing output, and preparing for the next commands. Altair BASIC compresses this process as much as possible in order to operate within minimal memory space.
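
The direct versus indirect split can be sketched in a few lines of Python. This is a generic toy, not the Altair design: it assumes a leading line number means "store this line", supports only PRINT of a quoted string plus RUN and LIST, and uses a placeholder error message. All names are hypothetical.

```python
import re

# Toy front end for a BASIC-style interpreter. Numbered lines are stored
# (indirect mode); everything else runs immediately (direct mode).

def execute_statement(statement: str) -> list[str]:
    """Run one statement. Only PRINT of a quoted string is supported."""
    keyword, _, arg = statement.strip().partition(" ")
    if keyword.upper() == "PRINT":
        return [arg.strip().strip('"')]
    return ["SYNTAX ERROR"]  # placeholder text, not the real Altair message

def handle_line(line: str, program: dict[int, str]) -> list[str]:
    """Process one line of user input and return any output lines."""
    text = line.strip()
    if not text:
        return []
    numbered = re.match(r"(\d+)\s*(.*)", text)
    if numbered:
        number, rest = int(numbered.group(1)), numbered.group(2)
        if rest:
            program[number] = rest      # store or replace the line
        else:
            program.pop(number, None)   # bare number deletes (common BASIC convention)
        return []
    command = text.upper()
    if command == "LIST":
        return [f"{n} {program[n]}" for n in sorted(program)]
    if command == "RUN":
        output = []
        for n in sorted(program):       # run stored lines in line-number order
            output.extend(execute_statement(program[n]))
        return output
    return execute_statement(text)

program: dict[int, str] = {}
for typed in ['PRINT "READY"', '20 PRINT "WORLD"', '10 PRINT "HELLO"', "LIST", "RUN"]:
    for out in handle_line(typed, program):
        print(out)
```

Note that line 20 was typed before line 10, yet LIST and RUN both work in line-number order. Storing lines by number is what lets the user edit a program without a separate editor.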

Under the hood, the Altair BASIC solution looks like this:

The Altair 8800

Key Resource: Altair 8800 Operator's Manual
Optional Resource: Altair 8800 Theory of Operations and Schematics

Ironically, the Altair itself is last on the list of things to know for the BASIC source code. We can talk all day about the box itself -- it's the entire reason we're here! But for our purposes, BASIC talks to the 8080 while it lives in the memory supplied by the Altair. The challenges of manipulating the front panel and configuring peripherals (teletypes, tape readers, etc) are assumed to be overcome by the time BASIC loads. Considering that the mythology says Gates and Allen didn't even have a physical Altair to build and test BASIC on, we can also get by without digging in too far. Feel free to dig into the manuals, but the most important points for us are:

  • Three memory configurations: 4K, 8K, and Extended (12K or more)
  • Altair uses a two-stage loading process to start BASIC (Details on page 46 of the BASIC manual)
    • First stage loader begins to read from an external source (tape, cassette, etc)
    • Second stage loader verifies that BASIC was loaded correctly and begins execution
      • This was the loader that legends say Allen wrote while the plane landed for the product demo
  • Altair begins executing at memory location 0, which is where we begin the journey into the BASIC source


The Source Code

Source Layout

Idioms


Analysis in progress...

FAQ

More questions added as they roll in