Decoded: Altair BASIC

May 2025 (First Draft) - October 2025 (Final)

Microsoft logo at the time Altair BASIC was released

Last month, Bill Gates celebrated the 50th anniversary of Microsoft by releasing its earliest known source code: the Altair BASIC interpreter 3.0. The significance of Microsoft's first endeavor has been discussed ad nauseum by industry and insiders for decades - to which, I have nothing to add.

As for the source code, I hope to help future code readers enjoy exploring minds from an earlier era. The first hurdle? Converting the 314 pages of fanfold impact printouts into something easier to analyze. Here you go:

Altair BASIC 3.0 source on GitHub:

  • WYSIWYG from the paper feed (repo / raw)
  • Assembly only (with and without line numbers)
  • ...and more view slices discussed in the repo

Armed with accessible source, we're ready to start the journey to appreciate what Gates called "the coolest code I've ever written".

My Goal: To break knowledge barriers preventing modern programmmers from jumping in to the code by collecting foundational knowledge in one place (right here!)


Solid Foundations - Prerequisite knowledge for Altair BASIC

Computing in 1975 was more simple than today. It was possible for a young person to grasp the entire computation chain: from hardware and machine instructions through the single running program to the display / teleprinter. Much like three talented 20-somethings that shoehorned this BASIC interpreter in to 4K of memory. Simple ... but obscure by today's standards.

We'll walk through the tools of the era in rough order of importance. These tools are essential to understand exactly what we're seeing in the source code without constant cross-referencing to ask "What is this?". As we move down the list, the resources will address the broad implications of "Why" rather than "What or How".

The Intel 8080 CPU

Key Resource:The 8080 Assembly Language Programming Manual (1975)

All assembly for Altair BASIC targets the Intel 8080. It's on the programmer to bootstrap this chip to support their application. If you're coming from an x86 background (like me), then you have to retrain your thinking since this processcor predates the 8086 by four years. The architecture at the time was known as MCS-80, and had several quirks were eventually ironed out. The most important points for our purposes are:

The Registers

Registers of the Intel 8080 CPU

General purpose registers are 8-bits wide. The register names do not follow later conventions leading to software compatibility complications with later x86 CPUs. Fortunately, the convention isn't too far from what the 8080 provides.

Register Pairs
The 8080 implements 'register pairs' for performing 16-bit computations using two 8-bit registers. This idea falls away with the more generalized x86 architecture after 1978. The key point for the Altair BASIC source is to know the pairs, how to reference them, and the 8 specific instructions involving pairs. In general, instructions reference the given register unless it is a special pair instruction.

For example, the INX B instruction will increment the combined B,C value by one. In most cases this means 'INX B' will actually only increment the value in C unless it is all 1s, which carries in to B. The special instructions to watch for are:

Instruction Valid Regs Purpose
PUSHB,D,H,PSWAdds the two-byte contents of the register pair to the stack
POPB,D,H,PSWFills the register pair with two bytes from the stack
DADB,D,H,SP16-bit add of the register pair and H,L contents. DAD H is equivalent to left shift of H,L
INXB,D,H,SP16-bit increment of the target register pair
DCXB,D,H,SP16-bit decrement of the target register pair
XCHGNoneSwaps H,L with D,E
XTHLNoneSwaps H,L with *SP+1,*SP respectively
SPHLNoneMoves the contents of H,L in to SP directly (no memory changes)

Interrupt Vectors
The first 64 words in memory (0o0000-0o0100) are dedicated to handling interrupts. External devices invokve interrupts by supplying an RST instruction with a handler address. The processor expects the application programmer to lay out eight routines of 8 bytes each from at the start of memory. If the application does not use hardware interrupts then should be disabled on initialization with a 'DI' instruction. Refer to Chapter 5, Page 60 of the 8080 guide for the details.

Altair BASIC disables all hardware interrupts and instead uses the handlers to dispatch common functions. The CPU still responds to RST invoked from within the application, but will not listen for hardware RSTs. Discussion about the Altair BASIC handlers continues below

The MACRO-10 Assembler

Key Resource: The MACRO Assembler Reference Manual (1973 and 1978*)

Altair BASIC was written on Harvard's PDP-10 and tested on an 8080 emulator written by Paul Allen. Consequently, Gates and Allen used the MACRO-10 assembler and its extended features. You'll need to recognize these features in order to quickly read through the code.

*The 1978 reference didn't exist when Altair BASIC was developed, but I find the quality easier to read than the earlier manual.

Pseudo-Ops
Instructions provided by MACRO-10 range from crafting specific bytes and sequences to merely visual styling of the source listing when printed. For Altair BASIC, know these operations:

Pseudo-Op Count Purpose
BLOCK49Reserves memory space. No code generated. Next instruction address skips by BLOCK size
DEFINE29Creates a symbol as a macro of other symbols. Similar to #define in C
END2Mandatory last statement in a MACRO program
EXTERNAL16Reserves symbols for resolution at load time. Cannot be defined in current source
IFE205Conditional branching assembly - see below for further discussion
IFN351Conditional branching assembly - see below for further discussion
INTERNAL27Tags symbol as global to all other loaded programs. Must be defined in this source
LIST2Resumes listing the program following an XLIST statement
PAGE48Ends the logical page of source. Only relevant for source display
PRINTX21Outputs text during assembly
RADIX3Sets the radix (base) in the program and is applied to all numbers
RELOC8Directly sets the address of the next (and subsequent) instructions
SALL2Suppresses all macros and expansions. Helpful when listing sources
SEARCH2Declares a source for MACRO-10 to search for unresolved symbols
SUBTTL50Sets the subtitle see at the top of source listing pages
TITLE2Sets the title of the program
XLIST2Suspends output of the program listing during Pass 2
XWD53Generates code specified. Useful for crafting exact instructions. See below

Conditional Assembly
IFE and IFN evaluate expressions follow the branch that evaluates to 0 or not 0 respectively. The source makes extensive use of IFE and IFN pseudo-ops to generate only the code necessary for the build target. The target is chosen with pre-configured with constant values in a 'Common File' prior to assembly. The assembler determines which code paths are dead and only generates output on the desired path based on evaluation of the given expression. The source includes all possible code paths, but only one specific path is built based on the configuration.

Here is a sneak peak of the 'Common File' from the first page of source showing IFE in action. We will discuss how to read this properly in coming sections. Bolded comments are mine:

BASIC MCS 8080  GATES/ALLEN/DAVIDOFF    MACRO 47(113) 03:12 10-SEP-75 PAGE 1
C        6-SEP-64 03:11         COMMON FILE

     1                                  00100   SEARCH  MCS808
     2                                  00200   SUBTTL  COMMON FILE
     3                                  00300   SALL
     4                  000002          00400   LENGTH==2            /* LENGTH of 2 implies 12K (Extended) BASIC build target */
     5                  000001          00500   REALIO==1            /* REALIO of 1 means real machine, not simulator */
     6                  000000          00600   CASSW==0             /* The 000000 to the left shows generated constant */
     7                  000000          00700   PURE==0
     8                  000000          00800   LPTSW==0
     9                  000000          00900   DSKFUN==0
    10                  000000          01000   CONSSW==0
    11
    12                  000016          01200   CLMWID==^D14
    13                  000000          01300   RAMBOT=^O20000
    14                  000001          01400   CONTRW==1
    15                                  01500   IFE     REALIO,<     /* IFE expects a 0 but gets a 1 -- branch not taken */
    16    /* Nothing generated          01600           LPTSW==0
    17     *                            01700           CASSW==0
    18     *                            01800           CONSSW==0
    19     *                            01900           DSKFUN==0
    20     *                            02000           CONTRW==0>
    21     *
    22     *                            02200   IFE     LENGTH,<     /* IFE expects a 0 but gets a 2 -- branch not taken */
    23     *                            02300           EXTFNC==0    /* These are the settings for 4K BASIC */
    24     *                            02400           MULDIM==0
    25     *                            02500           STRING==0
    26     *                            02600           CASSW==0
    27     *                            02700           LPTSW==0
    28     *                            02800           DSKFUN==0
    29     *                            02900           CONSSW==0
    30     *                            03000           CONTRW==0>
    31     *
    32     *                            03200   IFE     LENGTH-1,<   /* IFE expects a 0 but gets a 1 -- branch not taken */
    33     *                            03300           EXTFUN==1    /* These are the settings for 8K BASIC */
    34     *                            03400           MULDIM==1
    35     */                           03500           STRING==1>
    36    
    37                                  03700   IFE     LENGTH-2,<   /* IFE gets a 0 -- branch taken and constants output */
    38                  000001          03800           EXTFUN==1    /* These are the settings for Extended BASIC */
    39                  000001          03900           MULDIM==1
    40                  000001          04000           STRING==1>

Other Small Details

  • OCTAL format is used for nearly all addresses throughout the source. Couting is 1-7, 10-17, 77 + 1 = 100, etc.
  • Symbols have a maximum of 6 characters. This includes jump lables and variables
  • Some symbols and functions are not resolved with the source, such as ADR and DC. These are likely in MCS808 shown on line 1
  • Many pseudo-ops are dedicated to clean code display and not code generation, like TITLE, PAGE, SALL, etc
  • The XWD pseudo-op is used in unexpected ways to shorten code generation. More on that later.

The basics of BASIC

Key Resource: MITS BASIC Manual (1975)

The design and operation of BASIC interpreters were well-known in 1975. Dartmouth BASIC had been widely shared for over 10 years and universities included interpreter implementation in computer science curricula. This allowed Gates and Allen to focus on the immediate challenge of implementing BASIC on the resource-constrained Altair.

For us, it's helpful to understand what a generic BASIC intepreter does beforehand so we can focus on the Altair implementation and appreciate the optimizations made it feasible on limited hardware. The BASIC manual linked above tells us how the software should behave and provides many hints about the implementation.

The user view of the runtime experience:

The user view of the Altair BASIC interpreter

Even in 1975, interactions and outputs feel similar to a modern Python interpreter. Users input single immediate (direct) command, such as PRINT 2+2, and expect an immediate response. Alternatively, a user may enter a long program of (indirect) commands, and the computer will proceed accordingly with potentially interactive outputs. An interpreter facilitates this experience by reading input, checking for validity, evaluating requests, providing output, and preparing for the next commands.

Altair BASIC compresses this process as much as possible in order to operate with minimum memory space. There isn't enough resources to support generalized semantic analysis as the output will happen in a single pass. Under the hood, the solution looks like this:

The Altair BASIC interpreter runtime loop

The names in [square brackets] are the symbol names of the supporting data structures used in Altair BASIC. We will delve in to the data structures during the loading discussion and step through this loop during the execution discussion.

The Altair 8800

Key Resource: Altair 8800 Operator's Manual
Optional Resource: Altair 8800 Theory of Operations and Schematics

Ironically, the Altair itself is last on the list of things to know for the BASIC source code. We can talk all day about the box itself -- it's the entire reason we're here! But for our purposes, BASIC talks to the 8080 while it lives in the memory supplied by the Altair. The challenges of manipulating the front panel and configuring peripherals (teletypes, tapereaders, etc) are assumed to be overcome by the time BASIC loads. Considering that the mythology says Gates and Allen didn't even have a physical Altair to build and test BASIC on, we can also get by without digging in too far. Feel free to dig in to the manuals, but the most important points for us are:

  • Three memory configurations: 4K, 8K, and Extended (12K or more)
  • Altair uses a two-stage loading process to start BASIC (Details on page 46 of the BASIC manual)
    • First stage loader begins to read from an external source (tape, cassette, etc)
    • Second stage loader verifies that BASIC was loaded correctly and begins execution
      • This was the loader that legends say Allen wrote while the plane landed for the product demo
  • Altair begins executing at memory location 0, which is where we begin the journey into the BASIC source


The Source Code

Here are facts about the source released by Bill Gates in April 2025:

  • Source version states that it is Altair BASIC 3.0
  • 314 pages of fanfold paper scanned in to 157 pages of PDF (2 physical pages per PDF page)
  • One physical page is 11" x 14" 7/8
  • 66 lines per page (6 LPI) with 132 characters per line. From the top:
    • 2 unused lines
    • 2 header lines
    • 1 blank line
    • 53 source content lines
    • 8 blank footer lines
  • Source is two files combined (F3 and F4) which includes the common (config) file for each prepended
    • F3 printout is dated 10 September 1975
    • F4 printout is dated 27 August 1975
  • 12,628 lines of source, including comment lines and whitespace
    • F3 contains 7801 lines of source including its common file.
    • F4 contains 4827 lines of source including its common file.
  • F3 includes the core BASIC intepreter
  • F4 contains the math package and the system initalization code
    • System initialization code is the last section loaded, runs once, and the space is recovered as program memory area
  • Symbol tables from the assemblers first-pass(?) are appended to the source listing after both F3 and F4

Layout of the source release

The source printouts contain several sections of note, which are duplicated between source files F3 and F4.

The organization of the printed source release of Altair BASIC 3.0F3/F4 Headers - Identifies the subsequent file along with process, print job, and system information
Common File - An 'included' configuration file with static variables for assembler usage. Note that F3 and F4 are using different Common File configurations (12K and 8K respectively).
F3/F4 Source - The program! F3 is BASIC, F4 is Math package and system initialization
F3/F4 Trailer - Verification of print job process complete
Duplicated Pages - Extra pages of F3 snuck in to F4. Physically the same page, but printed and scanned separately
F3/F4 Symbol Table - The list of symbols known to the respective source file. Includes a value,reference and a possible flag such as ext or spd. See the 1978 MACRO book page 6-4 for flag details.
F3/F4 Cross-Reference Symbol Table - The CR Symbol Table shows a list of line numbers for each symbols. Line number is the assembler assigned number on the far left of a page, not programmer provided number. See section 2 below.

Source subsections

The F3 and F4 source files group their routines in to subsections, each titled conveniently at the top of each page. The sections are:

F3 Sections (BASIC Interpreter) F4 Sections (Math Package)
Common FileCommon File
Version 3.0Math Package Configuration
Some ExplanationFP Addition and Subtraction
RST RoutinesNatural Log Function
Dispatch Tables,Reserved Words, Error TextFP Multiplication and Division
Low Segment - RAMSign, SGN, Float, Neg, ABS
Text ContantsFP Movement Routines
General Storage ManagementCompare Two Numbers
Error Handler, Ready, Compacting, New, Clear, MainInteger, Single, and Double Conversion
The LIST CommandGreatest Integer Function
FOR StatementInteger Arithmetic Routines
New Statement FetcherDouble Precision Arithmetic
Restore, Stop, End, LINGET, CHRCONFP Input Routines
Run, Goto, Gosub, ReturnFP Output Routines
LetExponentiation and Square Root
On GotoExponential Function
If ... ThenPolynomial Eval and Random Numbers
PrintSine, Cosine, and Tangent
Input and ReadArctangent
NextSystem Initialization
Formula Evaluation
Dimension and Variable Searching
Multiple Dimension
Function and Integer to Float Routines
Simple User-Defined-Functions
String Functions
Fancy List, Delete, Edit, LLIST
Disk Code
CLOAD, CSAVE, Console
Peek and Poke

Anatomy of a source page

The source listing from the MACRO assembler includes both the source and generated machine code for the specific platform configured in the common file. I will predominantly focus on the source code on the right hand side of the page. People interested in direct platform code might be interested in following the left-hand side, which is helpful as a check for assembler operation.

Here is an annotated view of a typical page of the source:

Anatomy of the Altair BASIC source listing

Section 2 line numbers are relevant for the source listing only. Sections 3,4 and 5 are the generated output of the assembler. Sections 6 and 7 are the input assembly code

Section   Purpose
1The two-line header at the top of every source page. Important info includes the page, file, and section subtitle
2The listing line number. This is only relevant for the printout and isn't a part of the source
3The address of the subsequent two words of generated bytes. Ususally sequential unless explicitly skipped (i.e BLOCK statement)
4Upper 18-bit word of output. Not directly applicable for the 8080 which has at most 16-bits addressable. Throughout the source code, only 3 values are observed. 001000 whenever the lower byte is a instruction (op-code). 000000 when the lower byte is an operand (data). And 777777 for negative immediate values.
5Lower 18-bit word of output. Contains the relevant generated code for each opcode or operand.
6Source line number. These were explicitly written by Gates et al during development.
7The hand crafted MACRO-10 and MCS-80 assembly code and comments.

Key points:
  • Only addresses appearing in Section 3 are live code paths that appear in the output build. The address is the location in the output. This is controlled by the configuration files
  • When sections 3 and 4 are blank yet section 5 has a value: This indicates a constant value referenced by the assembly during a pass
  • Values in section 5 with a trailing apostraphe indicate a relocatable memory reference, otherwise treat it as a value (opcode or operand)

Code quirks and idioms

Combine fifty year-old code with resource-constrained hacks and you'll have a code base littered with headscratchers. I'll dissect some ideas to watch for as we walk through the source.

XWD pseudo-op used for flow control
The XWD pseudo-op provided by the MACRO-10 assembler is typically used to craft bit-level specific output without interference from the assembler. Quite literally "Insert only this value at the current address". Microsoft used XWD to change control flow by inserting instructions that force subsequent lines to be read as operands (data) rather than executing them. A few examples:

  1874  002072' 001000  000036          27940   SNERR:  MVI     E,ERRSN         ;"SYNTAX ERROR"
  1875  002073' 000000  000002
  1876  002074' 001000  000001          27960           XWD     ^O1000,1        ;"LXI B," OVER THE NEXT 2
  1877  002075' 001000  000036          27980   DV0ERR: MVI     E,ERRDV0        ;DIVISION BY ZERO
  1878  002076' 000000  000013
  1879                                  28000   IFN     LENGTH,<
  1880  002077' 001000  000001          28020           XWD     ^O1000,1        ;SKIP NEXT TWO
  1881  002100' 001000  000036          28040   NFERR:  MVI     E,ERRNF>        ;"NEXT WITHOUT FOR" ERROR
  1882  002101' 000000  000001
  1883  002102' 001000  000315          28060   ERROR:  CALL    STKINI          ;RESET THE STACK AND FLAGS

This code segment has multiple entry points on lines 1874, 1877, and 1881 that set different string references but all 'fall through' to line 1883 without side effects. The trick is that the XWD instructions place an 'LXI B' op-code only without adding the 16-bit immediate operand expected causing the MVI instruction and its operand to be loaded into register pair BC. If we entered this code block at the top from like 1874, the "syntax error" string reference is loaded in to E, then the XWD instruction is executed as LXI B, gobbling up the subsequent MVI and it's operand as data rather than executing them. This repeats on line 1880 and thus we fall through to line 1883.

So why not just use a real LXI B, or even a jump to ERROR? It's more memory expensive -- the assembler will use 3 bytes to fill out LXI and its operands or jump reference. Using XWD only sacrifices 1 byte and smashes BC while weaving 3 code paths that slide to the same exit point without consequential side effects.

Another XWD example:

  2372  002646' 001000  000361          34640   FOUND:  POP     PSW             ;TAKE OFF GARBAGE ORIG POINTER
  2373  002647' 001000  000170          34660           MOV     A,8             ;GET RESERVED WORD #
  2374  002650' 001000  000366          34680           ORI     128             ;SET MSB TO FLAG AS RESERVED WORD
  2375  002651' 000000  000200
  2376  002652' 001000  000362          34700           XWD     ^O1000,^O362    ;"JP" AROUND THE POP H AND MOV A,M
  2377  002653' 001000  000341          34720   TABEND: POP     H               ;GET BACK ORIG POINTER
  2378  002654' 001000  000176          34740           MOV     A,M             ;GET BACK ORIG CHAR
  2379  002655' 001000  000321          34760           POP     D               ;GET STUFF POINTER BACK

This produces the same effect of skipping 2 instructions without smashing BC in the process. But why use JP (Jump If Positive)? The previous instruction of ORI 0b10000000, in addition to it's BASIC use for signalling a reserved also, also guarantees that the SIGN flag will be reset (negative), thus the fake JP branch will never be taken and we will fall through, saving us 2 bytes while preserving key registers.

Therein lies the art of weaving together multiple code paths and choosing the most appropriate operation to subtitute based on how many instructions you need to skip while avoiding destructive side effects like smashing important registers. Here is a summary of XWD usage throughout the Altair BASIC source:

XWD Instruction Mimicks Desired effectSide effects
XWD ^O1000,1LXI BSkips 2 instructionsSmashes register pair BC
XWD ^O1000,^O16 MVI CSkips 1 instructionSmashes register C
XWD ^O1000,^O21 LXI DSkips 2 instructionsSmashes register pair DE
XWD ^O1000,^O76 MVI ASkips 1 instructionSmashes accumulator
XWD ^O1000,^O362 JPSkips 2 instructionsNo side effects
XWD ^O1000,^O366 ORIEnsure register A isn't zero, Z flag is reset, skips 1 instructionSmashes A, Z flag reset

CPI instruction as a fast 'type' mechanism

The source often uses the CPI instruction to make a fast determination about an input value. The most common situation is comparing a value in the accumulator for an exact value. If the values match, the ZERO flag will be 1 and the subsequent branch will usually follow a JZ, JNZ, CZ, or CNZ instruction.

In the case of user input, we see CPI often compared to "A". If the user has input a digit (0-9) in to the accumulator, the "A" charcater will assert the Carry flag, while an input letter will not. This way we can quickly determine if the user input a number or letter as expected and branch using JC, JNC, CC, or CNC.

Note that this behavior is not consistent when characters arrive in the accumulator with the RST routine CHRGET, because it directly manipulates flags during the call.

Unresolved symbols

Not every symbol seen in the source has a definition that we can reference. I believe most of these are included in the universal file suggested by SEARCH MCS808 line that unfortunately I can't find references to. It's essentially an #include or 'import' statement. I've made some educated guesses about the meaning of these symbols, but I'm open to better interpretations if anyone has better information. Until then, consider these:

  • $CODE - Always resolves to 0 (zero) in this assembler pass. Likely the offset to the first address of code (code segment in x86 lingo) to generalize the load address.
  • ADR(x) - Reserves the current two-bytes as an address to the given routine. Wrapped in a local ADRP definition to include a precedence value for the operator table.
  • DC - Define Constant. Used to allocate static string data to the current space in memory. Similar to the DB directive in x86. Altair BASIC wraps this with DCI, DCE, and DCL helpers early in the assembly.
  • VMOVE- Used to copy descriptors in the 12K version. This external routine prefers 3-byte descriptors vs the 4-byte internal design used by the MOVE routine in 4K and 8K.

Number base ambiguity

By default, all addresses are read in octal format (base 8). All immediate values in operands are decimal (base 10). The following control delimiters tell the assembler how to explicitly handle the subsequent number:

  • ^O - Octal number. So ^O76 is equal to decimal 62 and binary 00111110
  • ^D - Decimal number. So ^D76 is equal to octal 114 and binary 01001100
  • ^B - Binary number. Included here for completion, but this isn't actually used in the code

You might wonder about hexadecimal? It turns out that MACRO in this era didn't have a built-in hex notation. Word sizes in the mid 1970s hadn't settled to be a multiple of 8/16, and the PDP-10 itself used 36-bit words (18-bit half-words), to which octal representation is better aligned. The PDP-8 was 12-bit words but the later PDP systems did ultimately land on 16-bit words.

The RET trampoline

The source code makes some use of return instructions not paired with a previous CALL instruction. Some address is pushed on to the stack, possibly much earlier, before the eventual RET invocation. If you see an unexplained return in the command sequence, but a jump brought you to this code block, the look to the nearest PUSH for further explanation.

In fact, be prepared for all forms of changing flow control such as calls, jumps, returns, and direct loading of the program counter with PCHL. It all happens in Altair BASIC.


Loading Altair BASIC

The memory footprint of a program was very easy to understand in the simple days of the 1970s. There is no OS, thus no virtual memory or paging system, and certainly nothing like ASLR. Altair BASIC owned the box and could directly address any physical space in memory. When the program tape read the final byte in to memory, execution is ready to begin from memory location 0.

Before running the program, we'll review how it is laid out in memory, what the key sections are, why the program is organized in this way, and what structures support execution.

The view at load time

Altair BASIC lives in memory in the same order of memory addresses from section 3 of the source pages (example shown earlier). Here is the memory map of the program:

The memory layout of Altair BASIC after load is complete

Starting from the interrupt handlers, the code drops down to fixed tables used for dispatch, then to static variable spaces. The 2nd half of the code is the runtime-execution, the largest sections being the BASIC statements and math package.

Altair BASIC follows the classic pattern of keeping single-use and optional-use code at the end so that this space can be recovered for dynamic storage. System initialization code runs once at the end, and the optional math functions follow before, allowing these final two blocks of ~730 bytes to be reclaimed at run time.

Interrupt vectors

We'll dig in to some detail of the handlers now, including the purpose and specific code lines, but save the mechanical details and the extended handlers for the execution discussion

As mentioned earlier, the 8080 CPU architecture allows the first 64 bytes of memory to be used for interrupts: eight handlers with eight bytes each. There are caveats with the Altair BASIC implementation.

  • The first instruction at memory location 0 is 'DI', which disables all hardware interrupts system-wide. Interrupts are never enabled again elsewhere in code. No external interrupts will ever invoke the handlers while Altair BASIC is running.
  • Software interrupts from within AltairBASIC will still trigger jumps to the interrupt handlers via the RST instruction and macro'd friendly name
  • AltairBASIC does not use the first or last handler and instead uses that space for initialization and user-defined functions
  • Be aware of PC and stack composition in the handlers - not all use RET directly and some jump arbitrarily
  • The discrepancy between the 64 actual bytes for handlers and the 67 bytes noted in the load time layout is due to the final unused routine overrunning its space without consequence

The interrupt handlers are dedicated to common low-level routines called often throughout the code. The RST instructions are MACRO'd to a friendly name. The eight handlers and their purposes are:

Handler # Offsets Macro NamePurpose
00-7NoneNot an interrupt handler. Jumps to init at the end of code, which self-modifies jump to "OK"
110-17SYNCHKChecks the character currently pointed at for a specific value, leading to possible syntax error
220-27CHRGETIncrements to the next character and fetches it in to accumulator
330-37OUTCHROutputs the value in the accumulator to the terminal
440-47COMPARCompares the values of register pairs DE and HL with result in FLAGS. Carry = DE > HL, Zero means equal.
550-57FSIGNChecks the sign of the floating accumulator by setting the actual accumulator to the same sign
660-67PUSHMPushes memory location at HL to the stack. Dereferneces value to BC
770-103*NoneNot a handler, but can be set by the user for their own purpose. *This handler runs longer than designed

Now for the lines of code that made up each handler location. I've removed original comments, whitespace, and machine code for clarity. I've added my own comments in bold that I believe may be relevant from the modern viewpoint, but I highly recommend reading the original comments in the source.

Interrupt 0 - Not used (memory location 0-7 octal)
13040   START:  DI            /* Disables external interrupts */
13060           JMP     INIT  /* Jumps to system init */
13080
13160   IFN     LENGTH-2,<    /* Excess space used for static 
13180           ADR(DEINT)       pointers to routines */
13240           ADR(GIVABF)>
13260
13300   IFE     LENGTH-2,<
13320           ADR(FECINT)
13340           ADR(MAKINT)>

The first line of this handler is the first line for the whole program: Disable external interrupts, which are never reenabled. The next instruction jumps out of the handler to the system initialzation routine, which modifies this jump instruction to as a jump to 'Ready', which is effectively the top of the main loop. This way if the system ever somehow jumps back to insturction 0, it can pick up normal execution.


Interrupt 1 - SYNCHK (memory location 10-17 octal)
13600           MOV     A,M     /* Get next char from M in to A */
13620           XTHL            /* Swap HL and SP, SP+1
13640                                  M now looks at char to verify
13660           CMP     M       /* If they match, Z=1 */
13680
13700           INX     H       /* Manual advance to next instruction */
13720           XTHL            /* Restore to stack for eventual RET */
13740
13760           JNZ     SNERR   /* No match - error, else fall down to CHRGET */

SYNCHK verifies that the next input character matches the character given to the handler.

A successful SYNCHK will always execute the next handler (CHRGET) because execution falls through to pull the character in to A.


Interrupt 2 - CHRGET (memory location 20-27 octal)
14060   IFE     LENGTH,<CHRGTR:>  /* In 4K, we skip a later inline optimization */
14080           INX     H         /* Advance pointer to next character */
14100           MOV     A,M       /* Pull it in to A */
14120           CPI     ":"       /* A with ":". 0-9 force carry bit*/
14200   RNC                       /* If a letter, return to get more letters */
14220   JMP     CHRCON            /* Continue CHRGET later in source */

Loads the next input character in to A before passing execution to an extended routine, which skips spaces and handles numbers.

It's worth discussing the CHRCON extended handler in this space because it has a subtle flag manipulation:

/* Only digits and control characters make it this far - letters returned already above */
43280   CHRCON: CPI     " "       /* Check for space, zero if true */
43300           JZ      CHRGTR    /* It's a zero - get another character */
43320           CPI     "0"       /* Digits will not have carry asserted at this point */
43360           CMC               /* Invert Carry - Digits will now have carry asserted */
43380           INR     A         /* Add one to reset flags */
43400           DCR     A         /* Subtract back to original so Zero means A=0 */
43420           RET

Interrupt 3 - OUTCHR (memory location 30-37 octal)
14360   OUTDO:  PUSH    PSW            /* Save FLAGS */
14380   IFN     CONTRW,<               /* If suppression feature is on...
14400           LDA     CNTWFL              Get the suppression status */
14420           ORA     A>             /* Get status of flag in FLAGS*/
14440   IFE     LENGTH!CONTRW!LPTSW,<  /* No suppression, so... 
14460           LDA     TTYPOS>             Get the output line position */
14480           JMP     OUTCON         /* Jump to extended handler */

Prints the character in A to the output terminal, preserving flags, which are restored in the extended handler discussed later in Execution.


Interrupt 4 - COMPAR (memory location 40-47 octal)
14680           MOV     A,H   /* Get high 8-bits */
14700           SUB     D     /* Subtract other high 8 bits */
14720           RNZ           /* Return if not equal - FLAGS set*/
14740           MOV     A,L   /* H=D so repeat for low 8-bits */
14760           SUB     E
14780           RET           /* Return with FLAGS set */

Compares values in [DE] to those in [HL] using simple subtraction. Caller must set up register pairs prior to invoking COMPAR. Caller checks flags for intended purpose after return

Carry is set if [HL] < [DE]. Zero is set if [HL] = [DE]


Interrupt 5 - FSIGN (memory location 50-57 octal)

15100   SIGN:   LDA     FAC    /* Get the exponent in to A */
15120           ORA     A      /* Get the flags */
15140           JNZ     SIGNC  /* Jump to extended handler to test sign */
15160           RET            /* Must be zero - just return */

Tests the sign of the floating point accumulator by grabbing the exponent value and setting the corresponding FLAGS. Recall that using an OR operation on the same input won't change the value, but will set flags for sign, zero, parity.


Interrupt 6 - PUSHM (memory location 60-67 octal)
15420           XHTL               /* Stack to HL -> M points at argument to PUSHM */
15440           SHLD    PUSHMA+1   /* Moves it to the end of the the next handler */
15460           POP     H          /* Restores the pointer temporarily moved to the stack */
15480   IFN     LENGTH,<           /* In the 8k+ BASIC...
15500           JMP     $CODE+59        Jumps to MOV C,M in the next handler */

Pushes the value pointed to by M / (HL) on to the stack while retaining its value in BC and advancing the HL pointer. Note that this handler actually falls in to the next handler regardless of implementation. If larger than 4k, we jump around the true handler 7 start so that a user-defined ISR can be defined.


Interrupt 7 - Not used (memory location 70-77 octal, but extends to 0o102)
15560           RET               /* Immediately exit this handler */
15600           NOP               /* No-op, but reserved byte for UDF address */
15620           NOP>              /* No-op, but reserved byte for UDF address */
15640           MOV     C,M       /* Get low 8-bits value to push in to C */
15660           INX     H         /* Move to more significant byte */
15680           MOV     B,M       /* Get high 8-bits value to push in to B */
15700           INX     H         /* Move to before both bytes */
15720           PUSH    B         /* Put it on stack */
15760   PUSHMA: JMP     PUSHMA    /* RET set up back in handler 6 */

The continuation of PUSHM, with space retained for a jump to a user-defined handler* (currently RET, NOP, NOP) in the first 3 bytes.

*The user-define handler doesn't appear to be set or self-modified within BASIC, but requires the user to POKE these addresses according to Altair BASIC manual on page 67

Data structures

Altair BASIC relies on several data structures to efficiently control program flow with minimum computation and overhead.

Most of the tables and data structures are generated at build time and loaded as-is. We'll dig in to those as well as a few uninitialized abstractions such as the Floating Point Accumulated (FAC) defined in the analgous .BSS section.

Function Dispatch Table

Functions in BASIC take one or two arguments and return a value. The function dispatch table is defined by the FUNDSP symbol and it consists of 24 sequential 16-bit function pointers. These pointers are defined in the source using the ADR() macro. The 24 BASIC functions are listed here in row-major order:

SGNINTABSUSR
FREINPPOSSQR
RNDLOGEXPCOS
SINTANATNPEEK
LENSTR$VALASC
CHR$LEFT$RIGHT$MID$

Operator Precedence Table

This very small table denoted OPTABof 7 bytes in 12k BASIC only contains the values 121, 121, 123, 123, 127, 80, and 70 representing the operators for +, -, *, /, ^, AND, and OR. In earlier versions, the pointers to there routines were also stored here but that is no longer true. The table is still used in Formula Evaluation which we will discuss later. The table is constructed with the following code, half of it 'dead' from the top macro:

16540   DEFINE ADRP(X),<ADR(X)>
16560   IFE     LENGTH-2,<
16580   DEFINE ADRP(X)<>>            /* ADRP generates no output */
16600   OPTAB: 121                   /* Byte output */
16620
16640
16660           ADRP(FADDT)
16680           121                  /* Byte output */
16700           ADRP(FSUBT)
16720           123                  /* Byte output */
16740           ADRP(FMULTT)
16760           123                  /* Byte output */
16780           ADRP(FDIVT)
16800   IFN     EXTFNC,<127          /* Byte output */
16820           ADRP(FPWRT)>
16840   IFN     LENGTH,<
16860           80                   /* Byte output */
16880           ADRP(AND)
16900           70                   /* Byte output */
16920           ADRP(OR)>
Reserved Word List

This section should contain all of the keywords and their matching crunch values - however the code listing appears to be incomplete due to this sequence of commands that disable source listing output:

17060   Q=128-1
17080   DEFINE  DCI(A),<Q=Q+1
17090           XLIST         /* Suppresses listing output */
17100           DC(A)
17110           LIST>         /* Enables listing output */

The second clue that most of this table is suppressed in the listing is the large gaps in BASIC line numbering that begins just afterward. Normally the line number increment by 20 and sometimes 10, however in the reserved word table we see:

17140           ENDTK==Q
17180           FORTK==Q
17240           DATATK==Q
17360           GOTOTK==Q
17420           IFTK==Q
17480           GOSUTK==Q
17540           REMTK==Q
17660   IFE     LENGTH-1,<
17700           ELSETK==Q
17780   IFN     DSKFUN,<DCI"DSKO$">
17800   IFN     LPTSW,<DCI"LPRINT">
17820   IFN     LENGTH,<
17880           PRINTK==Q
17940   IFE     REALIO,<
17960           DCI"DDT">

Assuming regular line number intervals of 20, each of those symbol assignments to the value Q is missing one or more lines in between. That missing line likely increments the Q value using the DCI macro declared earlier while suppressing the output. Logically, all of those Q assignments would not be equal, but we don't see the increment occurring due to the embedded XLIST in DCI.

The final clue that the listing isn't telling us the entire story is by looking at the output addresses immediately before and after this table:

   973  000171' 000000  000106          16900           70
   974                                  16920           ADRP(OR)>
   
        /* ... first half of reserved word table ... */

  1006                                  18160   ; END OF COMMAND LIST
  1007  000371' 000000  000124          18180           "T"
  1008  000372' 000000  000101          18200           "A"		

Source lines 973 outputs a byte at octal memory address 171 but the very next byte after the table writes a contant at octal address 371, imply that we've skipped 128 bytes without the assembler reporting any output between these two points.

Combining the above evidence with the statement dispatch table operator ordering, I will speculate that the suppressed lines of code look something like this (my guesses in bold):

17120           DCI"END"
17140           ENDTK==Q
17160           DCI"FOR"
17180           FORTK==Q
17200           DCI"NEXT"
17220           DCI"DATA"
17240           DATATK==Q
17260           DCI"INPUT"
17280           DCI"DIM"
17300           DCI"READ"
17320           DCI"LET"
17340           DCI"GOTO"
17360           GOTOTK==Q
17380           DCI"RUN"
17400           DCI"IF"
17420           IFTK==Q
17440           DCI"RESTORE"
17460           DCI"GOSUB"
17480           GOSUTK==Q
17500           DCI"RETURN"
17520           DCI"REM"
17540           REMTK==Q
17560           DCI"STOP"
17580           DCI"PRINT"
17600           DCI"LIST"
17620           DCI"CLEAR"
17640           DCI"NEW"
17660   IFE     LENGTH-1,<
17680           DCI"ELSE"
17700           ELSETK==Q
/* ... table continues ... */

The last byte of the table is zero (NUL)

Statement Dispatch Table

As with the Function Dispatch table, the Statement Dispatch table (STMDSP), is simply an ordered sequence of pointers to the code execution routines.

The table for the 12k build is 100 bytes long and contains the expected 50 statement dispatch pointers out of a possible 56 allowable across all build configurations. The table is organized as follows:

Index Statement RoutineIn this build?
0ENDENDYes
1FORFORYes
2NEXTNEXTYes
3DATADATAYes
4INPUTINPUTYes
5DIMDIMYes
6READREADYes
7LETLETYes
8GOTOGOTOYes
9RUNRUNYes
10IFIFYes
11RESTORERESTOREYes
12 GOSUBGOSUBYes
13 RETURNRETURNYes
14REMREMYes
15STOPSTOPYes
16ELSEELSEYes
17TRONTONYes
18TROFFTOFFYes
19EDITEDITYes
20OUTFNOUTYes
21ONONGOTOYes
22NULLNULLYes
23WAITFNWAITYes
-DSKO$*DSKO$No
-PRINTLPRINTNo
24POKEPOKEYes
25PRINTPRINTYes
26DEFDEFYes
27CONTCONTYes
-???OUT**No
28LISTLISTYes
-LISTLLISTNo
29DELETEDELETEYes
30CLEARCLEARYes
-CLOADCLOADNo
-CSAVECSAVENo
-CONSOLECONSOLENo
31NEWSCRATHYes
32CDBLFRCDBLYes
33CINTFRCINTYes
34CSNGFRCSNGYes
35+DADDYes
36-DSUBYes
37*DMULTYes
38/DDIVYes
39>,<,=DCOMPYes
40+FADDYes
41-FSUBYes
42*FMULTYes
43/FDIVYes
44>,<,=FCOMPYes
45+IADDYes
46-ISUBYes
47*IMULTYes
48/IDIVYes
49>,<,=ICOMPYes

*Unable to find specific documentation about the disk code from 1975.
**This OUT routine doesn't appear elsewhere in the code, fortunately it is not included in this build

Uninitialized static storage (.BSS)

Altair BASIC uninitialized memory space (BSS) layout

Potentially up to 185 bytes are dedicated to uninitialized variable space with 132 of those bytes backing symbols for interpreter run-time activties. Since this space cannot grow, the function of these symbols is generally limited to pointers, flags, and fixed buffers. I'll identify these symbols, their purpose, and the ideas behind them. Knowing these will speed up code reading significantly since they are often used within critical mechanisms of the interpreter. The remaining 53 bytes support the floating point accumulated discussed in the next section. Let's dig in to these key symbols by type.

PointerFixed   Points to...
MEMSIZYesThe highest location in memory. Computed by user input during initialization
TEMPPTNoThe first free temporary descriptor (one of TEMPST)
FRETOPNoThe highest unused location in 'string space' (think heap). As space is used, FRETOP decreases until it crosses STKTOP.
OLDTXTNoThe start of a text statement in progress
STKTOPUsually*The first stack entry, by default 50 bytes below MEMSIZ
TXTTABYesThe start of user text (program) data
VARTABNoThe start of simple variable space
ARYTABNoThe start of array variable space
STRENDNoThe end of variable storage
DATPTRYesThe first location after system code - the null 0 at [TXTTAB]-1. Used to 'Read' through user program DATA statement lists.

Most of these pointers are used to manage the run-time (high) memory allocations. The 'Fixed' column indicates if these pointers will change during normal program execution. Remember that in assembly, there really isn't a concept of data structure or data type. Typically, we use fixed memory addresses to point at the base of a memory region for a data structure and dynamic addresses to point at the 'work' inside of these locations. I'll discuss this idea in more detail during execution when we have to set up the run-time memory using these pointers.

*Generally fixed, but user can change the stack allocation with the CLEAR command

FlagSize (bytes)   Purpose
BUFMIN1A single character: Comma (ASCII value 44). Denotes the start of the BUF buffer.
PRTFLG1*Flags output to go to the printer
CNTWFL1Flag to suppress output via CTRL-O
DIMFLG1Determines if a pointer target is being processed with DIM.
VALTYP1Input type indicator (0 -> Number, 1-> String)
OPRTYP1**Stores the operator number. Shares the same memory byte as DORES
DORES1**Affects if crunching is performed on input. Shares the same memory byte as OPRTYP
SUBFLG1Flag used to allow subscripted variable names (in parens). Allows scanning to skip parenthese
FLGINP1Determines if the input is INPUT or DATA (0 -> Input)
TRCFLG1Debug tracing in progress (0 -> off or 1 -> on)

Most flags use a single byte to make a distinction between zero and non-zero value. In researching this code, I wonder why they weren't combined in to a single byte and bitmasked as needed, but I can only assume it was for performance reasons.

*Not used in this published build of Altair BASIC and so the space is actually 0
**OPRTYP and DORES point to the same byte

BufferSize (bytes)   Purpose
BUF72The input buffer. All user input lives here before processing
LPTPOS1*Position of printer head
TEMPST15Temp storage for multiple descriptors
DSCTMP3Single string descriptors are built in this temporary space
TEMP2Space used to store various things like NEWSTT addresses, LET values, etc
TEMP22Temporary space used by formula evaluator.
TEMP32Temp space used in garbage collection, formula eval, FOUT, etc.
DATLIN2The data line number, used for error output
CURLIN2The current line number
OLDLIN2The previous line number

Buffers are where the 'work' goes during processing. I placed the TEMP* symbols here because, unlike flags, their values are arbitrary and they are used to hold immediate values rather than branching system behavior. Several two-byte symbols hold line numbers, thus capping program lengths at 65536 lines - well within 1975 memory limitations.

It's worth noting that both BUF and TEMPST live below DSCTMP, with the latter being directly below it. This relative position determines whether or not strings are copied to string space or not. Strings provided in some constants and user program code are above DSCTMP and do not need to be moved. This quirk is due to the instruction sequence found during string processing. More on that later.

*Not used/allocated by default

Floating Point Accumulator (FAC)

The final 53 bytes of .BSS defines the floating point accumulator. The 8080 has no specialized registers to handle floating point representation, thus the obvious solution is to use static memory locations to simulate the behavior of a register by build the instructions to operate on these locations. For now, I'll focus on the storage structure and save algorithms and operations for the execution discussion. The minimum you need to know is that floating-point numbers are stored as a pair comprised of mantissa value and an exponent value which includes a sign bit.

The FAC is defined in the uninitialized variable space (impure space, as the code comments call it). The allocated space depends on the version of BASIC, which this build will focus on the 12K extended. For 12k, we essentially have two floating point storage units (FAC and ARG), used to hold and operate on two values. We also have an output buffer. Here is the space allocated and the symbols pointing to the space:

The memory layout of the Floating Accumulator in Altair BASIC


Executing Altair BASIC

We will walk through system intialization up until the moment the user is presented with 'OK' and may begin creating programs. Then we will walk through the 'main' loop of the interpter and dive in to important code branches like formula evaluation, string operations, and garbage collection.

But first, to grasp what needs to happen during initialization, we should thorougly understand what the run-time memory space should look like and how it will operate. Knowing where we are (after loading) and where we need to be will allow us to connect the dots along the path.

Run-time memory layout

After the BASIC interpreter loads, all memory above the code locations should be available for program use - this includes recovering the space used by the initialization code after it one and only pass. This is why it's located at the end of the source code so that it can easily coalesce with the unused high region. Recall that the load time memory layout for the 12K BASIC has init code at offset 14627 octal. From this location to the end of memory (octal 30000) is the space to stake out using the pointers noted from our BSS discussion.

Intuitively, the high memory area must include all of the 'user produced' content within BASIC to include program code, variables, and string constants. BASIC also needs scratch space to execute the programs (a stack). All of these components must reside in high memory and be spaced appropriately to give users sufficient capability. Consider this memory layout depicting 12K BASIC in mid-operation with data structures in use:

High memory usage in Altair BASIC

Run time memory is divided in to 5 sections, which each section delimited by the pointers identified in our .BSS disuccsion.

"String Space" is at the very top of memory and has a small allocation of 50 bytes fixed, which grows downward. This space is used for the string values read in to the input buffer. Note these are the actual string values, not the descriptors, which are stored in .BSS and point to this space.

Below that is the stack, which grows arbitrarily downward by convention of the 8080 PUSH and POP instructions.

From the bottom, we have the user program data, to include variables, which all grow collectively upward.

Pointer math

We can use simple pointer comparisons to determine the state of the program and to trigger important events or errors.

  • If FRETOP <= STKTOP, the system has run out of string space and garbage collection must be called.
  • The stack pointer [SP] must be comfortably above STREND or else we've run out of stack.
  • Total available bytes for the user at initialization is STKTOP - TXTTAB.

This model of operation has both elegance and challenge. Freeing up dead space could be as simple as just moving the pointer back up. However, reality is that sections may need to be compacted to fill in 'holes' in a segment. When the user adds a new line to the program, variable space must shift upward, requiring block transfers. These situations will be discussed later.

Knowing the desired organization during execution, we can walk backwards to the steps to perform during system initialization. In short: it's simple pointer manipulation.

The memory layout of the Floating Accumulator in Altair BASIC

Recall from the .BSS discussion that some pointers are fixed. These pointers function as the 'base' of the data segments, while the other pointers are offset from the base, denoting the endpointers of the structure (thus size). The variable pointers are interesting hybrids that function both as the base of their structure AND the endpoint of the previous. The bottom line during initialization is that the data structures are empty, thus the offset pointers should equal their assocated base pointers.

  • TXTTAB is the base of program text and indirectly to variable space. VARTAB, ARYTAB, and STREND should equal TXTTAB
  • MEMSIZ is fixed to the top of memory and FRETOP is set equal to it. After initialization, input strings will be stored here and push FRETOP down
  • STKTOP anchors the stack and lives at a fixed distance below MEMSIZ. The stack pointer [SP] will begin equal to STKTOP

System initialization

Altair BASIC is loaded and we're ready to execute starting from the instruction at address 0. But first, lets consider specific details that guide what needs to be accomplished during initialization. The most important idea is that the Altair 8800 does not have an operating system. There is no intermediate library, API, or even syscalls that will offload the details. BASIC itself must supply the I/O routines and runtime memory management. All work must be done using CPU instructions and assembler macros. Let's translate this in to specific actions:

System initialization in Altair BASIC

  • Locate I/O devices - The Altair supports several forms of I/O including teletypes, printers, tape readers, and cassettes. The code to interface with these devices is included in the BASIC routines. We must assign the proper handlers based on deteced input devices. We can poll the front panel switches and make handler assignments based on what we find. Once I/O is set up, we can prompt the user for more system details.

  • Set up the stack (temporary and permanent) - A stack is necessary to use the CALL and RET instructions provided by the 8080. The CPU implementation of SP requires the stack grow downward, suggesting that it should be located as high in memory as practical. To make use of CALL during initialization, we should set up a stack early and then transition to a permanent location after we learn more about the memory space. The final lines of initialization from 74920-75000 define the temp stack symbol TSTACK with a substantial block reservation.

  • Build the dynamic memory space - Armed with functioning I/O and a temporary stack, we can poll the user to input information about the system. We can then implement the memory model described earlier by setting the static pointers used during execution.

  • Patch up extended instructions - The user has options for which extended math functions to implement. We need to capture this configuration and patch up the references so that the function work (or fail) as intended. Note that optional functions live just before initialization so that we can remove them from execution and reclaim the space.

  • Transition to BASIC execution - Lastly, we implement a permanent stack, fix up a few code insturctions that current point to initialization, and then jump to READY, the BASIC entry point.

We pick up our code walkthrough at instruction 0.

The beginning ... jump to the end
We're already reviewed the first instruction when we looked at the first interrupt vector. But as a refresher:

13040   START:  DI            /* Disables external interrupts */
13060           JMP     INIT  /* Jumps to system init */

We disable interrupts, preventing external devices from calling BASIC functions. We jump from the bottom of memory to near the top at the beginning of the Init. Near the end of the Init, we will patch this jump to ready/main because after initialization, init will no longer exist as a valid destination.

Setting up I/O

69860           FUNIO==<^D256*^O312>+^O40+$CODE  /* Value is ^O145040 */
...
69940   INIT:                               
69960   IFN     REALIO,<                   /* If we're using a real machine (we are)*/
69980           IN      1                  /* Read and ignore garbage input */
70000           IN      ^D255              /* Read front panel switches */
70020           ANI     ^O100              /* Check for switch 14 */
70040           JZ      NOTSIO             /* Skipping down if no serial port */
70060           LXI     H,FUNIO            /* H = ^O145, L = ^O40 */
70080           SHLD    CNLCA2##+2         /* CNLCA2+3 = ^0145 (JNZ -> JZ) , CNLCA2+2 = ^O40 (5th bit asserted) */
70100           MVI     H,^O310            /* Loads opcode for RZ, L remains ^O40 */
70120           SHLD    CNLCA3##+2         /* RNZ -> RZ, ANI IDONE -> ANI ^O40*/
70140   IFN     LENGTH,<                   /* If we're running 8K+ BASIC...*/
70160           MVI     H,^O304            /* Loads opcode for CNZ */
70180           SHLD    CNLCA4##+2>        /* CZ -> CNZ, ANI IDONE -> ANI ^O40*/
70200   FUNIO==$CODE+<^D256*^O312>+2       /* Value is ^O145002 */
70220           LXI     H,FUNIO            /* H = ^O145, L = 2 */
70240           SHLD    CNLCA1##+2         /* JNZ -> JZ, ANI ODONE(128) -> ANI 2 */
70260   NOTSIO: IN      ^D255              /* Read front panel switches again*/
70280           ANI     ^O40               /* Check for switch 13 */
70300           JZ     NOTPIO              /* No parallel port? We're done here */
70320   FUNIO==$CODE+<^D256*^O312>+2       /* Value is ^O145002 */
70340           LXI     H,FUNIO            /* H = ^O145, L = 2 */
70360           SHLD    CNLCA2+2           /* JNZ -> JZ, ANI ODONE(128) -> ANI 2 */
70380           MVI     H,^O310            /* Loads opcode for RZ, L remains 2 */
70400           SHLD    CNLCA3+2           /* RNZ -> RZ, ANI IDONE -> ANI 2*/
70420   IFN     LENGTH,<                   /* If we're running 8K+ BASIC...*/
70440           MVI     H,^O304            /* Loads opcode for CNZ, L remains 2 */
70460           SHLD    CNLCA4+2>          /* CZ -> CNZ, ANI 2 */
70480   FUNIO==$CODE+<^D256*^O312>+1       /* Value is ^O145001 */
70500           LXI     H,FUNIO            /* H = ^O145, L = 1 */
70520           SHLD    CNLCA1+2           /* JNZ -> JZ, ANI ODONE(128) -> ANI 1 */
70540   NOTPIO:>

The first thing we care about in Init is getting the user I/O working because 1) We need to ask the user setup questions, 2) Programming BASIC necessarily needs I/O. What's actually happening is that we're determine what kind of I/O cards we're using and patching the previously loaded default I/O routines with small changes based on the cards, ports, and detection bits. Determining how and why these work requires very close attention to the Altair manuals regarding their cards and port selection. Another thing to note is that SHLD orders the bytes low-high, meaning the value in H goes to the memory address+1.

For the SHLD lines pointing at symbols CNLCA*, these are located in BASIC statement code and define I/O channel operations. This self-modifying code ensure that these routines match the protocol and address of the detected hardware.

Setting up the stack (and a few extras)

70560           LXI     H,$CODE+^D65535    /* HL = -1 (direct mode) */
70580           SHLD    CURLIN##           /* Set the current line static var to -1 */
70600           LXI     H,TSTACK           /* Temp stack pointer in to HL */
70620           SPHL                       /* Actual stack pointer set to temp stack address */
70640           SHLD    STKTOP##           /* Set static var tracking the first stack entry location */
70660   IFN     CONTRW,<                   /* If CTRL-O (output suppression) is supported */
70680           XRA     A                  /* Clear Accumulator (XOR self) */
70700           STA     CNTWFL##>          /* Output suppression flag is 0 - force output */
70720           CALL    CRDO               /* Output a carriage return */
70740   IFN     STRING,<                   /* If strings are enabled */
70760           LXI     H,TEMPST##         /* Get the first temp string descriptor location */
70780           SHLD    TEMPPT##>          /* Set the next string descriptor pointer to this first location */

Setting up a few of the internal tracking variables and setting up the stack. Note that TSTACK maps to the very end of loaded code space, which is not the end of actual physical memory. We need to use some token stack space for CALLs during init. We'll revisit setting up the full stack after we learn about the total system memory.

Discovering memory space

This section has a significant amount of work, so I will break up the sections with header comments

/*** Get memory info from the user ***/
70800   IFN     REALIO,<
70820           LXI     H,MEMORY           /* Point HL to string 'MEMORY SIZE' */
70840           CALL    STROUT             /* Print the string */
70860           CALL    QINLIN             /* Print '? ' and wait for user input */
70880           CHRGET                     /* Get user input (This is an RST handler) */
70900           CPI     "A"                /* Check input type (See above discussion) */
70920           JZ      INITAT             /* Output authors and restart if not number???
                                              * These last two lines seem like a test or bug
                                              * Published versions didn't have this behavior */
70940           ORA     A                  /* Set flags based on A (zero or not) */
70960           JNZ     USEDE9             /* User said something, skip below next routine */

/*** User didn't enter memory value - discover space automatically ***/
70980           LXI     H,LASTWR           /* Get last program code line in to HL */
71000   LOOPMM: INX     H                  /* Increment HL to the next memory location */
71020           MVI     A,311              /* Store ^O311 in to A */
71040           MOV     M,A                /* Store ^O311 from in in to memory at HL */
71060           CMP     M                  /* Check that memory for value in A (Zero = match) */
71080           JNZ     USEDEF             /* Doesn't match -- beyond usable memory, jump down */
71100           DCR     A                  /* Matched, reduce A to ^O310 */
71120           MOV     M,A                /* Store new value in same location */
71140           CMP     M                  /* Check match again */
71160           JZ      LOOPMM             /* Matched, Mem ok - Jump up and try next location */
71180           JMP     USEDEF             /* No match - jump down to finalize memory size */

/*** Process user-provided memory space ***/
71200   USEDE9: LXI     H,BUF              /* Point buffer location in to HL */
71220           CALL    LINGET             /* Put the memory value in to DE */
71240           ORA     A                  /* Check for a zero (terminated line) */
71260           JNZ     SNERR              /* No zero-term? Syntax error */
71280           XCHG                       /* High memory addr moves from DE to HL */
71300           DCX     H                  /* Move back to last usable byte */

/*** Wrap up memory search ***/
71320   USEDEF: DCX     H>                 /* Walk memory back to last usable byte */
71340   IFE     REALIO,<                   /* If we're emulating...
71360           LXI     H,$CODE+16190>      *    Pick a specific memory size */
71380           PUSH    H                  /* Save top memory location on stack for later */

The flow here is to ask the user directly for memory amount and save it in HL. If the user doesn't enter, BASIC will calculate the amount by reading and writing to increasingly higher memory addresses until it fails to read. If the user provided the memory size, we should do a sanity check. In both cases, we walk back at least a byte and anchor that point by pushing it on the stack. Later this will be moved to the MEMSIZ static pointer.

Note that the line numbers on 70900 and 70920 don't seem to make sense. They would seem to force an init loop with valid input. My assumption is that there were added in for testing and the listing was printed this way. These instruction do not appear in known disassemblies I've reseached.

Discovering terminal width

/*** Get the terminal width from user ***/
71400   TTYW:   LXI     H,TTYWID           /* Point HL to the string "TERMINAL WIDTH" */
71420           CALL    STDOUT             /* Print the string */
71440           CALL    QINLIN             /* Print '?' and wait for user input */
71460           CHRGET                     /* Get user input */
71480           ORA     A                  /* Verify state of input (zero or not) */ 
71500           JZ      DFLENT             /* No input, assume defaults from Common File */
71520           LXI     H,BUF              /* Entry occurred, point to buffer */
71540           CALL    LINGET             /* Get the line value in to DE */
71560           MOV     A,D                /* Move high order byte to A */
71580           ORA     A                  /* Check if it's non-zero */
71600           JNZ     TTYW               /* Non-zero? Waaay too big, redo this */
71620           MOV     A,E                /* Get the lower order byte */
71640           CPI     16                 /* Compare it to the value 16 */
71660           JC      TTYW               /* Less than 16? Too small. redo this */
71680           STA     LINPT1##           /* Self-modifies "LINLEN" argument in file F3, line 2601 */
71700   IFN     LENGTH,<                   /* If we're 8K BASIC or higher... */
71720           STA     LINPT2##           /* Self-modifies "LINLEN" argument in file F3, line 3091 */
71740   IFN     STRING,<                   /* If strings are enabled... */
71760           STA     LINPT3##>          /* Self-modifies "LINLEN" argument in file F3, line 3825 */
71780   MORCPS: SUI     14                 /* Subtract 14 from A... */
71800           JNC     MORCPS             /* ...Repeatedly until below zero. Output range: (-1 to -14)*/
71820           ADI     28                 /* Add 28 to A. Output range: (14 to 27)*/
71940           CMA                        /* Complement A. Output range: (-15 to -28) */
71960           INR     A                  /* Add 1, leading to negated A. (-14 to -27) */
71980           ADD     E                  /* Add back original - Result is 2nd order lower multiple of 14 */
71900           STA     LINPT4##           /* Self-modifies "NUTPOS" argument in file F3, line 3914 */

This is fairly straightforward - Ask user for terminal width size and propogate it. Default to 72 if nothing is provided. If something is provided, the LINLENs need to be patched up in the associated statements that use console output, such as PRINT.

Fixing up static constants (.BSS)

71920   DFLENT:
71940   IFN     STRING,<                        /* If strings are enabled... */
71960           LXI     D,$CODE+^D65536-^D50+1  /* Loads -49 in to DE */
71980           POP     H                       /* Grabs the previously saved top of memory */
72000           SHLD    MEMSIZ##                /* Make it official by storing in MEMSIZ */
72020           SHLD    FRETOP##                /* Initialize FRETOP here also (empty string space)*/
72040           DAD     D                       /* Subtract 49 from top for the end of string space */
72060           JNC     OMERR                   /* If this is negative -- error out for lack of memory */
72080           DCX     H                       /* Reduce 1 to 50 bytes from the top - now at future stack top */
72100           PUSH    H>                      /* Save the future stack top location to the temp stack */

If strings are enabled, we have to set up the string space - this means setting up the relevant pointers: MEMSIZ and FRETOP. While you might think that we'd need MEMSIZ even without string functions, it turns out that this symbol is only used here during init and within the string functions.

Set up extended functions
Note that I'm cutting a few of the 4K-only lines here because the same task happens in the 8K+

/*** Ask user for desired functions ***/
72800   IFN     EXTFNC,<
72820   ASKAGN: LXI     H,FNS          /* Points to WANT SIN-COS-TAN-ATN string */
72840           CALL    STROUT         /* Prints the string */
72860           CALL    QINLIN         /* Awaits user input */
72880           CHRGET                 /* Gets user input */
72900           CPI     "Y"            /* Compares input to 'Y' */
72920           LXI     D,INITSA       /* Store address to Init (as end of code ref) */
72940           JZ      HAVFNS         /* User wants functions, skip down */
72960           CPI     "A"            /* User didn't say 'Y', check 'A' (delete ATN only)*/
72980           JZ      OKCHAR         /* Jump down if we're only deleting ATN */
73000           CPI     "N"            /* Not Y or A, so check N */
73020           JNZ     ASKAGN         /* No answer recognized - ask again */
73040   OKCHAR: LXI     H,ILLFUN       /* Point to 'Illegal Function Call' string */
73060           LXI     D,ATN          /* Point to the ATN function */
73080           SHLD    ATNFIX##       /* Load 'Illegal Function Call' to ATN */
73100           CPI     "A"            /* Check again if 'A' brought us here */
73120           JZ      HAVFNS         /* If so, we're done, otherwise continue */
73140           SHLD    COSFIX##       /* Load 'Illegal Function Call' to COS */
73160           SHLD    TANFIX##       /* Load 'Illegal Function Call' to TAN */
73180           SHLD    SINFIX##       /* Load 'Illegal Function Call' to SIN */
73200           LXI     D,COS>         /* Store address of COS (for end of code ref) */

This is a simple segment of user Q&A and patching of discarded function pointers. Two subtle lines to note from this code block are 72920, 73060, 73200. The first stores the address of INITSA in DE. The second overwrites DE with the address of ATN if we decide to remove that function. And the final overwrites DE with the address of COS if we remove all math functions.

DE denotes the address after the last usable code. This will be used to start the dynamic memory area beginning with a zero delimiter followed by user program code in TXTTAB

Set up dynamic memory area

73220   HAVFNS:
73240           XCHG                  /* Pull DE in to HL (the end of code / bottom of memory) */
73260           MVI     M,0           /* Zero-term this to mark the end of BASIC */
73280           INX     H             /* Increment HL to next byte */
73300           SHLD    TXTTAB##      /* Store this location as the base of TXTTAB */
73320           XTHL                  /* HL <-> SP. Get real stack top pushed earlier on line 72100*/
                                       *   This also puts TXTTAB on temp stack
73340           LXI     D,TSTACK      /* Get top of temp stack (should be end of INIT) */
73360           COMPAR                /* Verify that these values make sense */
                                       *   Temp stack should be below the real stack */
73380           JC      OMERR         /* If somehow HL (real stack) is below DE (temp stack), error out */
73400           POP     D             /* No error - bring TXTTAB in to DE */
73420           SPHL                  /* HL ->; SP. Make it the official stack now */
73440           SHLD    STKTOP        /* Store stack top in static memory (.BSS) */
73460           XCHG                  /* HL <-> DE. TXTTAB in HL and STKTOP in DE */
73480           CALL    REASON        /* Verify good spacing between TXTTAB and STKTOP */
73500           MOV     A,E           /* Get the bottom 8 bits of STKTOP in accumulator */
73520           SUB     L             /* Subtract away the bottom 8 bits of TXTTAB */
73540           MOV     L,A           /* This leaves the amount of lower 8 bits of empty space in L */
73560           MOV     A,D           /* Repeat for upper 8 bits */
73580           SBB     H             /* High-order subtraction */
73600           MOV     H,A           /* HL now contains the free/unused space between stack and heap */
73620           LXI     B,$CODE+65520 /* Get -16 in to BC. (space for stack to grow) */
74640           DAD     B             /* Subtract the 16 from the space in HL. */
74660           CALL    CRDO          /* Print a line feed */
73680           CALL    LINPRT        /* Print the numberic amount of free bytes computed in HL */
73700           LXI     H,WORDS       /* Point at string 'BYTES FREE' */
73720           CALL    STROUT        /* Print the string */
73740           LXI     H,STROUT      /* Get memory location STROUT in to HL */
73760           SHLD    REPINI##+1    /* Store this in routine in READY (replacing INIT) */
73780           CALL    SCRTCH        /* Effectively 'NEW', setting up TXTTAB, VARTAB, ARYTAB */

The last code block set up the top of dynamic memory and this block sets up the bottom, beginning with the null-term and the base of TXTTAB where user code will live. We then set up the permanent stack and verify sufficient spacing between it and the user code area.

Wrapping up initialization

73800   IFN     LPTSW,<                   /* If there is a printer output */
73820           MVI     A,4               /* Prep accumulator with number 4 */
73840           OUT     2                 /* Clear the buffer - send 4 to printer */
73860           XRA     A                 /* Clear A */
73880           STA     PRTFLG##          /* Reset printer flag */
73900           STA     LPTPOS##>         /* Reset printer line position */
73920   IFE     CONSSW,<LXI     H,READY>  /* No console? Load READY routine in HL */
73940   IFN     CONSSW,<                  /* If console (nope ... dead code) */
73960           LXI     H,CONSDO##>       /* Load CONSDO in to HL */
73980           SHLD    $CODE+2           /* Set the 2nd byte of memory to HL value.
                                           *   This overwrites instruction 0's JUMP from INIT to READY */
74000           PCHL                      /* Set program counter to READY. Time to run BASIC! */

The main loop

After initialization, Altair BASIC spends most of its time repeating the same 100 instructions in a tight loop. Most of the 'work' is boilerplate reading, verifing, and moving user input. It branches off for specific input handlers, updating memory regions, and executing direct requests. As suggested in the high-level discussion earlier, there are paths through MAIN which depends if the user is making Direct or Indirect statments. Here is our map:

The main loop of Altair BASIC

We'll begin by walking through the Indirect loop, bypassing the crunch routine (for now), and circling back to MAIN. Since no program execution occurs during indirect input, we do not have to redo the READY routine.

/*** READY is a brief preporatory routine for MAIN ***/
28700   READY:  IFN     LPTSW,<         /* If there is a printer...*/
28720           CALL    FINLPT>         /* Print any remaining input */
28740   IFN     CONTRW,<                /* If output is suppressed... */
28760           XRA     A               /* Get a zero */
28780           STA     CNTWFL>         /* Force output by clearing suppress flag */
28800           LXI     H,$CODE+65535   /* Get a negative -1 */
28820           SHLD    CURLIN          /* Make -1 the current line number, forcing direct-mode */
28840           LXI     H,REDDY         /* Get routine to print "OK" */
28860   REPINI: CALL    INIT            /* Print 'OK' ...  This INIT was rewritten as... */
28880                                   /*   CALL STROUT at the end of INIT (Line 73760 above) */

In READY, we clear all input, set up for direct mode, and print 'OK' for the user. The inital startup enters through here as well as any long deviations from the main loop to ensure that we're set up for continued operation. Small branches that return to MAIN will enter below.

/*** The intro to big MAIN - reading user input ***/
28920   MAIN:   CALL    INLIN           /* Get input line. MAIN loops within INLIN until input */
28940           CHRGET                  /* Input happened - get the first character */
28960           INR     A               /* Increment then decrement A as a ... */
28980           DCR     A               /*   clever hack to check for Zero w/o changing Carry */
29000           JZ      MAIN            /* Zero? Line was blank, go back for more input */
29020           PUSH    PSW             /* Not zero - save the flags, including preserved carry */
29040           CALL    LINGET          /* Get the line number */
29060           PUSH    D               /* Put it on the stack above the flags */
29080           CALL    CRUNCH          /* Reduce the line to its basic components */
                                        /*   DETAILS LATER ON CRUNCH ROUTINE */
                                        /*   CRUNCH returns A=0, BC = character count */
29100           MOV     B,A             /* Make B=0, ensuring BC is a positive count */
29120           POP     D               /* Get the line number back (smashed by CRUNCH) */
29140           POP     PSW             /* Get flags/carry back (smashed by CRUNCH) */
29160           JNC     GONE            /* No carry = direct statement - jump to direct handler */
                                        /*   Carry was preserved from CHRGET on line 28940 */

We've retrieved some input and we've decided if this is a direct statement to be executed immediately (jump to GONE), or an indirect statement and should add it to the user program (fall through below). The CRUNCH call in this block hasn't been discussed, but its purpose is to reduce the line to single-byte tokens that can offset in to the statement dispatch table.

/*** Indirect statement handling - Find and delete old line ***/
29180           PUSH    D         /* Indirect, save line number on the stack */
29200           PUSH    B         /* Also save the character count used in NODEL routine */
29220           CHRGET            /* Remember the state/type of the line */
29240           PUSH    PSW       /* Save its flags (type) on the stack */
29260           CALL    FNDLIN    /* Check user program for existing line and get a pointer */
29280           PUSH    B         /* Push the pointer to the line on the stack */
29300   IFE     LENGTH-2,<        /* In 4K basic... */
29320           CC      DEL>      /* Just delete the line without checks */
29340   IFN     LENGTH-2,<        /* In 8K+ BASIC... */
29360           JNC     NODEL     /* No carry - no match. Nothing to delete, done with block */
29380           XCHG              /* Time to delete. Move line pointer from HL to DE */
                                  /*  DE points to the NEXT line (The one AFTER the search) */
                                  /*  Must loop from found line to end of TXTTAB to compact it */
29420           LHLD    VARTAB    /* Get Variable Table in HL. TXTTAB ends at VARTAB */
29440   MLOOP:  LDAX    D         /* Load NEXT line byte in to A */
29460           STAX    B         /* Stores the NEXT line byte in THIS line */
                                  /*   We must now patch every line afterward */
29480           INX     B         /* Move to next destination byte in TXTTAB */
29500           INX     D         /* Move to the next source byte in TXTTAB */
29520           COMPAR            /* Check if we're still in within TXTTAB */
29540           JNC     MLOOP     /* No Carry means we are still in TXTTAB (more work) */
29560           MOV     H,B       /* Get the high byte of the end of TXTTAB in H */
29580           MOV     L,C       /* Get the low byte of the end of TXTTAB in L */
29600           INX     H         /* Increment HL to find the next VARTAB */
29620           SHLD    VARTAB>   /* Store the new, lower positioned VARTAB */

We know the input is an indirect statement and so the TXTTAB (User program table) must be manipulated. This launches in to low-level mechanical operations that require us to follow closely with the table organization. Visualizing the TXTTAB manipulation is easier if you understand how a single line is composed. Consider this:

The format of TXTTAB in Altair BASIC 3.0

Using the line number, we call FNDLIN to get pointers to both the target line and the next line. If there is no target, we can skip to the next block. Otherwise, we have to delete that line by rolling all the subsequent lines up one byte at a time until we reach the end of the TXTTAB (which is VARTAB). The tight loop labelled MLOOP does the work of moving bytes, moving pointers, and checking bounds. We end by setting the new start of VARTAB. Note that actual VARTAB data is untouched and there will be some dead space at the beginning, which is fixed up in the next block.

/*** Indirect statement handling - Patching up VARTAB and inserting new line ***/
29640   NODEL:  POP     D         /* Retrieve the line insert position pushed on line 29280 */
29660           POP     PSW       /* Retrieve line type pushed on line 29240 */
29700           JZ      FINI      /* No work to do if this line is empty */
29720           LHLD    VARTAB    /* Get the end of TXTTAB in HL */
29740           XTHL              /* Swap HL (VARTAB) with top of stack. */
29760                             /*   HL now contains input length pushed back on line 29200 */
29780           POP     B         /* Get VARTAB pointer just swapped to stack */
29800           DAD     B         /* Add line length to VARTAB, accomodating new line */
29820           PUSH    H         /* This will be the new VARTAB, but save on stack first */
29840           CALL    BLTU      /* Block shift all data between insert point and new VARTAB */
29860           POP     H         /* Regain new VARTAB pointer smashed by BLTU */
29880           SHLD    VARTAB    /* Make this the new VARTAB */
29900           XCHG              /* DE is now new VARTAB, HL is line insert point */
29920           MOV     M,H       /* Store this pointer in its own memory location */
29940           INX     H         /*    This avoids zero pointers halting CHEAD fixup */
30000           INX     H         /* 2nd increment now points us to line number storage */
30020           POP     D         /* Get the line number from stack (pushed at 29180) */
30040           MOV     M,E       /* Save low order byte of line number in pointed at HL */
30060           INX     H         /* Increment to high order byte */
30080           MOV     M,D       /* Save high order byte of line number in TXTTAB */
30100           INX     H         /* Done with pointers and line numbers. Increment to line data */
30140           LXI     D,BUF     /* Point to the base of the line buffer in .BSS */
30160   MLOOPR: LDAX    D         /* Load a byte from the line buffer in to A */
30200           MOV     M,A       /* Store A in to TXTTAB location pointed by HL */
30220           INX     H         /* Point to next character destination */
30240           INX     D         /* Point to next buffer source position */
30260           ORA     A         /* Test for zero in A (implies we're done) */
30280           JNZ     MLOOPR    /* Repeat the last 5 lines until finished */

Previously, we either deleted a line and rolled up the table, or we found no line to delete. Now we're ready to insert the new line. The way to do that is to use the insert point discovered earlier and block copy (BLTU call) all bytes afterward even further forward by the size of the new line. We then fill in the empty space with the line components. The MLOOPR loop copies the arbitrarily sized line content bytes, which includes the final NUL on the line.

The big problem we're left with is broken pointers. After shifting many lines around at the byte level, we've never actually updated the pointers to ensure that they are correct.

/*** Fixing up pointers to all recently moved TXTTAB lines ***/
30300   FINI:   CALL    RUNC      /* Clears up dynamic memory and stack */
30340           INX     H         /* HL now points to TXTTAB base (RUNC side-effect) */
30480   CHEAD:  MOV     D,H       /* Setting DE to HL (high byte) */
30500           MOV     E,L       /* Setting DE to HL (low byte) */
30520           MOV     A,M       /* Get line pointer for text (high byte) */
30540           INX     H         /* Point to low byte */
30560           ORA     M         /* Verify if bytes hold any non-zero value */
30580           JZ      MAIN      /* If zero, we are completely done - back to MAIN */
30600           INX     H         /* Point to first byte of line num */
30620           INX     H         /* Point to second byte of line num */
30640           INX     H         /* Point to first byte of line data */
30660           XRA     A         /* Clear accumulator */
30680   CZLOOP: CMP     M         /* Check data byte for NUL (end of line) */
30700           INX     H         /* Move to next byte - If was NUL, now points to next line ptr */
30720           JNZ     CZLOOP    /* If not zero, repeat until the end */
30740           XCHG              /* Swap HL/DE. DE is now start of next line, HL is last line ptr */
30760           MOV     M,E       /* Set first byte of TXTTAB pointer to next entry base */
30780           INX     H         /* Increment to second pointer byte */
30800           MOV     M,D       /* Set the 2nd pointer byte */
30820           XCHG              /* HL picks back up at next line. */
30840           JMP     CHEAD     /* Repeat until double-zero triggers jump to main (line 30580) */

Keep an eye on the HL and DE pointers, which alternate positions with the XCHG command as they walk through the lines anchoring the pointer and referencing the next pointer. CZLOOP walks through the line content to end pointing at the next line to backfill the previous line's pointer. CHEAD walks through every line and takes us back to MAIN.

You may note that dynamic memory is reset each and every time a new line is added. This is a quick way to ensure valid pointers and that HL is pointed to the base of user code. There is no user programming running during the input phase, so nothing is lost.

Crunch time

I handwaved line 29080 above (CALL CRUNCH), but this is an critical routine that saves space by doing an in-place transformation of user input in the buffer. Not only does it reduce keywords to a single byte, but that byte also serves as the offset in to the statement dispatch table for quick execution. So again, our two goals are 1) Compression and 2) Fast dispatch.

Goal 1 (compression) is accomplished by encoding all keywords by their Q value we determined earlier. 7th bit asserted (Decimal 128+ and Octal 200+).
Goal 2 (dispatch) is done by ordering the dispatch table the same as the encoded value. That we can calculate an offset and jump using this single byte.

The CRUNCH routine solves this problem in about 100 lines - the same length as MAIN. Naturally this makes for some dense code that requires us to have a tight grasp on the inputs, outputs, and goals. Let me review them now:

Input: The input buffer BUF contains a line as typed by the user. HL points to the first character after the line number.

Output: The input buffer contains user input with keywords replaced by single bytes. Line number is removed. Accumulator is 0, C holds the length of the input (not linked to B).

So we're looking at an input-parsing exercise in assembly. These can be tedious to read, especially going in blind. To ease the burden, let's talk about what must happen in order to get from input to output:

  • Read in characters one at a time and decide what to do? Skip it? Change it? Pass it through?
  • Keywords must be changed (crunched)
  • We must count each character that is output
  • Be aware of delimiters like colon, question mark, quotes, etc
  • User input strings wrapped in quotes should be passed through unchanged
  • DATA statements should not be altered, but they may look like any data type
  • REM statements can be ignored
  • Crunched keywords match to the reserved word table
  • We must track the matching offset into the reserved keyword table
  • Lines must be properly terminated

Given the above requirements, we can speculate on the minimum set of tools we'll need to do this work:

  • A source pointer to read the buffer - this will be HL register pair
  • A destination pointer to mark the next output location - the DE register pair
  • A flag to suspend crunching based on detected delimiters - this is the DORES flag discussed in .BSS
  • A counter to capture the offset as we try to match reserved word - This is done by register B

Let's dive in to the crunch code:

33700   CRUNCH: IFN     STRING,<        /* If strings are enabled */
33720           XRA     A               /* Clear accumulator */
33740           STA     DORES>          /* Flag 0 - enable crunching */
33760           MVI     C,5             /* Set character count to 5 - minimum size of TXTTAB line */
33780           LXI     D,BUF           /* DE output beings at buffer base */

This short setup routine lays the ground word discussed before. We clear the accumulator and set the crunch flag to proceed with the work. We begin the line length counter at 5 since that is the minimum size of a TXTTAB line descriptor (2 byte pointer, 2 byte line number, and 1 byte line length). We end by pointing DE at the base of the buffer where we will write the crunched line

33800   KLOOP:  MOV     A,M             /* HL is further along in the buffer, getting next character in A */
33820           CPI     " "             /* Check for a space */
33840           JZ      STUFFH          /* Push space to output, increment I/O ptrs and count */
33860           MOV     B,A             /* Not space - move characer to B */
33900           CPI     34              /* Check if character is a quote */
33920           JZ      STRNG           /* If quote, jump to user string handler */
33940           ORA     A               /* Check for NUL (end of line) */
33960           JZ      CRDONE          /* End? We're done crunching */
33980   IFN     STRING,<                /* If strings are enabled... */
34000           LDA     DORES           /* Gets the crunching status flag (for DATA statements) */
34020           ORA     A               /* Sets flags to check value */
34040           MOV     B,A             /* Init B to zero if we're about to crunch */
34060           MOV     A,M             /* Retrieve buffer character again */
34080           JNZ     STUFFH>         /* No crunch? Just pass character to output */
34120   IFN     LENGTH,<                /* If 8k+ BASIC... */
34140           CPI     "?"             /* Check character for question mark */
34160           MVI     A,PRINTK        /* Replace character with print token */
34180           JZ      STUFFH          /* Send print token to output */
34200           MOV     A,M             /* Grab the character a third time */
34220           CPI     "0"             /* Check if control characters */
34240           JC      MUSTCR          /* This is a crunchable reserved word */
34260           CPI     60              /* Check for Colon (above numerics) */
34280           JC      STUFFH          /* Must be a number, just pass it through */

This is the top-level loop for the crunch routine. We enter at the top and we will leave with 3 possible outcomes: 1) We're completely finished crunching (CRDONE), 2) We pass the byte to the output without crunching (STUFFH), or 3) we will definitely crunch this byte (MUSTCR). You may notice a 4th exit called STRNG, but this leads to the STUFFH byte passthrough by skipping repeated type checks during the quoted string.

We make decisions based on the value of the incoming character ordered by the ASCII table, the logic looks like this.

  • Space? Pass it through unchanged
  • Is there a quote? Jump to special string handler that passes everything through until matching quote
  • Is there a NUL? We're completely done the line and crunching
  • If crunching is disabled (DATA statments), then pass it through
  • If it's a question mark, this is an alias for PRINT, substitute a print token and pass it through
  • If the character is less than numerics (control characters). We must crunch.
  • If the character is less than a '<' it is a numberic or colon. Pass it through
  • Anything else must be a letter, which will fall through below to crunching
34300   MUSTCR: >                       /* This is the crunch! */
34320           PUSH    D               /* Save pointer to output start */
34340   IFE     STRING,<                /* If strings are disabled... */
34460           MVI     B,0>            /* Clear B */
34480           LXI     D,RESLST-1      /* Point DE to reserved word list-1 */
34400           PUSH    H               /* Save current input character pointer */
34420           XWD     ^O1000,^O76     /* Skip next instruction on first entry */
34440   NXTRES: CHRGET                  /* Get new character (entry here from 34620)*/
34460           INX     D               /* Walk to next resereved word */
34480   RESER:  LDAX    D               /* Get the reserved word in to A */
34500           ANI     127             /* Mask away all but the lower 7 bytes (no sign) */
34520           JZ      TABEND          /* At the end of the list? Done - jump out */
34540           CMP     M               /* Compare current charcater with  */
34560           JNZ     NTHIS           /* Not a match for this reserved word */
34580           LDAX    D               /* Match so re-get the reserved word byte */
34600           ORA     A               /* Set the flags for value */
34620           JP      NXTRES          /* Positive - no match. Neg/Sign - match */

First, we need space to store another pointer, so we push the DE to the stack since we won't be outputting anything directly in this routine. We then load the pointer to the reserved keyword list to DE. We then save the character input pointer to the stack which could be smashed.

Skipping some preable instructions, we load the first reserved word and mask away the 8th bit which we know to be 1 for all reserved words. Zero? Then the table is finished, otherwise we compare each letter. If there is no match, we jump back up and try the next reserved word until we find a match and fall down to the next block.

34640   FOUND:  POP     PSW             /* Trash input pointer saved on stack */
34660           MOV     A,B             /* B holds reserved word counter, move to A */
34680           ORI     128             /* And set the 8th/sign bit to mark reserved word */
34700           XWD     ^O1000,^O362    /* Skip next two instructions, previous ORI guarantees no jump */
34720   TABEND: POP     H               /* Recover the input character pointer */
34740           MOV     A,M             /* Get the input character again */
34760           POP     D               /* Recover the output pointer */
34780   IFE     LENGTH-2,<              /* If we're in 12k+, we need to handle ELSE tokens */
34800           XCHG                    /* HL is now the output location */
34820           CPI     ELSETK          /* Check if this is an ELSE token */
34860           MVI     M,":"           /* Terminate the ELSE with a colon */
34880           CZ      INXHRT##        /* Advance pointer so : isn't smashed */
34920           XCHG>                   /* HL is back to input and DE is output */

At the top, we've found the reserved word, with it's offset stored in B. The offset IS the crunched value after we assert the 8th bit to mark it as a reserved word. If we entered this code block at TABEND, then we did not find the word and so we need to get the pointers back. In either case we fall down in to the the STUFFH pass-through to write either the newly crunched byte or the original byte.

An edge case at the bottom of this routine appends a colon to an ELSE for the 12k+ BASIC.

34940   STUFFH: INX     H               /* Just passing the byte - increment to next byte */
34960           STAX    D               /* Store byte held in A to the destination */
34980           INX     D               /* Increment to next destination */
35000           INR     C               /* Add 1 to the line length */
35020   IFN     STRING,<                /* If strings are disabled... */
35040           SUI     ":"             /* Check for colon by subtraction */
35060           JZ      COLIS           /* Match - allow crunching again */
35080           CPI     DATATK-":"      /* Check for data token (modified by prev SUI) */
35100           JNZ     NODATT          /* No data token, skip to REM check */
35120   COLIS:  STA     DORES           /* Enable crunching  */
35140   NODATT: SUI     REMTK-":">      /* Check (and store) for REM token */
35160   IFE     STRING,<SUI     REMRK>  /* If strings are enabled, see if REM */
35180           JNZ     KLOOP           /* No rem? back to the very top - next char */
35200           MOV     B,A             /* Get more of the REM statement */

This block starts with the accumulator prepped with the desired output (either a crunched value or a pass-through value from the top-level loop). Feed A to the output, incrementing the line length, source and destination pointers accordingly.

We also check for the edge cases of DATA and REM tokens before jumping all the way back to the top to proceed with more crunching

35220   STR1:   MOV     A,M             /* Get the input character again */
35240           ORA     A               /* Is it a NUL? */
35260           JZ      CRDONE          /* Then we're done, otherwise */
35280           CMP     B               /* Check for matching quote */
35300           JZ      STUFFH          /* Store it */
35320   STRNG:  INX     H               /* Increment to next string byte */
35340           STAX    D               /* Store it in output immediately */
35360           INR     C               /* Increment the output count */
35380           INX     D               /* Increment the desintation pointer */
35400           JMP     STR1            /* Go to the next character */

We usually enter this block from the STRNG label except for the REM token edge case. So let's start in the middle which is where we pick up having already seen a quote in the input buffer. We essentially skip to the next character, store it, and repeat until we encounter the end of string NUL or another quote (already saved in B). This block and the next two end with an unconditional exits, making these isolated routines.

35440   NTHIS:  POP     H               /* Not a reserved word match, get input pointer back */
35460           PUSH    H               /* Save it again */
35480           INR     B               /* Increment counter to next reserved word */
35500           XCHG                    /* Reserved word list in HL, Input pointer in DE */
35520   NTHIS1: ORA     M               /* Compare reserved word to A */
35540           INX     H               /* Move to next byte */
35560           JP      NTHIS1          /* Repeat until match */
35580           XCHG                    /* Match - Return reserved word list to DE. Input to HL */
35620           JMP     RESER           /* Finished scan, move to next reserved word */

This is the reserved word iteration routine that scans until we find a match and jump back to the top.

35640   CRDONE: LXI     H,BUFMIN        /* Crunching complete, move HL pointer back to start of line */
35660           STAX    D               /* NUL terminate the line once (end-of-line) */
35680           INX     D               /* Move to next byte */
35700           STAX    D               /* NUL terminate the line twice (zero link value) */
35720           INX     D               /* Move to next byte */
35740           STAX    D               /* NUL terminate a third time for end of program */
35760           RET                     /* Crunch complete! Return to MAIN */

Crunching is complete so null terminate the line as needed. RET back to the main loop to line 29080 after CALL CRUNCH

Formula Evaluation

Formula evaluation is the heart of most (all?) code generation programs. Altair BASIC gives us a glimpse of what evaluation looked like in the mid-1970s. Perhaps surprisingly, the modern convention of recursive calls to eval() until reaching a base case remains firmly established even 50 years ago.

Unfortunately, this is such a behemoth of a routine that I won't go through it line by line. The routine is over 1000 lines long not including the significant reliance on the math package. Instead, I will break it down to the subroutine level with a few key line operations thrown where they add the most value. I encourage the very curious to dig through the entire formula eval routine found in file F3 between listing lines 4470 and 5495. On to the discussion.

The symbol FRMEVL is called within many statement handling routines during direct command interpretation. Here are the ten specific cases when Altair BASIC evaluates formulas:

  • Integer coersion with the INT statement
  • Within the LET statement for evaluating the value of the variable
  • Twice in an IF statement for both the predicate and the consequent
  • When PRINT is called for the subsquent expression
  • Type checking in FOR statements
  • Within EVAL as part of a recursive evaluation of a complex expression
  • Evaluating the arguments of a user-defined function
  • Iterating over strings in the 12k+ BASIC version
  • Writing to disk with DSKO$
  • Writing to memory with POKE

When FRMEVL is called from any of the above paths, these conditions follow:

State at call: HL points to the first character of the formula to evaluate.
State at final return: HL points at the formula terminator. The Floating Point Accumulator (FAC) holds the result.
Helpers: The operator table (OPTAB) provides precedence values and dispatch addresses. Intermediate results for evaluation reside on the stack. Temporary variables in .BSS hold transient pointers within the expression (TEMP2, TEMP3).

FRMEVL evaluates complex expressions through one or more calls to EVAL for each lexeme. EVAL determines the type, dispatches to specific handlers, and stores work in temporary variables, the stack, or and the FAC. EVAL may invoke FRMEVL recursively if it encounters a potential complex expression within parenthesis. Consider this:

Formula evaluation in Altair BASIC

FRMEVL begins by saving the current code pointer in HL then priming the stack with a precedence of 0. Then it verifies sufficient stack space exists for execution. The call to EVAL, determines the type of the next operation, stores the precedence on the stack along with potential values in the FAC. Using the prior precedence with the newly discovered precedence, FRMEVL determines whether to execute the last operation, or store it again along with the current temporary results, repeating the process again. Eventually we will run out of tokens to arrive at a final answer or we'll run out of stack space.

To get a better picture of the previous two sentences, let's take a look at the labels and routines built within FRMEVL. From a flow control perspective, each of the jump labels will form the start of a basic block, however only a small subset of the labels can be considered full routines in a logical sense. The high-degree of optimization and code reuse without explicit function calls results in multiple entry points and multiple exit points across many major routine in FRMEVL. Here is with the full list with all labels and bolding to set apart the major routines (RETALOP, APPLOP, and EVAL):

Routine Entry From Entry MethodPurpose
FRMCHKFRMEVL, FORFall-through (FT), CallFRMEVL less backup of the input location pointer
LPOPERFRMCHK, DOPREC, DOMIN, NOTERFT, Jump, CallSave precedence and verify stack
RETAOPLPOPR, DOPREC, DODSPFT, RET (trampoline), PCHLEntry to FRMEVL without the setup preamble. Directly handle the next expression
TSTOPRETAOP, CATFT, RETRoutine to test precedence of the next operation
NOTSTVTSTOPFTGet next character and prep relation bits
LOPRELNOTSTVFTLoops search all relational operators to match and set mask bits
ENDRELLOPRELJumpTail of FRMEVL to determine what to do with the current and last operators discovered
DOPRECENDREL, FINRELFT, JumpStores precedence and loops back to LPOPER
NUMRELFINRELJumpPushes the FAC on to the stack
VPUSHDNUMRELFTMoves the contents of E and A to BC and pushes to the stack
FINTMPVPUSHD, ANDORD, FINRELFT, JumpPushes next return address and restarts FRMEVL at LPOPER
EXPSTKDOPRECJumpConverts FAC value to single precision in prep for exponentiation
ANDORDDOPRECJumpConverts FAC value to integer in prep for AND and OR operations
FINRELENDRELJumpPuts special relational operators on stack for return and execution
APPLOPENDRELRET trampolineApplies operations previously pushed on to the stack
STKDBLAPPLOPJumpConverts FAC to double precision to hold stack operands
SNGDBLSTKDBL, FACDBLFT, JumpMoves 4 bytes from stack to FAC
SETDBLSNGDBL, FACDBLFT, JumpPreps dispatch for double conversion
DODSPSETDBL, SNGDOFT, JumpFinal prep for double dispatch and actual PCHL jump
FACDBLAPPLOPJumpPreps the FAC convertion to double from int or single
STKSNGAPPLOPJumpPreps the FAC convertion to single precision
SNGDOSTKSNG, FACSNGFT, JumpSets up the dispatch to single conversion
FACSNGAPPLOPJumpPreps the FAC convertion to single precision when stack is an integer
EVALLPOPER, CATCallDetermines the operation type and populates FAC
PARCHKOKNORM, Call, FTChecks for parenthesis for complex expressions
DOMINEVALFT, JumpStore registers and negate value in the FAC
LABBCKDOMIN, NOTFRFFT, RETRestore text pointer and return to main FRMEVL routine
ISVAREVALJumpHandles variables with the FAC
ISFUNEVALJumpHandles functions to be evaluated
OKNORMISFUNJumpParses single argument functions and RETs to them for execution
NOTFRFOKNORMFT, JumpSkips FAC conversion process
FINGONOTFRFFT, JumpDispatches to function routine
GETYPEPRINT, FINREL, othersCallGets the type and sets condition codes
ORN/AN/ALogical op for 8K only, possibly dead code replaced by DANDOR
ANDN/AN/ALogical op for 8K only, possibly dead code replaced by DANDOR
ANDCONN/AN/ALogical op for 8K only, possibly dead code replaced by DANDOR
ORFINDANDOR, ANDCON (dead?)Jump.
DORELPTDORL, FINRELRETSets up a relational operation to be performed
STRCMPFINTMP, RETAOPRETCompares two strings
CSLOOPSTRCMPFT, JumpInner loop for comparing strings
DOCMPRETAOPRETComparison between two values and resulting bit flags
NOTEREVALJumpPerform NOT operation
DANDORRETAOPRET Performs AND or OR depending on precedent
ORFINDANDORJumpFinalizes registers and PCHLs back to last location

That is the long list of all routines that make up the Formula Evaluation in order of definition - excluding calls to the math package. The overall feature I'll point out is that the top-level functions are general and broad dealing mostly with determining where to flow next. The routines at the bottom are specific worker functions that do the final processing to produce the output or desired side-effects.

Let's take a line-by-line look at EVAL, which focuses specifically on typing the current input and dispatching to the low-level worker routine.

06820   EVAL:
06840   IFN     LENGTH-2,<          /* If 4 or 8k... */
06860   IFN     STRING,<            /* ..and strings active...(so 8k only!) */
06880           XRA     A           /* Clear A */
06900           STA     VALTYP>>    /* Get the type in A */
06820           CHRGET              /* Read the next character to evaluate */
06840           JC      FIN         /* Number? Handle floating input */
06960           CALL    ISLET       /* Is this is a letter? */
06980           JNC     ISVAR       /* It IS a letter - assume variable */
07000           CPI     PLUSTK      /* Not letter, check for plus sign */
07020           JZ      EVAL        /* Plus sign, skip by reentering EVAL */
07040           CPI     "."         /* Check for decimal point */
07080           JZ      FIN         /* Decimal point? Handle floating input */
07100           CPI     MINUTK      /* Minus sign? */
07120           JZ      DOMIN       /* Negate number value */
07140   IFN     STRING,<            /* If strings are active */
07160           CPI     34          /* Check for quote (string delimiter) */
07180           JZ      STRLTI>     /* It is a quote - handle a string */
07240   IFN     LENGTH,<            /* If we're in 8k+... */
07260           CPI     NOTTK       /* Check for a NOT token */
07280           JZ      NOTER>      /* Handle the NOT */
07300   IFN     FUNCTS,<            /* If functions are active... */
07320           CPI     FNTK        /* Check for user function */
07340           JZ      FNDOER>     /* Execute user function */
07360           SUI     ONEFUN      /* Compare with function name list */
07380           JNC     ISFUN       /* Great - must be a function */
07460   PARCHK: SYNCHK  "("         /* Check for open paren */
07500           CALL    FRMEVL      /* Assume contents are complex function - recurse! */
07520           SYNCHK  ")"         /* Check for close paren */
07540           RET                 /* Done! */

Within EVAL, we're at the lowest tier that reads a token and determines the specific operator applications. The job is to determine which operation to apply. The ruleset is:

  • Unquoted letter? Must be a variable - get the pointer and value
  • Number or decimal point? Call floating point input routine
  • Unary operation (NOT and negate)? Push precedence on the stack and re-enter Formula eval
  • Quote mark? Handle as string literal
  • Function token? Handle as a function
  • Skip certain characters like '+'
  • Parenthesis? Recurse in to FRMEVL

FAC operations

We've discussed the layout of the Floating Point Accumulator during loading and described it as a set of static variables in a specific arrangement. We will now review the routines that orchestrate operations among these memory locations. We'll start with the calling conventions established for FAC operations:

  • The result of FAC operations will be in the FAC
  • In single-argument operations, the argument is in the FAC
  • In two-argument operations:
    • The first argument is in B,C,D,E
    • The second argument is in the FAC
  • Many operations have two entry points ("S" and "T") each using alternative convensions:
    • "S" entries begin with the first argument referenced by HL and the MOVRM operation brings it in to the registers.
    • "T" entries begin with the first argument on the stack beginning with low order bits and with the sign pushed last.
    • The stack is set up with two PUSHM operations. The POPR gets the arguments from the stack in to the registers.
    • All entries to "T" should be with a jump to avoid call pushing the return address push over the stack argument.

For quick reference, I'll re-post the FAC structure here:

The memory layout of the Floating Accumulator in Altair BASIC

Now we'll look at the list of FAC operations before walking through one of them. I encourage walkthrough through all of these functions line by line using the symbols indentified above as a reference. Note that many routines have multiple entry points with differing state assumptions.

Routine Entry(s) Purpose
..
FADDH, FADDS, FADDT, FADDFloating point addition
FSUBS, FSUBT, FSUBFloating point subtraction
NORMALNormalization by left shifting until MSB is 1
ZEROZeros the FAC
ROUNDRounds values in C,D,E,B and transfers to FAC
FMULTS, FMULTI, FMULTFloating point multiplication
FDIVT, FDIVFloating point division
FLOAT, FLOATRCoerce value to float
ABSAbsolute value
NEG, VNEGNegate value in FAC
SGNBasic SGN function - Get the sign of a value
PUSHFPush FAC to stack
MOVFMMove value pointed at by HL in to FAC
MOVFRMove register set B,C,D,E to FAC
MOVRFMove FAC to B,C,D,E
MOVRMMove (B,C,D,E) to HL (not really a FAC OP)
MOVMFMove FAC to HL
VPUSHFFrom FAC to Stack
VMOVFA, VMOVFMFrom HL to FAC
VMOVAF, VMOVMFFrom FAC to HL
FRCINTConvert FAC to Integer
CONISConvert Single to Integer
FRCSNGConvert FAC to Single
CONSDConvert Double to Single
FRCDBLConvert FAC to Double
CONDSConvert Single to Double
QINT(Quick) Greatest Integer Function
INTFNCGreatest Integer Function
DINTDouble greatest integer function
DSUBDouble subtraction
DADDDouble addition
DNORMLDouble normalize FAC
DNEGRDouable negate FAC
DMULTDouble multiuplication
FINFloating point input to FAC
FOUTFloating point FAC to output

Pay close attention to the naming convention and the associated interface with the FAC. For instance:

  • MOVxy routines imply data movement to x from y where (M = memory, F = FAC, R = Registers).
  • V* routines consider the value type as determined by VALTYP
  • D* routines are all double precision operations
  • FRC* and CON* convert value types

Note that I did not include complex math functions like EXP, POLYX, SQR in this list because they focus on executing well-known math operations rather than interfacing with the FAC.

Now we'll take a look at two FAC-related routines. The first demonstrates simple movement and how many of these functions simply call each other for space efficiency:

/* MOVFM gets data in to the FAC from a memory location (HL)
17580   MOVFM:  CALL    MOVRM   /* Move number from memory to B,C,D,E */
17700   MOVFR:  XCHG            /* Swap DE to HL */
17720           SHLD    FACLO   /* Push HL to the low order FAC bytes */
17740           MOV     H,B     /* Move the highest byte in to H */
17760           MOV     L,C     /* Move the 2nd highest byte in to L */
17780           SHLD    FAC-1   /* Put the high bytes where they beyond */
18000           XCHG            /* Restore original DE */
18020           RET             /* Done! */

Seemingly simple, but the trick with this MOVFM routine is that we're not going directly from memory to the FAC, but copying it to registers first using an existing routine - saving memory by reusing a code path. MOVFM piggybacks on MOVFR by falling through by using a quick call to MOVRM. Fall-through takes care of the rest.

Now for a core FAC operation - reading in floating numbers with FIN. Although this code page is over 400 lines, much of it is duplicated for both the 12K and the smaller versions. I will cut out the 'dead' code for anythingb but the 12k version. Recall from earlier that the F3 file was built for 12K+, but the F4 file where FIN lives targets 8k.

  • Setup: HL points to first input character and already in A
  • Result: The number is in the FAC
  • C tracks if we've read in the decimal (0 = yes, 377o = no)
  • B counts digits after decimal
  • E is the exponent
  • All registers smashed

The floating point input routine in Altair BASIC

Consider this diagram while walking through FIN. Comments are my own however this section of source is very well commented already and my changes are minor.

/* Check the leading sign of the value - entry from VAL may include the sign */
39040   FIN:
39060   IFN     STRING,<
39100           CPI     "-"     /* Check for leading minus */
39120           PUSH    PSW     /* Save the FLAGs */
39140           JZ      FIN1    /* A minus - skip to next code block */
39160           CPI     "+"     /* Check for leading plus */
39180           JZ      FIN1>   /* A plus - skip to next code block */
39200           DCX     H       /* Not a leading sign, move back for re-read */

Simple code block - store info on stack. It there was sign info skip the line that walks back the text pointer (need to reread for upcoming processing)

/* Register set up routine */
39220   FIN1:
40700   IFE     LENGTH-2,<               /* 12K+ routine */
40720           XCHG                    /* Moving the text pointer to DE */
40740           LXI     B,377+$CODE     /* Setting up decimal point flag to 377o (not seen yet) */
40780           MOV     H,C             /* Clear H */
40800           MOV     L,C             /* Clear L */
40820           CALL    CONISS          /* Clear FAC and set expected value to integer */
40840           XCHG                    /* Get the text pointer back */

Setting up the base register values for upcoming usage

/* Checking the TYPE of the next character */
40880   FINC:   CHRGET             /* Get next character */
40900           JC      FINDIG     /* Carry implies digit - jump to digit handler */
40920           CPI     "."        /* Check for decimal point */
40940           JZ      FINDP      /* Match - go to decial point handler */
40960           CPI     "E"        /* Check for letter E (single precision marker) */
40980           JZ      FINEX      /* Jump to single precision handler */
40000           CPI     "D"        /* Check for letter D (double precision handler) */
41020           JNZ     FINE       /* NOT D? That's everything, we're done. Jump to cleanup */
41040           ORA     A          /* Must be double - set FLAGS (zero flag) */

Type checking routine with the caveat that doubles will fall down to single precision handler with zero flag reset

/* Handle single or double precision numbers */
41060   FINEX:  CALL    FINFRC     /* Conversion routine. Zero flag set = single */
41080           PUSH    H          /* Save current input position */
41100           LXI     H,FINEC    /* Load address of exponent character fetch in to HL */
41120           XTHL               /* HL <-> stack - Now reading current input. Next routine on top of stack */

/* Check exponent sign and skip if necessary */
41160           CHRGET             /* Load next character in to A */
41180           DCR     D          /* Set exponent sign to negative */
41200           CPI     MINUTK     /* Check for negative exponent token in A */
41220           RZ                 /* Negative exponent confirmed - RET to next character fetch */
41240           CPI     "-"        /* Check for negative char in A (from VAL entry)*/
41260           RZ                 /* Negative exponent confirmed - RET to next character fetch */
41280           INR     D          /* Still here? Must be positive, put exponent sign back to zero */
41300           CPI     PLUSTK     /* Leading plus token? */
41320           RZ                 /* RET to next character */
41340           CPI     "+"        /* Leading plus from VAL entry? */
41360           RZ                 /* Ditto RET */
41380           DCX     H          /* No sign? Probably digit - move input pointer back */
41420           POP     PSW        /* Pop and discard FINEC RET trampoline - not needed now */

These routines convert the FAC to the necessary and test the next character for a sign.

/* Fetching the next character of exponent */
41460   FINEC:  CHRGET             /* Getting the next characters in to A */
41480           JC      FINEDG     /* Carry = Digit. Put digit in exponent */
41500           INR     D          /* Not digit - put sign to positive or zero (if neg) */
41520           JNZ     FINE       /* Positive - we're done */
41540           XRA     A          /* Must be negative so... */
41560           SUB     E          /* Subtract exponent value */
41580           MOV     E,A        /* Resave it to E as negative */

Retrieves the next character of the exponent and adjusts the target register as necessary

/* Done with exponent search - clean up and continue */
41620   FINE:   LDA     VALTYP     /* Get type in A */
41640           CPI     2          /* Compare with integer (type 2) */
41660           JNZ     FINEF      /* Not integer? Jump to finish float */

/* Negative integer finisher */
41700           POP     PSW        /* Get the sign value */
41720           XCHG               /* Store input pointer. HL -> DE */
41740           CZ      INEG       /* Negative? Negate value */
41760           XCHG               /* Get input pointer back */
41780           RET                /* Finished */

/* Floating point finisher - multiply for full number*/
41820   FINEF:  PUSH    H          /* Store input pointer on stack */
41840           MOV     A,E        /* Get final exponent in to A... */
41860           SUB     B          /* Less the digits beyond decimal point */

/* A now holds the number of times to multiple or divide by 10 */ 
41900   FINEF2: CP      FINMUL     /* A still positive? Time to multiply */
41920           CM      FINDIV     /* A negative? Time to divide */
41940           JNZ     FINEF2     /* Still here? Try this all again */

/* All shifting done - lets verify signs */
41980           POP     D          /* Restore input point to DE (stored from line 41820 above) */
42000           POP     PSW        /* Get the sign (stored from line 39120) */
42020           CZ      NEG        /* Zero? Must be negative. Negate FAC */
42040           XCHG               /* Get input position from DE to HL */

/* Catch edge case of negating -32768 (which has no positive complement in S16) */
42060           LDA     VALTYP     /* Get the current value type in the FAC */
42080           CPI     4          /* Check for single precision (-32768 INT looks like SNG) */
42100           RNZ                /* Not SNG, so not our edge case - done! */
42120           PUSH    H          /* It is single - save the input pointer we're about to smash */
42140           LXI     H,POPHRT   /* Find POPHRT routine for CONIS routine prep */
42160           PUSH    H          /* Store it on the stack (CONIS2 could RET to it) */
42180           CALL    CONIS2     /* Routine to check for -32768 */
42200           RET                /* Still here? No match - so just RET to POPHRT */

This series of routines wrap up input by verifying types, matching signs, and exeuting multiplication or division necessary to normalize the value in the FAC.

This ends the top-level FIN process. However, I will continue with a few of the standalone routines referenced above.

/* Find the decimal point and set flags */
42300   FINDP:  INR     C        /* Set flag for decimal point seen (377o + 1 = 0 [flag seen]) */
42320           JNZ     FINEF    /* Not zero? We're here twice - end it */
42340           CALL    FINFRC   /* This is the first decimal point - convert FAC to single/double */
42360           JMP     FINC     /* Done - on to the next digit */
42380
/* Converts FAC to single or double. ZF=1 ? single : double */
42460   FINFRC: PUSH    H          /* Store input pointer */
42480           PUSH    D          /* Store exponent tracker (DE) */
42500           PUSH    B          /* Store decimal point info (BC) */
42520           PUSH    PSW        /* Save FAC status info */
42540           CZ      FRCSNG     /* Zero? (ZF=1) - convert to single */
42560           POP     PSW        /* Restore FAC status info */
42580           CNZ     FRCDBL     /* Not Zero? (ZF=0) - convert to double */
42600           POP     B          /* Restore decimal point info (BC) */
42620           POP     D          /* Restore exponent tracker (DE) */
42640           POP     H          /* Restore input pointer */
42660           RET>               /* Finished */

/* Multiply FAC by ten one time */
42740   FINMUL: RZ                      /* No exponent? No work to do */
42760   FINMLT: PUSH    PSW             /* Save exponent (FOUT enters here) */
42820   IFE     LENGTH-2,<              /* 12k version only */
42840           LDA     VALTYP          /* Get the type */
42960           CPI     4               /* Check if we have a single */
42980           PUSH    PSW             /* Save type info from smash */
42900           CZ      MUL10           /* Single - multiply by 10 */
42920           POP     PSW             /* Get type back */
42940           CNZ     DMUL10>         /* Double - multiple by 1000 */
43960           POP     PSW             /* Get exponent info */
43980   DCRART: DCR     A               /* Reduce by 1 */
43000           RET                     /* Done with single x10 operation */

This completes roughly half of the FAC code and where I'll stop to save space. The chosen code blocks should provide good background for those wishing to dig further.

String operations

I briefly discussed "string space" as the highest region of memory that grows downward to a default of 50 bytes. The MEMSIZ pointer is the highest valid memory address, which is also the base of string space. The FRETOP pointer starts at MEMSIZ and moves downward to point to the next free byte in string space. Keep this idea in mind when we discuss garbage collection in the next section - for now, we'll think about how strings get in to string space.

Strings can enter the system through three pathways:

  • Input as a direct statement - String briefly lives in the buffer. A descriptor is created and the string is copied to string space
  • Included within a program - String descriptor references program directly and no string space is used
  • String generated at runtime - String result will be created in string space

With these interactions in mind, let's break the routines down in to two flavors: "Command" routines that directly implement BASIC commands (STR$, LEN, etc), and "Support" routines that interface between string space, descriptors, and other runtime components. We can think of the "Command" routines as standalone operations. Let's take a closer look at the "Support" routine interactions:

A snapshot of string operation routines in Altair BASIC

This diagram shows all the "support" functions except STROUT, which doesn't interact with other memory components. All others parse and move data among memory and registers. The top-level string operations include:

SymbolType Purpose
STR$CommandBASIC command to convert input numeric to a string type
STRCPYSupportCopies a string in to string space and create a descriptor
STRLITSupportCreates a string descriptor in DSCTMP
PUTNEWSupportMoves a new descriptor to one of the temporary positions
STROUTSupportPrint the string pointed to in HL
GETSPASupportAllocates string space for a new string
CATSupportConcatenates two strings with the + operator
FRETMPSupportFrees a temporary string descriptor
LENCommandBASIC command to return the character length of an input string
ASCCommandBASIC command to return the ASCII value of the input character
CHR$CommandBASIC command to return the character representing the input value
LEFT$CommandBASIC command to return the left-most characters specified
RIGHT$CommandBASIC command to return the right-most characters specified
MID$CommandBASIC command to return characters within a string
VALCommandBASIC command to convert input string to a number, if possible

Let's dig in to two of the support functions and two of the command functions.

The STRCPY function begins with HL pointing to a string descriptor create from the input buffer. The result is a pointer to DSCTMP containing the new decsriptor. The only time STRCPY is called is within STRLIT. Note that the first byte of a the 3-byte descriptor is the length of the string.

29200   STRCPY: MOV     A,M        /* Move first byte of descriptor (length) in to A */
29220           INX     H          /* Move pointer up (to low byte of string address */
29240   IFN     LENGTH-2,<         /* But in 8K... */
29260           INX     H>         /* Skip one more byte since 8K descriptors are 4-bytes */
29280           PUSH    H          /* Push address on to stack */
29300           CALL    GETSPA     /* Allocate string space */
29320           POP     H          /* Get the address to the new space to HL*/
29340           PUSHM              /* Save address on stack and advance HL pointer */
29360           POP     B          /* Recover address in to BC (smashing useless value) */
29380           CALL    STRAD2     /* Prepare DSCTMP */
29400           PUSH    H          /* Save pointer to DSCTMP */
29420           MOV     L,A        /* Get length in to low address (offset from H) */
29440           CALL    MOVSTR     /* Move the string */
29460           POP     D          /* Restore DSCTMP pointer */
29480           RET                /* Done! */

The quirk here is that descriptors may be 4 bytes if we not in the 8K version, so we must skip an extra byte after the length.

You'll notice that copying strings relies on the GETSPA routine to allocate string space. We'll dig in to that function now, which will tie in to our next section about garbage collection. Some things to know about the :

  • Register A holds the number of bytes requested
  • Register pair DE holds the pointer to the free string space
31440   GETSPA: ORA     A               /* Set flags for provided value */
31460           XWD     ^O1000,^O016    /* Skips the next POP instruction */
31480   TRYGI2: POP     PSW             /* Alternate entry where A is on the stack */
31500           PUSH    PSW             /* Save space requested on the stack */
31520           LHLD    STKTOP          /* Find the absolute end of string space */
31540           XCHG                    /* Store it in D/E for now */
31560           LHLD    FRETOP          /* Get the current actual end of free space */
31580           CMA                     /* Negate A (the number of bytes required) */
31600           MOV     C,A             /* Move the negative bytes required to C */
31620           MVI     B,255           /* Extend complement to B (BC = negative size)*/
31640           DAD     B               /* Add (the negative) size to FRETOP in HL */
31660           INX     H               /* Add back one byte */
31680           COMPAR                  /* Compare proposed string end with top of stack */
31700           JC      GARBAG          /* Carry? Oops, no space. Attempt garbage collection */
31720           SHLD    FRETOP          /* Space is available - Bump FRETOP down to new end */
31740           INX     H               /* Move up one more byte above FRETOP */
31760           XCHG                    /* FRETOP + 1 -> DE, STKTOP -> HL */
31780   PPSWRT: POP     PSW             /* Restore the space request (char count) */
31800           RET                     /* Done! */

This is a straight-forward rourtine that checks the space request, compares the potential new string with the stack top to detect collision, then garbage collects if necessary and returns the new pointer. Note the mechanical method of creating the negative memory offset with the complement (CMA) and sign extending the larger register of the pair with a 255.

Garbage collection

If you're coming from a modern language with garbage collection, the Altair BASIC implementation will seem simple and narrow. There are no nascent forms of mechanics like Reference Counting or Mark-and-Sweep. Instead, Altair BASIC garbage collection focuses solely on the 50 bytes of string space and attempts to find space when the stack may be smashed. Let's revisit our memory model alongside the GETSPA routine just discussed above:

String space growth in Altair BASIC

As shown in the previous section, GETSPA may call garbage collection if the string space size requested would cause FRETOP to collide with STKTOP. We could simply throw an out of memory error and disappoint the user; however, Altair BASIC first attempts to recover string space by analyzing existing string descriptors to find and remove unnecessary allocations. Only if space recovery fails, should we throw an error.

Now for a closer look at the longest path through this process:

Garbage collection in Altair BASIC

We begin in GARBAGby verifying that we haven't already run garbage collection, if so then we've failed to collect and error out immediately since the result is idempotent. If we go ahead with collection, then we first completely reset string space in preparation for building it back from scratch. We're not scanning the space for unnecessary strings in, but rather we're using existing descriptors to rebuild from scratch.

We then loop through all of the variable locations and push string candidates on to the stack. Ultimately, we unwind the stack by putting the living strings back in to string space.

Let's walk through the garbage collection code:

/* Initiate garbage collection if this is the first attempt */
31840   GARBAG: POP     PSW             /* Get flags to test if this is a re-entry */
32860           MVI     E,ERRSO         /* Prepare error handler in E */
32880           JZ      ERROR           /* Zero means we've already failed to collect - error */
32900           CMP     A               /* Not zero? Make it zero now */
32920           PUSH    PSW             /* Put updated flags back on the stack */
32940           LXI     B,TRYGI2        /* Get the return point after garbage collection */
32960           PUSH    B               /* ...and put it on the stack */
32980   GARBA2: LHLD    MEMSIZ          /* Get the top of memory in to HL */
32000   IFE     REALIO,<                /* If this is a real terminal */
32020           MVI     A,7             /* Get the BEL character */
32040           OUTCHR>                 /* Ring it to announce garbage collection */

We only want to enter garbage collection once, so we first check if this is a reentry. If not, then we flag it as entered immediately. We find the top of memory and signal the user about garbage collection with the terminal bell

/* Get the next temp and jump to stack push if valid */
32060   FNDVAR: SHLD    FRETOP          /* Nuke string space - MEMSIZ -> FRETOP */
32080           LXI     H,$CODE         /* Get a zero */
32100           PUSH    H               /* Push on stack to ensure garbage entry (ZF = 0) */
32120           LHLD    STKTOP          /* Get start of stack to mark off non-string space */
32160           PUSH    H               /* Put on stack (now 0 - STKTOP) */
32180           LXI     H,TEMPST        /* Get first temp string */
32200   TVAR:   XCHG                    /* Move it to DE */
32220           LHLD    TEMPPT          /* Get next temp string */
32240           XCHG                    /* Swap with DE (now back in HL) */
32260           COMPAR                  /* Compare TEMPPT and TEMPST */
32280           LXI     B,TVAR          /* Get temp var garbage collector */
32300           JNZ     DVAR2           /* Pointers different? We have temp vars to collect */

On first pass, we reset string space (or shift if entering from elsewhere). From here we prepare to read through string temporaries first. Note that this basic block will jump down to the stack storing section, skipping the next two sections.

/* Scan simple variable space (VARTAB) */
32340   SVARS:  LHLD    VARTAB          /* Get the base of simple variable table */
32360   SVAR:   XCHG                    /* Put it in DE */
32380           LHLD    ARYTAB          /* Get the end of the simple variable table */
32400           XCHG                    /* Swap with base of table */
32420           COMPAR                  /* Check if we're at the end */
32440           JZ      ARYVAR          /* Yes? Move on to Array variables */
32460           MOV     A,M             /* First memory location holds the 2nd char of name */
32480           INX     H               /* Move up to the first char of the variable name */
32500           INX     H               /* Move up to the byte length */
32520   IFE     LENGTH-2,<              /* In the 12k+ version... */
32540           INX     H               /* Move to the variable type (3 = string) */
32560           CPI     3               /* Test it for a string */
32580           JNZ     SKPVAR          /* Not a string? skip it */
32600           CALL    DVARS           /* It's a string - collect it */
32620           XRA     A               /* Sets Zero Flag to 1 to stop skips */
32640   SKPVAR: MOV     E,A             /* Move A to E - Will be zero if fall-in */
32660           MVI     D,0             /* Clears D ensuring that DE is the skip amount */
32680           DAD     D>              /* Move HL forward by DE bytes (the skip!) */
32760           JMP     SVAR            /* Jump back up to skip more if necessary */

This section checks if simple variables are 'alive'. If so, skip the next basic block to the stack storing routine.

/* Scan array variable space (ARYVAR) */
32800   ARYVA2: POP     B               /* Remove unneeded pointer for skipped var */
32820   ARYVAR: XCHG                    /* Put ARYVAR base from HL in to DE */
32840           LHLD    STREND          /* Get end of ARYVAR */
32860           XCHG                    /* Swap base (to HL) with end (to DE) */
32880           COMPAR                  /* See if they are the same (means no work to do) */
32900           JZ      GRBPAS          /* Collecting done - jump to unwind phase */
32920   IFE     LENGTH-2,<              /* If this is the 12K+ Altair BASIC */
32940           MOV     A,M             /* Get the variable type in to A */
32960           INX     H>              /* Move to the next byte */
32980           CALL    MOVRM           /* Get the length in to BC */
33040           PUSH    H               /* Put pointer to values on the stack */
33060           DAD     B               /* Move forward to the next variable */
33080   IFE     LENGTH-2,<              /* If this is the 12K+ Altair BASIC */
33100           CPI     3               /* Check type for string */
33120           JNZ     ARYVA2>         /* Not string? Nothing to clean. Next! */
33200           SHLD    TEMP3           /* Store the end of the array because we have to ... */
33220           POP     H               /* Jump back to the beginning of this one to collect */
33240           MOV     C,M             /* Get the number of dimensions (length) */
33260           MVI     B,0             /* Won't need high byte */
33280           DAD     B               /* Move the length beyond the base */
33300           DAD     B               /* Again - for the second byte */
33320           INX     H               /* One beyond the end */
33340   ARYSTR: XCHG                    /* Current HL position to DE */
33360           LHLD    TEMP3           /* Bring the end of the array back to HL */
33380           XCHG                    /* Swap DE and HL */
33400           COMPAR                  /* Compare both of these values */
33420           JZ      ARYVAR          /* Same? Move to next array */
33440           LXI     B,ARYSTR        /* Not same - get routine jump point and fall down to push*/

This section checks the array variables. All strings are pushed to the stack and afterward, we jump to the final unwinding routine.

/* Push the current variable on to the stack for later */
33460   DVAR2:  PUSH    B               /* Save return point on to the stack */
33480   IFE     LENGTH-2,<              /* If this is the 12K+ Altair BASIC */
33500   DVAR:                           /* Alternate label */
33520   DVARS:  XRA     A               /* Clear accumuatlor, ZF = 1 */
33540           ORA     H               /* Check if we're pointed at a null */
33560           INX     H               /* Move to next byte (low address) */
33580           MOV     E,M             /* Move to low address to E */
33600           INX     H               /* Bump to high address */
33620           MOV     D,M             /* Move to D, DE now points to string value*/
33640           INX     H>              /* Move to next descriptor */
33840           RZ                      /* Was this null? Return */
33860           MOV     B,H             /* Not null? Move to BC (high byte) */
33880           MOV     C,L             /* Not null? Move to BC (low byte) */
33900           LHLD    FRETOP          /* Get the pointer to the end of string space */
33920           COMPAR                  /* Check string location relative to end of string space */
33940           MOV     H,B             /* Recover HL (high byte) */
33960           MOV     L,C             /* Recover HL (low byte) */
33980           RC                      /* Carry? String isn't in string space, RET (ignore) */
34000           POP     H               /* Get return address from push on 33460 */
34020           XTHL                    /* Return address to stack and highest address to HL */
34040           COMPAR                  /* Compare both (and note carry flag for later) */
34060           XTHL                    /* Swap address back again (Ret address in HL) */
34080           PUSH    H               /* And back on the stack */
34100           MOV     H,B             /* Get back our present scanning location */
34120           MOV     L,C             /* from lines 33860-33880 */
34140           RNC                     /* No carry? Found a candidate to move */
34160           POP     B               /* Get return address from 3 lines back */
34180           POP     PSW             /* Pop off the highest variable seen */
34200           POP     PSW             /* Pop off the pointer */
34220           PUSH    H               /* Save the base of the string on the stack */
34240           PUSH    D               /* Save the high end of the string on the stack */
34260           PUSH    B               /* Save the return address on to the stack */
34280           RET                     /* Return! */

This routine stores living string pointers on the stack, which will unwind in to string space in the next section

/* Unwind our variables on the stack in to string space */
34360   GRBPAS: POP     D               /* Get the end of the string */
34380           POP     H               /* Get the beginning of the string */
34400           MOV     A,L             /* Get the low byte in to A */
34420           ORA     H               /* Check for zero (null) pointer */
34440           RZ                      /* Zero means end of work. Return */
34480           DCX     H               /* Move back from the string to the descriptor */
34500           MOV     B,M             /* Move high byte of descriptor in to B */
34520           DCX     H               /* Move back to the first byte of the descriptor */
34540           MOV     C,M             /* Low byte to C - BC = string pointer */
34550           PUSH    H               /* Save pointer to update later */
34560           DCX     H               /* Move back further to string length */
34620           MOV     L,M             /* Get length in to H */
34640           MVI     H,0             /* Length has no high byte. HL = Length */
34660           DAD     B               /* Add length to address - HL = beyond end of string */
34680           MOV     D,B             /* High string addess -> D */
34700           MOV     E,C             /* Low -> E. DE = Original Pointer */
34720           DCX     H               /* Base + Length - 1. Last byte of string */
34740           MOV     B,H             /* End of string in to BC */
34760           MOV     C,L             /* DE and BC are the ends of the string */
34780           LHLD    FRETOP          /* Get the current end of string space in HL */
34800           CALL    BLTUC           /* Block transfer string to new location */
34820           POP     H               /* Get point location saved on 34550 */
34840           MOV     M,C             /* Store new low byte address result from block transfer */
34860           INX     H               /* Move to high address */
34880           MOV     M,B             /* Save new high address */
34900           MOV     L,C             /* Get low byte in to HL */
35920           MOV     H,B             /* Get high byte in HL. HL = new address */
35940           DCX     H               /* Address - 1 to save to FRETOP at below jump */
35960           JMP     FNDVAR          /* Move to next variable */

This section unwinds all our work staging string boundaries on the stack and block transfers the data to string space. Pointers are fixed up as after move.

Other sections worth further study

I reviewed every line of code during this write up and there were several sections I wanted to explore further but I decided to put off for another time (maybe). Here is a list of things worth further study:

  • The individual BASIC command implementations
  • The math package
  • Error Handling
  • Technical details of I/O between BASIC and the Altair hardware

Where's the brilliance?

Books have been written about the business case of being first on the block to bring programming in to the home. However, I will keep to the technical aspects of the Altair BASIC source code.

Allen, Davidoff, and Gates made a big program fit in a small space through a combination of these strategies:

  • Routine re-use with multiple entry-points - Core routines are specified once but employ multiple header labels and preambles depending on origin path. This resembles tail merging and jump threadng used in later optimizing compilers
  • Optimized fall-through - Avoid branching checks by ensure that the most commonly run code paths are adjacent.
  • Functional identities - Build one 'base' routine and reuse it for similar cases. This is used extensively in the math package, such as programming only SINE then specifying COS, TAN in terms of SIN.
  • Platform-specific manipulation - Taking advantage of the Altair and 8080 architecture to save space, such as using interrupt handlers for common functions

Routine re-use with multiple entry-points

This is essentially function-reuse in assembly with the complication of having multiple label names as entry points depending on the state of the registers at jump/call time. Simple compilers would just emit a separate routine for each case. The handcrafted version here essentially "factors out" the common tail reused by attaching an all-purpose preamble that 'catches' different states which fall through to the actual work. Consider the first few lines of STRLIT, which turns a string in to a literal:

/* STRLIT builds a descriptor for a string pointer to by HL+1 */
29720   STRLIT: DCX     H           /* Decrement HL so that it string is HL+1 */
29740   STRLTI: MVI     B,34        /* Move quote charater in to B */
29760   STRLT3: MOV     D,B         /* Before moving it to D */
29780   STRLT2: PUSH    H           /* Save character start pointer on stack */
29800           MVI     C,255       /* Start character counter at max length */
29820   STRGET: INX     H           /* Move to the next character (first if from above) */
29840           MOV     A,M         /* Character to A */
29860           INR     C           /* Count the character */
29880           ORA     A           /* Check if null (end of line) */
29900           JZ      STRFIN      /* Null? We're done */
29920           CMP     D           /* Check for quote */
29940           JZ      STRFIN      /* Quote? We're also done */
29960           CMP     0           /* Check for other quote */
29980           JNZ     STRGET      /* No? Still in string, go up and get more */
... 9 more lines in this routine ... 

The first four lines of the routine have a separate label name to capture and adjust the register states prior to doing the real work. Those 4 labels are referenced 7 times throughout the program saving 19 lines of code per entry over a simple compiled program. This pattern is repeated many times throughout the code in the following routines: DVAR, FRETMP, GARBAG, INLIN, INITDX, LEFT$:, LINPRT, READY, and many others

Optimized Fall-through

Unoptimized compiled assembly often builds self-contained routine blocks that call and return from each other. However, if we look closely at routines and discover that one function almost always follows another, we can skip branching instructions and simply fall into the subsequent routine, saving a branching instruction.

Often this technique is paired with one or more of the other techniques so we'll move on to a specific example in the next section that demonstrates both

Here are cases within the code where work is performed by other functions:

  • ABS of the FAC falls in to NEG for negative cases
  • GOSUB falls in to GOTO
  • Floating addition falls in to normalize for negative cases
  • Floating normalize falls in to zeroing FAC if the answer is zero
  • Rounding falls in to overflow if it overflows
  • Floating multiplication underflows fall in to FAC zero

Functional Identities

In some cases an existing function can be used as the workhorse for other functions with some slight modifications. Consider how Altair BASIC implements Cosine:

65900   SUBTTL  SINE, COSINE AND TANGENT FUNCTIONS
65980   COS:    LXI     H,PI2      /* Get PI/2 constant */
66000           CALL    FADDS      /* Add it to the input value */
66020                              /* Falls directly into sine */
66220   SIN:    CALL    PUSHF      /* Sine entry point
... Over 60 lines follow in the SIN routine ...

The cosine function offsets the input value to make it equivalent to the sine function. Afterward, we must call sine, therefore the cosine function always falls directly in to sine.

Another easy example is that subtraction is simply negative addition. So if we negate the second oprand, then we only need to add. Consider the Double Precision subtraction routine:

31840   DSUB:   CALL    NEG        /* Negate second argument */
31860                              /* Fall into DADD */
31960   DADD:   LXI     H,ARG      /* Entry point for DADD */
... over 50 lines of DADD follow which falls in to DNORM ...

Other operations where the bulk of the work is performed by a base operation:

  • TAN is implemented by both COS and SIN
  • Square roots immediately fall in to exponentiation

Platform-specific Manipulation

We discussed earlier that Altair BASIC disables interrupts and repurposes them for common operations. This allows invocation with a CPU instruction (RST) macro'd to a friendly name. It follows that interrupt handlers can use optimized fall-through, for instance:

Interrupt 1 (SYNCHK) automatically falls through to the next handler (CHRGET) since these two operations are inherently related, allowing us to avoid manual invoke of the second routine. In total, this fall-through is employed 27 times in the code, savings at least that many branch instructions in memory (across all versions, 4/8/12K)

Another important factor is the code organization. By placing optional code and initialization code at the end of the program, this space can be recovered at run time and reallocated to user programs. While this is absolutely critical for Altair BASIC to run on the platform, these techniques were standard practice in the memory-cosntrained era of the 1970s and 80s.


Fun stuff

While reading through the source, a few interesting and fun things jumped out as reminders of the era that are worth a mention. In no particular order:

Fortran style comparisons
Altair BASIC comments and notes exclusively use original Fortran comparison dot notations common through the 1980s (Fortran 77). These are the period delimited operaors like .GT., .LT., .EQ., etc. A few exmaples:

11340                           IF THE STRING IS NOT NULL AND ITS POINTER IS
11360                           .GT.MINPTR AND .LT.FRETOP,
....
27660           COMPAR          ;SEE IF THIS IS .GT. ENTERING [H,L]

Fortran 90 introduced the modern operators (<=, ==, etc), but the dot operators are still supported even today.

Easter egg comments
Programming comments are always a great place to hide stuff. Altair BASIC is no exception:

36180   INLINC: CALL    INCHR           ;GET A CHARACTER
36184           CPI     7               ;IS IT BOB ALBRECHT RINGING THE BELL
36186           JZ      GOODCH>         ;FOR SCHOOL KIDS?
36200           CPI     13              ;IS IT A CARRIAGE RETURN?

It's interested to see the Bob Albrecht reference before he split off on the Tiny BASIC project - effectively undercutting the commercial value of Altair BASIC. The first meetings of the Homebrew Computer Club happened while the first lines of Altair BASIC were being written. He was already well known in the 'hacker' community for his newsletters and as an early Altair adopter, but I'm not aware that any of the Micro Soft trio having directly dealt with him when this version was written. If you're interested in this era, I recommend researching the influence of the HCC, it's members, and the tensions between software philosophies that arose in the mid-1970s.

Here's a comment from (probably) Davidoff to Gates in the math package:

24800   DCXBRT: DCX     B>              ;THIS IS FOR BILL.  C WILL NEVER BE ZERO
24820                                   ; (THE MSB WILL ALWAYS BE ONE) SO "DCX  B"
24840                                   ; AND "DCR      A" ARE FUNCTIONALLY EQUIVALENT
24860           RET                     ;ALL DONE

I assume Gates held a high standard for comments since Davidoff went out of his way to notate almost every line in the math package.

Incomplete? Bugs?
Altair BASIC 3.0 was a work in progress and I doubt that these printouts were release ready. Since files F3 and F4 were printed weeks apart with different configs (12K vs 8K), there are inconsistencies with symbols and code paths. Some obvious syntax errors exist and would prevent a build. There are also a few semantic errors that would lead to unintended side affects. Consider these:

53240   FOUTDN: MVI     M,0             ;PUT A ZERO AT THE END OF THE NUMBER
53260           XCHG                    ;SAVE THE POINTER TO THE END OF THE NUMBER
53280                                   ; IN (DE) FOR FFXFLV
53300           LXI     H,FBUFFR+1      ;GET A POINTER TO THE BEGINNING
53320           RET                     ;ALL DONE

... ~300 lines later ...

59820   FOUTNV: PUSH    D               ;SAVE (DE)
59840           LDA     VALTYP          ;GET WHAT KIND OF VALUE WE HAVE
59860           CPI     4
59880           JNZ     FOUTND          ;WE HAVE A DBL

The problem above is that neither the label FOUTDN in the top block nor FOUTND in the bottom block appear anywhere else in the code. This wasn't found during load time because the code path is dead. Recall that the math package in the code listing is built at 8K, but these extended routines are 12K+ only. This was likely just a typo and FOUTDN (Floating output DONE) is the intended label so like 59980 should change from FOUTND to FOUTDN.

Multiple lines of the code listing have corrections penciled in showing that this is was probably a working draft for off-prem code review. You can't take that PDP home with you!

50060           RZ
50080           CMP     C          /* Changed from CMP B */
50100           RZ

This block is from the 4K version of DATA and REM commands. This comparison was changed from B to C with pencil but header notes indicate that it was originally B and they decided to use register C at a late time.

01100   FBUFFR: BLOCK   ^D13            ;BUFFER FOR FOUT
01120   IFE     LENGTH-2,<BLOCK ^D<35-13>>  /* Originally ^D<31-13> */

From the math package, they penciled in this extended buffer resize. Likely this was a design change */

45980           PUSH    B               ;PUT DUMMY FIELD LENGTHS ON STACK
46000           JMP     STROUI##     /* Originally JMP FOUT? */

Also from the math package. A pencil change from one routine to another.

PDP-10 code
We know that development of Altair BASIC began on a PDP-10 so it should be no surprise that this early version would include a trace amount of platform code:

44360   IFE     REALIO,<              /* This code is only live if REALIO is off - not a real Altair */
44380   DDT:    POP     B             /* Clean the stack */
44400           HRRZ    14,JOBDDT##   /* Load the debugger address to register 14 (stack pointer by convention)*/
44420           JRST    0(14)>        /* Jump to it */

This code block is not applicable to interpreter operation

FAQ

Will this code build and execute as-is?

Short answer: No.

The printouts provided by Gates for the 50th anniversary release show two source files (F3 and F4), from two different configurations, 12K and 8K respectively. If you attempt to run these together, both will operate with wrong assumptions about memory layout, specifically the size of the Floating Point Accumulator. The extended portions of the math package also won't define all of the math symbols required by the interpreter section.

Then there's the problem of missing dependencies, notably the "MCS808" header.

What would it take to build this code?

At the very least, you'll need to:

  • Get Macro Assembler (or build your own) to handle the Macro-specific directives
  • Locate the missing universal file "MCS808" referenced in the source (or make your own)
    • You just need to resolve the missing symbols. Stub them or recreate them. Check out known dissasemblies or back the missing data out of an Altair BASIC tape
  • Change the F4 common file to 12K version. (Line 00400 Length from 1 to 2) to ensure the math package matches the interpreter.
    • Easier said than done, since the 12K floating point output routine seems incomplete
    • Alternatively, you could downgrade F3 or both to a lower version
  • Double check errors in my source transcription with the printout. While I believe the transcription is over 99% correct -- that's also known as "incomplete"
  • Find an 8080 emulator to test the build - or a real Altair if you want the full experience!

Some dead code sections, especially in the math package, have unresolved symbols. It's not clear if these sections were permanently cut out or the devs were in the middle of significant changes. The Floating Point Output routines for 12K are especially problematic - but again, this printout shows only the 8K as 'live' so no errors are thrown.

There's more quirks that you'll need to battle through, but those steps address the high-level approach

How did you extract the code from the printouts?

75% script automation and 25% "the hard way". No generative AI or OCR.

Script automation generated the line numbers, headers, footers, and basically any element that was predictably repeatable.

I transcribed most of the remaining by hand over roughly 40 hours. This was combined with my code reading and some quick analysis with notes which provided the outline for this discussion.

Is there any way I can contribute?
I didn't really plan for this to be a 'living' project, but there is certainly room to clean up my transcription in the GitHub repo. The intent is a perfect line-by-line copy of the code listing printouts Bill Gates provided.

As I've said already, I'm confident the source transcript is 99% correct, but I wouldn't turn down a pull request with more corrections to the main listing (ALTAIRBASIC30.LST). I prioritize the corrections like this:

  • Build-blocking corrections - assembly instructions and operands
  • BASIC line numbers, because order matters
  • Comment sections, because this is really about understanding
  • Symbol table corrections
  • Machine code corrections
  • Page headers or anything else

Finally, if anyone has a better perspective about the discussion above, I'll be glad to update and add credits.

More questions added as they roll in