Atari BASIC

This article is about BASIC on the 400/800/XL/XE computers. For the BASIC bundled with Atari ST computers, see Atari ST BASIC. For the BASIC cartridge for the Atari 2600/VCS console, see BASIC Programming.
An Atari BASIC program ready to run
Original author(s): Paul Laughton, Kathleen O'Brien
Developer(s): Shepardson Microsystems
Initial release: 1979
Stable release: Revision C / 1983
Platform: Atari 8-bit family
Type: BASIC
License: Proprietary (Copyright © 1979 Atari Inc.)

Atari BASIC is a BASIC interpreter that shipped with the Atari 8-bit family of 6502-based home computers. The language was originally on an 8 KB ROM cartridge. On the XL/XE computers it is built-in and can be disabled by holding down the OPTION key while booting. The XEGS disables BASIC if powered without the keyboard attached.

The complete commented source code and design specifications of Atari BASIC were published as a book in 1983.[1]

Background

The machines that would become the Atari 8-bit family had originally been developed as second-generation games consoles intended to replace the Atari 2600. Ray Kassar, the new president of Atari, decided to challenge Apple Computer by building a home computer instead. This meant Atari needed the BASIC programming language, then the standard language for home computers.

Faced with the need for a BASIC interpreter, Atari did what many other home computer companies did and purchased the source code to the MOS 6502 version of Microsoft 8K BASIC. The name 8K BASIC referred to its memory footprint when compiled for the Intel 8080's instruction set. The lower code density of the 6502 expanded the code to about 9 KB, slightly larger than the natural 8 KB size of the Atari's ROM cartridges.

Atari felt that they also needed to expand the language to better support the specific hardware features of their computers, similar to what Apple had done with Applesoft BASIC. This increased the size from 9 KB to around 11 KB. Atari had designed their ROM layout in 8 KB blocks, and paring down the code from 11 KB to 8 KB turned out to be a significant problem. Compounding this, the 6502 code supplied by Microsoft was undocumented.

Six months later they were almost ready with a shippable version of the interpreter. However, Atari was facing a deadline with the Consumer Electronics Show (CES) approaching, and decided to ask for help to get a version of BASIC ready in time for the show.

Shepardson Microsystems

The 8K ROM Atari BASIC cartridge for Atari 8-bit computers.

In September 1978 Atari asked Shepardson Microsystems (SMI) to bid on completing BASIC. Shepardson had written a number of programs for the Apple II family, which used the same 6502 processor, and were in the middle of finishing a new BASIC for the Cromemco S-100 bus machines (Cromemco 32K Structured BASIC). SMI examined the existing work and decided it was too difficult to continue paring it down; instead they recommended developing a completely new version that would be easier to fit into 8K. Atari accepted the proposal, and when the specifications were finalized in October 1978, Paul Laughton and Kathleen O'Brien began work on the new language.

The result was a rather different version of BASIC, known as Atari BASIC. In particular, the new BASIC dealt with character strings more like Data General's BASIC than Microsoft's; Microsoft used strings similar to those from DEC BASIC.[N 1]

The contract specified a delivery date on or before 6 April 1979 and this also included a File Manager System (later known as DOS 1.0). Atari's plans were to take an early 8K version of Microsoft BASIC to the 1979 CES and then switch to the new Atari BASIC for production. Because of a bonus clause in the contract, development proceeded quickly and an 8K cartridge was available just before the release of the machines. Because Atari BASIC was delivered before Microsoft BASIC, Atari took it with them to the CES.

Releases

Shepardson's programmers found problems during the first review and managed to fix some of them, but Atari had already committed Atari BASIC to manufacturing, and the technology of the time did not permit changes. So it was manufactured with known bugs and became known retroactively as Revision A.

A BASIC programmer can find out the version by examining a well-known location in memory. Entering the command PRINT PEEK(43234) at the READY prompt will give a result of 162 for Revision A, 96 for Revision B, and 234 for Revision C.
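The check above amounts to a three-entry lookup. As a minimal sketch (the function and table names are illustrative, and only the three values quoted in the text are known to it):

```python
# Values quoted in the text for PEEK(43234); names here are illustrative.
REVISION_BY_PEEK = {162: "A", 96: "B", 234: "C"}


def basic_revision(peek_value):
    """Return the Atari BASIC revision letter, or None if the value is unrecognised."""
    return REVISION_BY_PEEK.get(peek_value)


if __name__ == "__main__":
    for value in (162, 96, 234, 0):
        print(value, basic_revision(value))
```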

Description

Program editing

Atari BASIC uses a line editor, like most BASICs of the era. Unlike many BASICs, however, Atari BASIC checks the line for syntax errors as soon as the Enter key is pressed. If a problem is found it re-displays the line, highlighting the text near the error in inverse video. This can make catching syntax errors much easier on the Atari than in other editors; most BASICs will not report the errors until the program is executed.[N 2]

When not running a BASIC program, the Atari is in direct mode or immediate mode. Program lines can be entered by starting with a line number, which will insert a new line or amend an existing one. Lines without a line number are executed straight away, hence the name immediate mode.

When the programmer types RUN the program executes from the first statement.

Unlike most other BASICs, Atari BASIC allows all commands to be executed in both modes. For instance, most BASICs only allow LIST to be used in immediate mode, while Atari BASIC also allows it to be used inside a program. This is sometimes used as part of a way to produce self-modifying code.

Program lines ("logical lines") can be up to three screen lines ("physical lines") of 40 characters, so 120 characters in total. The cursor can be moved freely within these lines. In some other BASICs, the only way to move up a line is to keep moving left until the cursor wraps at the left margin (and similarly to move down by wrapping at the right margin). On the Atari the cursor also wraps at the margins, but it stays on the same screen line when it does so. The OS tracks whether a physical line flowed into the next as part of the same logical line; the three-line limit is fairly arbitrary but keeps lines below 128 characters and so reduces the chances of buffer overflow.

The cursor can always be moved freely around the screen, and it will wrap on all sides. Hitting "ENTER" will send the tokenizer the (logical) line on which the cursor sits. So, in the example pictured above (with "PRUNT"), all the author needs to do to fix the error is move the cursor over the "U", type I (the editor only has an overwrite mode) and hit Enter.

This is quite a frequent editing technique for, say, renumbering lines. Atari BASIC has no built-in renumbering command, but one can quickly learn to overwrite the numbers on a set of lines then just hit "ENTER" repeatedly to put them back into the program. Indeed, a slightly cryptic but essentially simple idiom allows this to be done by the program itself as it is running, producing self-modifying code. This is not an artifact or cheat around the system but inherent in the combined behavior of the editor and tokenizer.

Character set

Main article: ATASCII

The Atari variation on ASCII, called ATASCII, has 128 (80₁₆) characters, mostly corresponding to ASCII but with a few exceptions. All characters have printable forms, unlike ASCII where codes 0-31 (00₁₆-1F₁₆) are "control codes" that perform special functions such as requesting a paper feed or ringing an attention bell. Characters 128-255 (80₁₆-FF₁₆) are displayed as the inverse video of characters 0-127 (00₁₆-7F₁₆). Variable names must be composed of upper-case alphabetic (65-90, 41₁₆-5A₁₆) and numeric (48-57, 30₁₆-39₁₆) characters, starting with an alphabetic character, and for strings terminating with a dollar sign (36, 24₁₆).
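The two rules in this paragraph (the inverse-video half of the character set and the shape of a variable name) can be sketched as follows. This is an illustrative model only; the function names are made up for the example and nothing here reflects how the interpreter itself is coded.

```python
import re

# Rule from the text: an upper-case letter, then upper-case letters or digits,
# with an optional trailing "$" marking a string variable.
VARIABLE_NAME = re.compile(r"[A-Z][A-Z0-9]*\$?")


def is_valid_variable_name(name):
    """True if the name follows the naming rule described in the text."""
    return VARIABLE_NAME.fullmatch(name) is not None


def is_inverse_video(code):
    """Codes 128-255 are shown as the inverse of codes 0-127."""
    return 128 <= code <= 255


def base_character(code):
    """Strip the inverse-video bit to get the underlying character code."""
    return code & 0x7F


if __name__ == "__main__":
    print(is_valid_variable_name("HELLO$"), is_valid_variable_name("1ST"))
    print(is_inverse_video(193), base_character(193))
```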

The character set has a full ensemble of lower case characters and some graphics characters, but programming in Atari-supplied languages (BASIC, Assembly, PILOT, Atari Pascal, Atari LOGO) and third-party languages such as COBOL is done exclusively in upper case. Lower case letters are not recognised by most compilers on the Atari; they can only be used within string manipulation and display, and in REMarks. This is in line with pre-1990 era programming convention. Most 6502-based machines (including the 6502-based Apple II series and Ohio Scientific computers), punched card systems and earlier versions of the EBCDIC character space did not have or use lower case. Legibility was also a concern, with the font squeezing ascenders and descenders into the 8×8 fixed grid that defines each display glyph and into the 8×7 fonts used on many dot matrix printers such as Atari's own.

The ANTIC chip uses one byte to indicate the "start page" of a font, thereby dividing memory into 256 pages of 256 bytes. Only one font can be used at a time, however, without machine code display list interrupts to change the font midway down the screen. An infrequently used 8×10 font mode exists, in which the characters in the lower case range are shifted down two lines, allowing the actual glyphs to be 8×8 yet be presented with ascenders or descenders. This text mode is only used occasionally, partly because of the odd vertical size on screen and the different order of bytes in the glyph. The ease of implementing fonts means many are freely available, along with font editors and so forth.

The tokenizer

Like most BASIC interpreters, Atari BASIC uses a token structure to handle lexical processing for better performance and reduced memory size. The tokenizer converts lines using a small buffer in memory, and the program is stored as a parse tree.[N 3] The token output buffer (addressed by a pointer at LOMEM, 80₁₆ and 81₁₆) is one page (256 bytes) long, and any tokenized statement that is larger than the buffer generates an error (error 14, line too long). Indeed, the syntax checking described in the "Program editing" section is a side effect of converting each line into a tokenized form before it is stored.

The output from the tokenizer is then moved into more permanent storage in various locations in memory. A set of pointers (addresses) indicates these locations: variable names are stored in the variable name table (pointed to at VNTP, 82₁₆ and 83₁₆) and the values are stored in the variable value table (pointed to at VVTP, 86₁₆ and 87₁₆). Strings have their own area (pointed to at STARP, 8C₁₆ and 8D₁₆), as does the runtime stack (pointed to at RUNSTK, 8E₁₆ and 8F₁₆) used to store the line numbers of looping statements (FOR...NEXT) and subroutines (GOSUB...RETURN). Finally, the end of BASIC memory usage is indicated by an address stored at the MEMTOP pointer (90₁₆ and 91₁₆).

By indirecting the variable names in this way, a reference to a variable needs only two bytes to address its entry in the appropriate table; the whole name does not need to be stored each time. This also makes renaming a variable relatively trivial once the program is in storage: only the single instance of its name in the table has to change, and the only difficulty arises if the new name is longer than the old one. Indeed, obfuscated code can be produced for a finished program by renaming the variables in the name table, possibly all to the same name. This does not confuse the interpreter, since internally it uses the index values and not the names. New code will be difficult to add, however, because the tokenizer has to search the name table to find a variable's index and can get confused if names are not unique. (It is fine to use the same name in both the "string" and "variable" namespaces, e.g. HELLO = 10 and HELLO$ = "WORLD", because they have separate tables and therefore separate namespaces.)

Atari BASIC uses a unique way to recognize abbreviated reserved words. In Microsoft BASIC there are a few predefined short forms (like ? for PRINT and ' for REM). Atari BASIC allows any keyword to be abbreviated with a period at any point in the word. So "L." is expanded to LIST, as are "LI." and (redundantly) "LIS.". To expand an abbreviation, the tokenizer searches its list of reserved words and takes the first that matches the portion supplied. To improve the chance of a programmer correctly guessing an abbreviation, to save typing, and to speed up the lookup, the list of reserved words is ordered so that the more commonly used commands come first. REM is at the very top and can be typed as just a period ("."). This ordering also speeds lexical analysis generally: although the search time is in theory proportional to the length of the list, in practice common keywords are found very quickly. The effect is noticeable enough that experienced programmers can tell a line is syntactically incorrect before the parser says so, because the delay while the whole list is searched in vain signals that something is wrong.
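The first-match rule can be sketched in a few lines. The keyword list below is a shortened, illustrative stand-in: the text only establishes that REM comes first and that "L." resolves to LIST, so the rest of the ordering is an assumption, as is the function name.

```python
# Illustrative, shortened keyword table. Only "REM first" and "LIST before
# any other keyword starting with L" are taken from the text; the remaining
# order is assumed for the sake of the example.
KEYWORDS = ["REM", "DATA", "INPUT", "COLOR", "LIST", "ENTER",
            "LET", "IF", "FOR", "NEXT", "GOTO"]


def expand_abbreviation(word):
    """Expand 'L.' style abbreviations to the first keyword with that prefix.

    Returns the full keyword, or None if nothing in the table matches.
    """
    if not word.endswith("."):
        return word if word in KEYWORDS else None
    prefix = word[:-1]
    for keyword in KEYWORDS:          # linear search, most common first
        if keyword.startswith(prefix):
            return keyword
    return None


if __name__ == "__main__":
    for typed in ("L.", "LI.", "LIS.", ".", "LE."):
        print(typed, "->", expand_abbreviation(typed))
```

Because the search stops at the first hit, the position of a keyword in the table decides both how short its abbreviation can be and how quickly it is found.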

So whereas Microsoft BASIC uses separate tokens for its few short forms, Atari BASIC has only one token for each reserved word. When the program is later LISTed it always writes out the full words (since only one token represents all possible forms, it can do no other). There are three exceptions to this: PRINT has a synonym, ?; GOTO has a synonym, GO TO; and LET has a synonym which is the empty string (so 10 LET A = 10 and 10 A = 10 mean the same thing). These are separate tokens, and so will remain as such in the program listing.

Some other contemporary BASICs have variants of keywords that include spaces. Atari BASIC has only one of these, the aforementioned GO TO. The other exception here is keywords for communicating with peripherals (see the "Input/Output" section, below) such as OPEN # and PRINT #; it rarely occurs to many programmers that the " #" is actually part of the tokenized keyword and not a separate symbol; and that for example "PRINT" and "PRINT #0" are the very same thing,[N 4] just presented differently. It may be that the BASIC programmers kept the form # to conform with other BASICs (the syntax originally derives from Fortran), though it is entirely unnecessary, and probably a hindrance, and is not used in other languages for the Atari 8-bit family.

Expanding tokens in the listing can cause problems when editing. The Atari line input buffer is three lines (120 characters); up to three "physical lines" make one "logical line". After that a new "logical line" is automatically created. This doesn't matter much for output but it does for input, because the operating system will not return characters to the tokenizer after the third line, treating them as the start of a new "logical line". (The operating system keeps track of the mapping between physical and logical lines as they are inserted and deleted; in particular it marks each physical line with a flag for being a continuation line or a new logical line.) Using abbreviations when typing in a line can therefore result, once they have been expanded on output, in a line that is longer than three lines. As a more minor concern, some whitespace characters can be omitted on input, so for example PRINT"HELLO" will be listed as PRINT "HELLO", one character longer. To edit such a line, now split across two logical lines, one must replace the expanded commands with their abbreviations again before submitting the line back to the tokenizer.

Literal line numbers in statements such as GOTO are calculated at run time using the same floating-point mathematical routines as other BASIC functions. This calculation allows subroutines to be referred to by variables: for instance GOTO EXITOUT is as good as GOTO 2000, as long as one first sets EXITOUT=2000 somewhere in the code. This is much more useful than it might sound; literals are stored in the 6-byte floating-point variable format, but variables are stored as a two-byte pointer to their place in the variable value table, at VVTP. If one GOTO or GOSUBs to a given line from multiple locations in the program, which is quite common in BASIC, these memory savings can add up. The only downside is that the branch has to look up the number in the VVTP, which adds slightly to the processing time. Using variables in branches also means that if the programmer is careful always to use the variable and not the literal, subroutines can be easily renumbered (moved around in the program), because only the variable value needs to be changed.[N 5]

String handling

Atari BASIC differs dramatically from Microsoft-style BASICs in the way it handles strings. In BASICs following the Microsoft model, strings are special types that allow for variable length and various string operations. Atari BASIC has no strings of this sort, instead using arrays of characters, rather like Fortran. This allowed the BASIC language programmers to remove all the special-purpose code needed for handling dynamic resizing of strings, reusing instead the code already being used to handle arrays of numbers. A string is allocated a maximum size using the DIM statement, although its actual length can vary at runtime from 0 to this maximum size.

Of course, strings are not used by end programmers in the same way as arrays of numbers (at least not normally), so Atari BASIC also includes a selection of commands for "slicing" up arrays. A$ refers to the entire string, whereas A$(4,6) "slices" out the three characters at locations 4, 5 and 6. In theory, this is a more elegant solution than Microsoft BASIC's LEFT$, MID$, and RIGHT$, as this syntax replaces three separate commands with a single one.

Although this simplification reduces the size of Atari BASIC and offers some theoretical performance benefits, it is also a hindrance to porting BASIC programs from other computers to the Atari. When the Atari was first produced it was the norm for programs to be provided as listings in magazines for programmers to type in. They would have to scan them for instances of LEFT$, RIGHT$ and so on, do some mental arithmetic and replace them with slicing commands. Because strings are allocated a fixed size, programmers generally have to overestimate the likely maximum size, allocating perhaps 256 bytes for a string that only ever stores someone's first name.
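The "mental arithmetic" needed when porting is small but easy to get wrong by one. As a sketch, the model below treats a slice as 1-based and inclusive, as in the A$(4,6) example, and expresses the three Microsoft functions in terms of it. The Python function names are illustrative only.

```python
def atari_slice(s, start, end=None):
    """Model A$(start,end): 1-based and inclusive; A$(start) runs to the end."""
    if end is None:
        end = len(s)
    return s[start - 1:end]


def left(s, n):
    """LEFT$(A$,N) corresponds to A$(1,N)."""
    return atari_slice(s, 1, n)


def mid(s, start, n):
    """MID$(A$,S,N) corresponds to A$(S,S+N-1)."""
    return atari_slice(s, start, start + n - 1)


def right(s, n):
    """RIGHT$(A$,N) corresponds to A$(LEN(A$)-N+1)."""
    return atari_slice(s, len(s) - n + 1)


if __name__ == "__main__":
    a = "ABCDEFGH"
    print(atari_slice(a, 4, 6), left(a, 3), mid(a, 2, 3), right(a, 2))
```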

Strings in Atari BASIC cannot themselves be members of arrays, so arrays of strings have to be implemented by the programmer. Strings can move around in memory, so it is not generally possible for example to store their memory addresses in an array. For short strings of approximately the same length, instead an array is generally built using padding so that the strings are all the same length and the nth string in the array is n×l characters into it, where l is the length of the string.
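The padding scheme can be sketched as one long string divided into fixed-width slots. The helper names and the sample data are made up for the example; n is counted from zero here so that the offset is n×l as in the text.

```python
def pack_strings(items, width):
    """Pad (or truncate) each item to `width` characters and join them."""
    return "".join(item.ljust(width)[:width] for item in items)


def nth_string(packed, n, width):
    """Return slot n (counted from 0), which starts n*width characters in."""
    start = n * width
    return packed[start:start + width]


if __name__ == "__main__":
    names = pack_strings(["ANN", "BOB", "CAROL"], 5)
    print(repr(names))
    print(repr(nth_string(names, 1, 5)))
```

In Atari BASIC itself, with its 1-based slices, the same slot would be written as the slice from n×l+1 to n×l+l.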

As in most versions of BASIC, strings are not initialized with a default value such as "empty", and care has to be taken not to interpret random data in RAM as part of the string. In other languages, clearing out the old data usually requires a FOR loop, but Atari BASIC enables fast string initialization with the following trick:

REM The following initializes A$ with 1000 characters of x
DIM A$(1000)
A$="x":A$(1000)=A$:A$(2)=A$
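Why the trick works is easier to see in a simulation. The sketch below assumes that the assignment A$(2)=A$ copies byte by byte from low to high addresses, so each byte read has just been written; that copy direction is an assumption made for illustration, and the "?" bytes stand in for whatever garbage was in RAM.

```python
def fast_fill(size, ch="x"):
    """Simulate the three-assignment fill on a string DIMensioned to `size`."""
    memory = bytearray(b"?" * size)      # uninitialised string storage

    memory[0] = ord(ch)                  # A$="x"      : length is now 1
    memory[size - 1] = ord(ch)           # A$(size)=A$ : length is now `size`

    # A$(2)=A$ : copy the string onto itself, shifted up by one position.
    # Assumed to run from low to high addresses, so the first character
    # is propagated along the whole string.
    for i in range(1, size):
        memory[i] = memory[i - 1]

    return memory.decode("ascii")


if __name__ == "__main__":
    print(fast_fill(10))
```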

Input/Output

CIO overview

The Atari OS includes a subsystem for peripheral device input/output (I/O) known as CIO (Central Input/Output). All I/O goes through a central point of entry (E456₁₆), which is passed the address of an I/O Control Block (IOCB), a 16-byte structure that defines which device is meant and what kind of operation is requested (read, write, seek, etc.). There are 8 such IOCBs, allocated at fixed locations in page 3 of memory from 340₁₆ to 3BF₁₆.
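Since the blocks are fixed-size and contiguous, locating one is simple arithmetic. In this sketch the base address is left as a parameter rather than hard-coded, and the function names are illustrative.

```python
IOCB_SIZE = 16   # bytes per I/O Control Block, from the text
IOCB_COUNT = 8   # number of IOCBs, from the text


def iocb_offset(channel):
    """Byte offset of a channel's IOCB from the start of the IOCB area."""
    if not 0 <= channel < IOCB_COUNT:
        raise ValueError("channel must be in the range 0-7")
    return channel * IOCB_SIZE


def iocb_address(base, channel):
    """Absolute address of a channel's IOCB, given the area's base address."""
    return base + iocb_offset(channel)


if __name__ == "__main__":
    print([iocb_offset(n) for n in range(IOCB_COUNT)])
    print("total bytes:", IOCB_SIZE * IOCB_COUNT)
```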

Most programs can therefore be written independently of the device they might use, as all devices conform to a common interface; this was very rare on home computers when Atari BASIC was first made. Virtual devices such as the screen (S:) and the editor (E:) do have special operations, for example to draw graphics or to ask for line input,[N 6] but these are done in a uniform way. New device drivers can be written fairly easily and are automatically available to Atari BASIC, and indeed to any other program using the Atari OS, for example to support new hardware devices such as mouse pointers, or software devices such as an 80-column display (typically using a 4×8 pixel font). Existing drivers can be supplanted or augmented by new ones because the driver table is searched newest-to-oldest. A replacement E:, for example, can displace the one in ROM to provide an 80-column display, or piggy-back on it to generate a checksum whenever a line is returned; this technique is used by some program listing checkers that provide a checksum for each line.

CIO access in BASIC

Atari BASIC supports CIO access with reserved words OPEN #, CLOSE #, PRINT #, INPUT #, GET #, PUT #, NOTE #, POINT # and XIO #. There are routines in the OS for graphics fill and draw, but they are not all available as specific BASIC keywords. PLOT and DRAWTO for line drawing are supported while a command providing area fill is not. The fill feature can be used through the general CIO entry point, which is called using the BASIC command XIO.

Up to eight IOCBs can be in use at a time, numbered 0 through 7 (0 is, by default, the editor E:). The BASIC statement OPEN # is used to prepare a device for I/O access:

REM Opens the cassette device on channel 1 for reading in BASIC
OPEN #1,4,0,"C:MYPROG.DAT"

Here, OPEN # means "ensure channel 1 is free" (an error results otherwise) and call the C: driver to prepare the device (this tensions the cassette tape spools and advances the heads, keeping the cassette player "paused"). The 4 means "for read" (other codes are 8 for write, 12 = 8 + 4 for "read-and-write", and so forth), and the third number provides extra auxiliary information, here unused and set by convention to 0. The string "C:MYPROG.DAT" gives the device name and, optionally, a filename; as it happens, the cassette device does not name its files. Physical devices can have numbers (mainly disks, printers and serial devices), so "P1:" might be the plotter and "P2:" the daisy-wheel printer, "D1:" may be one disk drive and "D2:" another, and "R1:" may be a modem and "R2:" an oscilloscope (R for RS-232, provided by an add-on interface and not built into the OS). If no number is present, 1 is assumed.
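The two conventions in this paragraph, the device string and the additive mode codes, can be sketched as follows. This is an illustrative parser, not the OS's own: the regular expression, the function names and the single-digit unit number are assumptions made for the example.

```python
import re

READ, WRITE = 4, 8   # mode codes from the text; 12 = 8 + 4 is read-and-write

# Assumed shape: one device letter, an optional unit digit, a colon,
# then an optional filename.
DEVICE_SPEC = re.compile(r"([A-Z])([0-9]?):(.*)")


def parse_device_spec(spec):
    """Split 'D2:GAME.BAS' into ('D', 2, 'GAME.BAS'); the unit defaults to 1."""
    match = DEVICE_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError("not a device specification: %r" % spec)
    letter, unit, filename = match.groups()
    return letter, int(unit) if unit else 1, filename


def describe_mode(code):
    """List the access kinds encoded in an OPEN mode code."""
    kinds = []
    if code & READ:
        kinds.append("read")
    if code & WRITE:
        kinds.append("write")
    return kinds


if __name__ == "__main__":
    print(parse_device_spec("C:MYPROG.DAT"), describe_mode(4))
    print(parse_device_spec("D2:GAME.BAS"), describe_mode(12))
```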

Reserved IOCBs in Atari BASIC

Atari BASIC disallows access to IOCB 0 (the editor, E:), although it could be accessed as IOCB 16 (which does not exist, but the code wraps the address back to 0). It reserves IOCB 7 for printing and cassette operations using the built-in commands LPRINT, SAVE, LOAD, CSAVE and CLOAD, though nothing stops printers or the cassette being used on other channels too. IOCB 6 is used for accessing the graphics screen device (S:) for drawing lines, filling shapes and so on. SAVE and LOAD write and read the compact tokenized form of the BASIC program, whereas LIST and ENTER output and input the text source, just as if it were being sent to or from the editor.

For the other CIO functions, Atari BASIC uses the XIO statement. This just primes an IOCB and calls the CIO entry point; any of the other commands (PRINT, INPUT and so on) can be achieved with the more general form XIO.

But the form of XIO is not very friendly for BASIC users, and it is mostly used for unusual functions that are specific to a particular device. For example, an M: device exists called "Multi-Mouse"[3] that allows an Atari ST mouse, 8-bit trakball, touch tablet, or joystick, to be treated as a device whereby the position of the mouse cursor is set or got with NOTE and POINT commands. It should be remembered here that POINT does not mean, as it does in many BASICs, draw a point on the screen, but point an IOCB channel at a specific place in a file. In Atari DOS the two parameters to NOTE and POINT are the disk sector and offset, which is not very portable. In SpartaDOS they make up the offset from the start of the file. In the Multi-Mouse driver M:, they are the X and Y position of the mouse cursor.

Error Handling

I/O routines returned error codes of 128-255 (80₁₆-FF₁₆) via the processor's Y register and setting the carry flag of the processor. Setting the carry flag is a neat trick since the caller can immediately branch-on-carry (BCC or BCS instructions) to an error routine, a brief, quick and relocatable 6502 instruction (2 bytes, 2 cycles), without having to test Y for the (we hope) normal case where there is no error.

As with other aspects of the CIO, error codes were common across devices but could be extended for particular devices. Error handlers could thus be written quite generically, to fail gracefully, maybe put out a message, ask the user whether to retry, propagate the error, and so on.

There were no user-friendly messages for standard error codes in the OS itself. They would be interpreted by the application.

Atari BASIC (and other languages) thus had the freedom to return error codes less than 128, and these meant different things in different languages. There was nothing to stop a perverse implementer using error codes of 128 or above, but no incentive to do so.
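The division of the error-code space described above can be sketched as a simple range check. The function name and the returned labels are illustrative, and the example codes other than 14 (the "line too long" error mentioned earlier) are arbitrary.

```python
def error_source(code):
    """Classify an error code by the convention described in the text.

    Codes 128-255 come from the OS I/O routines; lower codes are left
    to the language (here, Atari BASIC) to define.
    """
    if 128 <= code <= 255:
        return "I/O"
    if 1 <= code < 128:
        return "language"
    raise ValueError("error codes are in the range 1-255")


if __name__ == "__main__":
    for code in (14, 127, 128, 200):
        print(code, error_source(code))
```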

Graphics and sound support

In comparison to the BASICs of some competing machines at the time, Atari BASIC had good built-in support for sound (SOUND), graphics (GRAPHICS, SETCOLOR, COLOR, PLOT and DRAWTO) and peripheral units like joysticks (STICK, STRIG) and paddles (PADDLE, PTRIG). Other home computer users were often left with cryptic PEEKs and POKEs for such programming.

The lack of a FILL command is a notable omission considering that the routine, however primitive, was available in the operating system. It could be achieved with the general-purpose XIO command, but was rather fiddly:

REM The co-ordinates of the corners of the fill quadrilateral have to be set up
REM before calling XIO, using POKE into the IOCB. This is quite a trick because
REM it's not easy to find out where the IOCB is. Anyway, then we do:
XIO 18,#6,12,0,"S:"
REM XIO # = Extended IO.
REM 18 = Fill (17=Drawto).
REM #6 = On Channel 6, mapped to the graphics screen device.
REM 12 = Read/write.
REM 0 = Redundant (unused).
REM "S:" = Logical device, used only for OPEN and some disk
REM commands with a target such as for a RENAME
REM Redundant here but used by convention

Some advanced aspects of the hardware such as sprites were not supported at the language level. The lack of access to timers made sound programming difficult, particularly because North American machines ran on different clock speeds from the rest of the world, because they were tied to the speed of the television system.

Performance

Running on the original equipment, Atari BASIC is slower than other BASICs on contemporaneous equipment for the same home market, sometimes by a surprising amount, especially when one takes into account the fact that the Atari's CPU was clocked almost twice as fast as that of most other 6502-based computers of that era. Most of these problems stemmed from two particularly poorly implemented bits of code.

One is a side effect of how Atari BASIC recalculates line numbers as the program is run. This means that a GOTO has to run a small amount of additional code in order to find the line to jump to.[N 7] This would normally be a minor issue, but the same code is also used to implement NEXT in a FOR...NEXT loop, so it dramatically lowers the performance of these very common loops (indeed, the only loop structure in Atari BASIC). It is obvious that a line number less than 65536₁₀ (10000₁₆) can be stored in a 16-bit unsigned integer, but presumably the designers chose to store it as floating point for other reasons.

Atari BASIC does not do well with integer variables; all numbers are stored as floating point. Atari BASIC relies on the Atari OS's built-in floating point routines (BCD notation), which are relatively slow compared to other representations, even on the same hardware. Most of the slowness, however, lies in a particularly poor implementation of the multiply subroutine used throughout the math libraries. This is not really a problem of the language itself but of the underlying OS, yet it adds to the generally poor performance. More strikingly, the fact that simple integer operations are converted back and forth to floating point highlights the flaw, especially considering that the Atari's best features rely on special hardware (for graphics, sound and so on) that deals purely in integers: bytes or two-byte words. Atari BASIC does not even offer an easy way to perform bitwise operations.

The MOS 6502 processor had a special mode for dealing with BCD (the SED and CLD instructions switch on and off a mode that treats each 4 bits of a byte as a BCD digit), and perhaps that was particularly attractive to the designers for implementing floating point as BCD. The now almost universal IEEE 754 standard for representing floating point numbers was still at the design stage when the Atari 8-bit family and its contemporaries first came to market, so the design of an FP implementation was very much up to the OS or BASIC designer.
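Packed BCD, with one decimal digit in each 4-bit half of a byte, can be sketched as below. This illustrates the digit packing and a decimal-mode style addition of one byte only; it is not the Atari floating point format, whose layout the text does not describe, and the function names are made up for the example.

```python
def to_packed_bcd(digits):
    """Pack a string of decimal digits two per byte, high nibble first."""
    if len(digits) % 2:
        digits = "0" + digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def from_packed_bcd(data):
    """Unpack bytes produced by to_packed_bcd back into a digit string."""
    return "".join("%d%d" % (byte >> 4, byte & 0x0F) for byte in data)


def bcd_add_byte(a, b, carry=0):
    """Add two valid packed-BCD bytes digit by digit; returns (result, carry)."""
    low = (a & 0x0F) + (b & 0x0F) + carry
    high = (a >> 4) + (b >> 4)
    if low > 9:
        low -= 10
        high += 1
    carry_out = 0
    if high > 9:
        high -= 10
        carry_out = 1
    return (high << 4) | low, carry_out


if __name__ == "__main__":
    print(to_packed_bcd("1234").hex())
    result, carry = bcd_add_byte(0x19, 0x28)
    print(hex(result), carry)
```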

Several commercial and shareware BASICs were available on the platform that addressed some or all of these issues, resulting in performance that was 3 to 5 times faster than the Atari version. Using these BASICs, the Atari was one of the fastest home computers of its era.

Atari later sold a diskette-based version of Microsoft BASIC, Atari Microsoft BASIC, and later managed to fit it onto a cartridge as well, but no compiler or runtime was available for redistribution.

Advanced techniques

Subroutines

Atari BASIC has no implementation of subroutines, or rather, it does not have a concept of local variables. In Fortran terminology, all variables are COMMON.

But programmers can simulate user functions because of the way the GOSUB command can reference a variable. For example, a programmer could start a subroutine at line 10000 and have the program initialize a variable with that number, e.g. LET TEST = 10000. The calling code can then initialize some mutually understood variables and use the statement GOSUB TEST to invoke the subroutine. The subroutine starting at line TEST can then do its operation on the predetermined variables and put return results into variables available after RETURN.

By extension, if the two agree on two variables, an array called, say, STACK and a numeric variable called STACKTOP, then a stack can be implemented in software whereby local variables are pushed and popped to the stack and so implement local variables. For example:

10 DIM STACK(100)
20 STACKTOP = 0
35 REM LINE NUMBERS OF SOME SUBROUTINES FOLLOW
40 LET FACTORIAL = 8000
60 LET PUSHSTACK = 2100
70 LET POPSTACK = 2200
75 REM LET'S COMPUTE EIGHT FACTORIAL
80 LET STACKVALUE = 8: GOSUB PUSHSTACK
90 GOSUB FACTORIAL
100 GOSUB POPSTACK
110 PRINT "EIGHT FACTORIAL IS "; STACKVALUE
120 END
2099 REM PUSHSTACK SUBROUTINE
2100 STACK(STACKTOP) = STACKVALUE: STACKTOP = STACKTOP + 1: RETURN
2199 REM POPSTACK SUBROUTINE
2200 STACKTOP = STACKTOP - 1: STACKVALUE = STACK(STACKTOP): RETURN
7999 REM FACTORIAL SUBROUTINE: REPLACES N ON THE STACK WITH N FACTORIAL
8000 GOSUB POPSTACK
8010 IF STACKVALUE <= 2 THEN GOSUB PUSHSTACK: RETURN
8020 GOSUB PUSHSTACK: REM KEEP N ON THE STACK AS A LOCAL VARIABLE
8030 STACKVALUE = STACKVALUE - 1: GOSUB PUSHSTACK
8035 GOSUB FACTORIAL: REM STACK NOW HOLDS N AND (N-1) FACTORIAL
8040 GOSUB POPSTACK: RESULT = STACKVALUE
8045 GOSUB POPSTACK: STACKVALUE = STACKVALUE * RESULT: GOSUB PUSHSTACK
8050 RETURN

Programmers may notice that line 8010 can be optimized because a GOSUB followed by a RETURN is the same as a GOTO, because the subroutine will do the RETURN for us:

8010 IF STACKVALUE <= 2 THEN GOTO PUSHSTACK

This is of course an example of tail call optimization.

Includes

Because Atari BASIC can read in lines of code from any device, not just the editor, it is possible to save blocks of code and then read them in and merge them into a single program just as if they had been typed into the editor. Of course this means the lines being read in must have line numbers that are not used in the main program. The code to be merged is written to a device as text using the LIST command, and can be put back into the program with the ENTER command. As far as the BASIC interpreter is concerned, the stream of text from the device is no different from text typed into the editor.

By carefully using blocks of line numbers that do not overlap, programmers can build libraries of subroutines (simulating functions as above) and merge them into new programs as needed.
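As a sketch, the stack subroutines from the earlier example could be saved and merged into another program with the following commands, typed directly rather than entered as program lines. The device D: assumes a disk drive is attached, and the file names STACKLIB.LST and MAIN.BAS are hypothetical.

```basic
LIST "D:STACKLIB.LST",2099,2200
LOAD "D:MAIN.BAS"
ENTER "D:STACKLIB.LST"
```

LIST with a line range writes only lines 2099 to 2200 as text. LOAD brings in a main program previously stored with SAVE, and ENTER then merges the listed lines into it, replacing any existing lines that have the same numbers.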

Embedded machine language

Atari BASIC does not have a built-in assembler. Machine code is generally stored as bytes in strings. Machine code routines are invoked from Atari BASIC with the USR function, which works in much the same way as GOSUB, but with fewer guarantees.

String variables can hold any of the 256 characters available in the ATASCII character set, so each byte of memory reserved for a string variable can hold any number from 0 to 255, including the characters 34₁₀ (22₁₆, "quote") and 155₁₀ (9B₁₆, "ENTER"), although these are tricky to type in. Short relocatable 6502 machine language routines can be converted to ATASCII characters and stored in a string variable.

The machine language routine is called using the USR function, specifying the address of the string variable as the location in memory to execute, followed by optional parameters that will be passed to the routine. For example, if the machine language code is stored in a string named ROUTINE$, it can be called with parameters as ANSWER=USR(ADR(ROUTINE$),VAR1,VAR2).

Parameters are pushed onto the hardware stack (in page 1) as 16-bit integers in the order specified in the USR() call, each in low byte, high byte order. The last value pushed to the stack is a byte indicating the number of arguments. Even if no parameters are used, the machine language code must pull this argument count off the hardware stack before returning via RTS; otherwise RTS would take the count as part of its return address. Because the stack is last in, first out, each 16-bit parameter is pulled high byte first, then low byte.

The machine language routine can return a value to the BASIC program. The return value is placed in addresses 212₁₀ and 213₁₀ (D4₁₆ and D5₁₆) as a 16-bit integer, which is then converted to a BCD value and placed in the return variable. It cannot be pushed to the stack, as there is no concept of a stack frame. For the same reason there is no concept of a void return; if the machine code subroutine does not return anything useful, the caller typically just ignores the value.
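The calling convention described above can be sketched with a small routine that adds two 16-bit arguments. The routine, the variable names I, BYTE and ANSWER, and the decimal byte values are illustrative and were hand-assembled for this example; the corresponding 6502 mnemonics appear in the REM lines. Because addition is commutative, the result does not depend on which argument is pulled from the stack first.

```basic
10 DIM ROUTINE$(21)
20 FOR I = 1 TO 21
30 READ BYTE
40 ROUTINE$(I, I) = CHR$(BYTE)
50 NEXT I
60 ANSWER = USR(ADR(ROUTINE$), 1000, 234)
70 PRINT "1000 + 234 = "; ANSWER
80 END
100 REM PLA (DISCARD THE ARGUMENT COUNT)
110 DATA 104
120 REM PLA, STA 213, PLA, STA 212 (ONE ARGUMENT, HIGH BYTE THEN LOW)
130 DATA 104,133,213,104,133,212
140 REM PLA, TAY, PLA (OTHER ARGUMENT: HIGH BYTE TO Y, LOW BYTE IN A)
150 DATA 104,168,104
160 REM CLC, ADC 212, STA 212 (ADD THE LOW BYTES)
170 DATA 24,101,212,133,212
180 REM TYA, ADC 213, STA 213 (ADD THE HIGH BYTES WITH CARRY)
190 DATA 152,101,213,133,213
200 REM RTS
210 DATA 96
```

The routine pulls the argument count, pulls both arguments, leaves their sum in locations 212 and 213, and returns with RTS, so ANSWER should come out as 1234. It refers to no addresses inside itself, so it should keep working if BASIC moves the string.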

These routines are usually relocatable machine code, because BASIC may move strings in memory. Relocatable code cannot use instructions such as JMP or JSR with absolute addresses inside the routine itself, although calling well-known addresses in the OS is permitted. Within the routine it can only use branch instructions such as BCC (branch if carry clear), which jump backwards or forwards by roughly 128₁₀ (80₁₆) bytes.[N 8] Non-relocatable code is permitted if it is assembled to a specific address that does not conflict with the BASIC program, or if its absolute addresses can be recomputed before execution. For this reason page 6 (0600₁₆ to 06FF₁₆), a page of memory not used by BASIC or the operating system, is very popular for storing small routines, although there is a danger that another routine may also need to be stored there.

On the 6502, relocation is not trivial. Modern programs are expected to run almost anywhere in memory, with the loader and processor collaborating to make that happen, but microprocessors of that era offered no such support. The 6502 was especially hindered by having very few indirect addressing modes, and those it had were asymmetric: the X register is applied before the indirection and the Y register after it. This leads either to rather clumsy code that is forever moving values between registers, or to clever but obscure code that keeps them where they need to be even when another arrangement would seem more obvious. The 6502 instruction set is small enough that programmers can soon model the entire processor in their heads, even down to knowing how many cycles each instruction takes, and then start devising clever tricks.

As well as using machine code for advanced functions, fairly trivial USR routines are sometimes used simply to gain access to functions in the Atari OS that are not exposed through Atari BASIC. Examples include block input and output to and from devices (Atari BASIC only allows this byte by byte, with GET and PUT, and passing each byte back and forth through the OS layers takes far longer than actually writing it), and reading and writing blocks of memory (PEEK and POKE are also unnecessarily slow because of the numeric problems described above).

Machine code can also be stored as numbers in DATA statements. After character strings, DATA statements are the next most efficient form of storage, since the data values are stored as a string of characters exactly as they appear in the code. This method is sometimes used for very short routines where size is not important but ease of use is (no special loaders or clever typing routines are required), or for one-off programs that write out the resulting block of bytes (probably stored in a string) so that it can be read back in later byte for byte.
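A minimal sketch of the DATA approach, assuming page 6 (starting at address 1536) is free: the loop pokes an eight-byte routine into place and calls it at its fixed address. The routine is an illustrative one that simply returns its single argument, and its mnemonics are given in the REM line.

```basic
10 FOR I = 0 TO 7
20 READ BYTE: POKE 1536 + I, BYTE
30 NEXT I
40 PRINT USR(1536, 12345)
50 END
60 REM PLA (COUNT), PLA, STA 213, PLA, STA 212, RTS
70 DATA 104,104,133,213,104,133,212,96
```

Because the routine sits at a fixed address rather than in a string, it would not need to be relocatable, although this particular one happens to be.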

Notes

  1. The main differences were whether strings were allowed to grow and shrink in size once memory had been allocated for them, and whether the size of the string was constant from the outset (e.g. being padded with some special character meaning "end of string") or whether the size was stored independently. Both approaches have advantages and disadvantages, depending on how one is expecting them to be used.
  2. The Sinclair ZX family of machines also adopts the approach of checking each line as it is entered, although it differs by not even allowing the line to be entered until it is syntactically correct, which can be a hindrance to a programmer who is partway through a line of code and wants to look up something elsewhere in the program.
  3. Although Wilkinson implements the parse tree as a set of tables, which is really an implementation detail.
  4. Although BASIC appears to disallow 0 here explicitly, on the assumption that it is a coding error.
  5. Because the literal line numbers are stored as floating point numbers, deliberate obfuscation can happen by changing them to values other than natural numbers. This is rarely useful, because the line number is rounded as part of the lookup.
  6. E: is essentially a combination of S: and the keyboard, K:.
  7. Unlike in most other BASICs, if the line number is not found, execution continues at the lowest-numbered line higher than that specified rather than producing an UNDEFINED STATEMENT ERROR.
  8. The branch offset is a single byte in two's complement arithmetic, but it is applied to the program counter after it has already advanced past the two-byte branch instruction. A branch can therefore reach from -128 + 2 = 126 bytes back to +127 + 2 = 129 bytes forward, measured from the start of the branch instruction, which gives 256 possible targets. Not all of them are useful: an offset of -2 returns to the branch instruction itself and so forms an infinite loop; -1 lands on the offset byte itself, which is not a documented instruction in the 6502 instruction set; 0 goes to the next instruction, and so is effectively a NOP, though a taken branch costs three cycles rather than two. Other small offsets might land on an opcode or in the middle of an operand, depending on whether the instructions before or after the branch are one, two or three bytes long.

References

Citations
  1. Wilkinson, Bill (1983), The Atari BASIC Source Book, Compute! Books, ISBN 0-942386-15-9
  2. "Atari BASIC Bugs", Compute!, July 1986, p. 10
  3. Published in Page 6 magazine, July 1986.

This article is issued from Wikipedia - version of the Sunday, October 04, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.