# Musings on BASIC, Forth, ForthBox, and Other Things
This blog entry is going to seem like it rambles a lot. No, correction —
this blog article is going to ramble a lot. But, you should be used to
this by now. The topics touched on are as follows (in no particular order):
1. My abysmal ForthBox project status,
2. My desire to rapidly iterate on CPU architectures,
3. Design of BASIC compilers,
4. BASIC's relationship to the Dartmouth Time-Sharing System (DTSS),
5. Languages as shell interfaces.
I would like to thank The Blue Wizard and @mpcnat@bne.social for the
conversation which sent me down this reflective path.
On February 1, I had a bit of a melancholy conversation on Mastodon. In this
conversation, I was a bit bummed that after almost a year, I still haven't
finished my Forth interpreter for my 65816-based ForthBox computer design
concept. This places me a year behind schedule in achieving my personal
goals with this design: I was hoping to have a completely emulated environment
done by now, so that I could focus on building the hardware.
With it being that far behind schedule, though, I'm tempted to just give up on
it. Besides, the AESIR project has actual hardware running, and people are
scarfing up the Commander X16 computers, and the SuperCPU-64
environment is now emulated on the VICE emulator, so I don't see what value I
can add to the 65816 community. It looks like my forte is stack and RISC-V
architecture CPUs, and I think people are happiest when I work on these sorts
of projects anyway.
The problem is, stack CPUs have literally zero support from anyone, anywhere.
I'd be blazing my own trail just like I did with the 16-bit, stack CPU powered
Kestrel-2. How ironic, then, that it is the Kestrel-2 which is, literally,
the only homebrew computing project I've ever brought to completion.
(Well, Kestrel-2DX too, but that's just a Kestrel-2 with my homebrew RISC-V
processor bolted on instead. A 64-bit RISC-V running on a 48KB RAM machine was
effective, even if limited in what could be done with it.)
Getting a Forth environment running is not an easy task, despite what you might
read elsewhere on the Internet. What can be said is that it is an
easier task than, say, getting something like Lisp or Python running.
But, for me at least, it remains a daunting challenge nonetheless.
In my lamentations, though, I reported that maybe I should have stuck with the
idea of writing a BASIC environment instead of going with Forth. While Forth
is demonstrably more powerful given the fewer resources it consumes, BASIC is
easier to get working if for no other reason than it's easier to test and
implement missing pieces incrementally. For example, you can start out with a
simple command-line interpreter, then add line editing support, then add
support for numeric expressions, etc. Each revision is easily encapsulated
into a self-contained and "working" (insofar as its features are completed)
program that you can hand to people and ask them for testing feedback. Forth
doesn't achieve that point until you complete its colon compiler. Up
until then, a Forth interpreter is utterly useless, unable to function
even as a simple calculator until just before the colon compiler is
implemented. For me, it is a bear to work on, because I am constantly lost in
a sea of loose ends that all need to be resolved before I have a working tool.
Please note: by "get working," I am literally talking about from scratch,
starting with bare metal. I'm not talking about porting a pre-existing
interpreter that is already written in C or some such. The reason this
qualification is so important to me is because of one of my hobbies: I love to
homebrew my own CPU designs (here's one of my stack CPU designs, and here's my
minimal RISC-V processor). From 16-bit to 64-bit, from MISC to RISC, and all
points in between, I am curious about it all. What I explicitly do not
care to do is port a C development toolchain over to even one of these
architectures, much less to each one. Besides the compiler, that
implies needing to port linkers, assemblers, the standard libraries, and on and
on and on. Porting GCC from MIPS to RISC-V, arguably two of the most closely
related RISC architectures this planet has ever seen, still took several
years of effort before things finally stabilized, and that was with a
team of interested and highly motivated individuals the world over. I don't
have that kind of time or those resources, especially if I want to rapidly
iterate on CPU architecture design.
## Enter Shoehorn
When I first started out on the ForthBox project, I knew that I didn't want to
sit on the 65816 processor forever. Eventually, I wanted to upgrade to a
64-bit RISC-V processor, and I knew that I also wanted to go back to exploring
stack CPUs eventually. I knew what that implied — rewriting the software
stack in toto all over again. No thanks; there had to be a better way!
This led me to design Shoehorn. Shoehorn is a statically compiled
subset of Forth, designed to allow one to bring up new software on a new
platform with relative ease. It is explicitly not designed to produce
the tightest, fastest possible code. In fact, the code it produces will be
horribly slow on most architectures. Rather, its only purpose is to facilitate
the bootstrapping of a software stack onto a new platform starting from
bare-metal only. Make it work, make it right, make it fast, in that order.
In that sense, Shoehorn has proven to be resoundingly effective. I've been
using it to implement a Forth interpreter for my ForthBox computer design.
I've gotten the interpreter to the point where it'll show an OK prompt, and I
can invoke words in a dictionary, complete with hashing for faster dictionary
lookups. However, when I last left the project, it still had some bugs with
numeric input. In particular, it did not handle double-precision
integers correctly (e.g., 12345.6789, which pushes a 32-bit value onto
the 16-bit stack, with 6789 in the least significant word and 12345 in the most
significant word), leading to stack imbalances and such. I never did track
down the cause; I really should try again soon. Like I said, it's been almost
a year now!
The benefit to something like Shoehorn is that I should be able to port
software to a completely new processor architecture in a matter of days,
not months or years. In theory, I should be able to port my Forth environment
as it currently stands to a Z80 processor under CP/M in a matter of a few days.
Alternatively, I should be able to do the same to a completely home-brew 32-bit
MISC processor in more or less the same amount of time. I've never tested
this, but I don't see any impediments to doing so.
Now, if I could only finish the Forth environment itself, I'd be much happier!
I'd have an operating system, a systems-capable programming language, and an
interactive shell environment out of the box for just about anything. But,
working on the environment is just such a slog!
It is that frustration which brought me to the thought that, maybe if I'd used
BASIC instead of Forth for my ForthBox computer, I would be in a better
position by now because it would better facilitate development using
test-driven techniques. Too bad there is no such thing as ADHD-driven
development.
## What is BASIC, really?
If you're reading this article, there's a good chance that you'll already be
familiar with the BASIC programming language. You might already know that
BASIC stands for Beginner's All-purpose Symbolic Instruction Code, and was
intended to teach students enrolled at Dartmouth College, especially those
not in the STEM fields, how to use computers to accomplish various
academic tasks. No joke!
Syntactically, it is inspired by Fortran. But, beyond the obvious, one might
even say superficial features, what is BASIC, really?
We all know that Forth is both a language and an operating system all wrapped
in one. It turns out that BASIC is both of these things as well.
In fact, Forth and BASIC share a number of traits you wouldn't expect.
| BASIC | Forth |
|:---------------------------------------------|:--------------------------------------------------|
| Is a command-line interface to DTSS. | Is a command-line interface to a virtual machine. |
| Is a programming language. | Is a programming language. |
| PEEK/POKE allow reading/writing to memory.   | @/! allow reading/writing to memory.              |
| USR/SYS/CALL allow invoking asm code.        | EXECUTE and CODE allow invoking asm code.         |
| Supports multitasking via PARACT, et al.     | Supports multitasking via PAUSE, et al.           |
| (Most) compile their code for faster runs.   | ":" compiles code for faster runs.                |
NOTE: I spent some time studying the sources to DTSS and came to the
realization that the READY prompt and the "command mode" commands (like LIST,
RUN, OLD or LOAD, SAVE, etc.) are actually DTSS shell commands, not intrinsic to the
BASIC language per se.
Yes, you read that right: if you've ever used a BASIC interpreter on almost any
8-bit home computer of the 80s, you basically have most of the knowledge you
need to use a real mainframe operating system. Put another way, most 8-bit
BASICs of the 80s were legit (albeit single-tasking) operating systems in their
own right.
Except for BASIC interpreters found in early 8/16-bit home computers, nearly
all BASIC interpreters actually compile programs before running them,
making them closer to Smalltalk or Python than to something more textual like
Tcl. 8-bit and 16-bit home computer BASICs never did this, opting
instead to just tokenize input while still interpreting programs in a textual
manner, because of limited memory resources.
Most 8-bit BASICs had to fit in 8KB of ROM and 4KB of RAM space, give or take.
There's just no place to put a compiled version of the program. The
reason 16-bit BASICs didn't do this is inertia — nearly all
implementations were literally just ported 8-bit BASICs, maybe with a few extra
commands to support graphics or sound mixed in for good measure. It wasn't
until Visual Basic 4 that we started to see Microsoft finally realize, "Huh, we
have the RAM; we should use it to make our code faster!" So, a good quality
BASIC implementation will run about the same speed as a comparable Forth
implementation, give or take, assuming the same kind of compiler is used. I
know it seems like heresy to say this, but it is true.
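To make the tokenize-versus-compile distinction a bit more concrete, here is a minimal Python sketch of the kind of keyword crunching those home-computer BASICs performed on each line as it was entered. The keyword list, the token values starting at 0x80, and the function names are all made up for illustration; every real dialect had its own table and its own quirks.

```python
# Hypothetical keyword table; token values are illustrative, not any real dialect's.
KEYWORDS = ["PRINT", "INPUT", "GOTO", "LET", "IF", "THEN"]
TOKENS = {word: 0x80 + index for index, word in enumerate(KEYWORDS)}


def tokenize(line: str) -> bytes:
    """Crunch keywords into single bytes, leaving string literals alone.

    Assumes plain ASCII input, so every character fits below 0x80.
    """
    out = bytearray()
    i = 0
    in_string = False
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_string = not in_string
        if not in_string:
            for word in KEYWORDS:
                if line.startswith(word, i):
                    out.append(TOKENS[word])  # one byte replaces the keyword
                    i += len(word)
                    break
            else:
                out.append(ord(ch))  # not a keyword: copy the character
                i += 1
        else:
            out.append(ord(ch))  # inside quotes: copy verbatim
            i += 1
    return bytes(out)


def detokenize(data: bytes) -> str:
    """Expand tokens back into text, which is what LIST has to do."""
    return "".join(KEYWORDS[b - 0x80] if b >= 0x80 else chr(b) for b in data)


crunched = tokenize('PRINT "PRINT"')
print(crunched, detokenize(crunched))
```

Note that the stored program is still essentially text: expressions get re-parsed every time the line runs, which is why this saves memory but buys far less speed than actually compiling.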
If you read the ANSI specification for Full BASIC, it even supports real-time
threads and processes (complete with limited memory isolation facilities
vis-a-vis the use of EXTERNAL FUNCTION and EXTERNAL SUB) and multi-threading
via PARACT, synchronous message passing with SEND and RECEIVE, and signal
variables.
But, for as much as they are similar, BASIC and Forth also differ in some key
ways. BASIC's syntax is traditionally fairly rigid, while Forth's is quite
fluid. BASIC is extremely line-oriented, while Forth is word-oriented.
BASIC is more strongly typed than Forth, supporting integers, floating point,
string values, and arrays thereof, as distinct and checked entities; meanwhile,
Forth has absolutely no types to speak of, choosing to expose the machine-level
concepts of memory addresses and the values stored in them to the programmer.
BASIC manages its memory completely dynamically and with garbage collection,
while Forth (like C) puts the responsibility for managing memory exclusively on
the programmer. To afford the user conveniences, BASIC requires more runtime
resources to run, whereas Forth is miserly, able to compete with a Full BASIC's
feature set in somewhere between 16KB and 128KB, depending on processor word size and
which features are brought in. Which brings me to one last difference:
features in BASIC tend to be static (though they don't always have to be, as
Texas Instruments' TI-99 BASIC and Extended BASIC show), while in Forth you can
select which features your program wants to use on a program-by-program basis.
Footnote: I've talked before about how full-featured Forth interpreters in 8KB
are a lie. I still stand by this statement. People often talk about
how a Forth implementation can include an editor, an assembler, a reasonably
complete set of core words, disk I/O, and console I/O within just 8KB of
memory. My studies show this not to be true when you consider such an
implementation's dependencies. They depend on a pure text-mode display, they
depend on a BIOS-like subsystem which drives that display, the disk(s), and the
keyboard (itself frequently at least 4KB to 8KB of ROM on its own, depending on
platform). When you add everything up, you get 8KB for the Forth environment
plus another (say) 8KB for the BIOS, for a total of 16KB. This is a more
realistic measure of how big a Forth system truly will turn out to be if you're
building one from scratch. Also, this further assumes an 8-bit CPU that you're
building for (specifically, a Z80). A 16-bit binary will generally be about 33%
larger than a comparable 8-bit binary, just as a 32-bit binary will be
some percentage bigger again, etc. In fact, for each doubling of
data path size, it's probably a good estimate to just add 33% to your program
size. So, a 32-bit Forth system with the same overall capabilities as that 16KB
8-bit Forth system would probably come out to be closer to 28KB in size (16KB × 1.33 × 1.33).
Add another 33% again if it's for a RISC processor. (These are estimates, but
they are close to my observations.)
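The rule of thumb in this footnote is easy to turn into a few lines of Python. This is only a sketch of my back-of-the-envelope estimate: the function name and its parameters are invented for illustration, and the 33% growth factor is the rough figure quoted above, not a measured constant.

```python
def estimated_size_kb(base_kb, doublings, risc=False, growth=1.33):
    """Scale an 8-bit size estimate by `growth` for each data-path doubling."""
    size = base_kb * growth ** doublings
    if risc:
        size *= growth  # one further 33% bump for a RISC instruction set
    return size


base = 8 + 8  # 8KB of Forth plus (say) 8KB of BIOS on an 8-bit machine
for bits, doublings in ((8, 0), (16, 1), (32, 2)):
    print(f"{bits:>2}-bit: about {estimated_size_kb(base, doublings):.1f} KB")
print(f"32-bit RISC: about {estimated_size_kb(base, 2, risc=True):.1f} KB")
```

Starting from the 16KB total, this gives roughly 21KB for a 16-bit system, 28KB for a 32-bit one, and 38KB for a 32-bit RISC target.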
Let's look at that line-orientation I just mentioned. Any cursory examination
of how BASIC's syntax works should illustrate that BASIC is very much keyword
driven: the first keyword of a line dictates the syntax and semantics of the
remainder of the line. In fact, I'd wager the interpreter loop for any BASIC
would look something like this:

    DO WHILE NOT outstanding error
        Find next line of source text to interpret ELSE EXIT DO.
        Extract first keyword.
        Find/compile handler for keyword.
        IF handler found THEN
            CALL handler.
        ELSE
            Raise syntax error.
        END IF
        IF NOT outstanding error AND NOT at end of line THEN
            Raise syntax error.
        END IF
    END DO
If the BASIC interpreter tokenizes text, extracting the first keyword might be
as simple as reading the first byte of a line, which should be a token value.
Converting that to a handler would be a simple table lookup. You get the idea.
The point being, however, that this is nearly the same algorithm used (in some
flavor) by command-line interpreters found in operating system shell interfaces
(meta-features like I/O redirection handling notwithstanding, of course).
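For the curious, here is roughly what that keyword-driven loop could look like as running code. This is a minimal Python sketch, not how any particular BASIC does it: the two statements supported, the `HANDLERS` table, and the convention that each handler returns whatever text it did not consume are all assumptions of mine.

```python
import re


def do_let(rest, env):
    """LET <name> = <integer>; returns the text it did not consume."""
    m = re.match(r"\s*([A-Z][A-Z0-9]*)\s*=\s*(-?\d+)", rest)
    if not m:
        raise SyntaxError("malformed LET")
    env["vars"][m.group(1)] = int(m.group(2))
    return rest[m.end():]


def do_print(rest, env):
    """PRINT <"string"> or PRINT <name>; returns the text it did not consume."""
    m = re.match(r'\s*(?:"([^"]*)"|([A-Z][A-Z0-9]*))', rest)
    if not m:
        raise SyntaxError("malformed PRINT")
    if m.group(1) is not None:
        text = m.group(1)
    else:
        text = str(env["vars"].get(m.group(2), 0))  # unset variables read as 0
    env["out"].append(text)
    return rest[m.end():]


# The first keyword on a line picks the handler; the handler parses the rest.
HANDLERS = {"LET": do_let, "PRINT": do_print}


def run(program):
    env = {"vars": {}, "out": []}
    for line in program:
        keyword, _, rest = line.strip().partition(" ")
        handler = HANDLERS.get(keyword.upper())
        if handler is None:
            raise SyntaxError(f"unknown keyword {keyword!r}")
        leftover = handler(rest, env)
        if leftover.strip():  # handler stopped before the end of the line
            raise SyntaxError(f"unexpected text {leftover!r}")
    return env["out"]


print(run(["LET X = 42", "PRINT X", 'PRINT "HI"']))
```

Swap the handler table for a list of executables on a search path and this is, give or take, the core of a command shell.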
This explains why each BASIC command seems to have its own unique syntax (e.g.,
why PRINT is so different from INPUT). It's the same reason why
each Unix command has its own syntax at the shell: each command handler
performs its own parameter parsing. BASIC's features were built piece by
piece, incrementally, over many years, and without regard to an over-arching
syntax specification. This provides both its charm, and for many, its primary
source of frustration.
## BASIC as a Shell?
Which leads me to wonder: if we can tolerate arbitrary syntax in command-line
shells for process control, but not in "proper" programming languages per
se, then perhaps we've been using BASIC wrong all these years. Out of
necessity, most BASIC dialects end up supporting the basic DOS-like features we
come to expect anyway: creating new files via SAVE, reading them in with OLD or
LOAD, removing files with SCRATCH, and for later interpreters, similar CRUD-y
things with directories as well. If these higher-level commands aren't
sufficient for your needs, you can always rely on lower-level I/O, like OPEN,
CLOSE, file pointer manipulation statements, etc. Starting with BASIC 4,
Commodore even provided statements for copying and renaming files too; I reckon
other vendors did similar things.
It's almost all there, really. It would seem to me that BASIC is a natural
choice for extending the language to support more sophisticated process
control. The only thing that doesn't seem present in BASIC's syntax is calling
arbitrary programs like you can in, say, a Unix shell interface. The closest
official way to do this is via the CHAIN command, but I believe
this command assumes the chained program is itself written in BASIC. BBC BASIC
supports a similar operation via the asterisk prefix, so-called "star
commands", which just sends a raw command line to the underlying OS. This is
similar to Rexx's ADDRESS SYSTEM and using strings to invoke arbitrary
programs.
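The star-command idea is simple enough to sketch in a few lines of Python. Everything here is hypothetical: the `execute` and `run_host_command` names are mine, and a real BBC BASIC hands the raw string to the operating system's own command interpreter rather than splitting it up like this.

```python
import shlex
import subprocess


def run_host_command(command_line):
    """Default OS hook: split the raw command line and run it as a program."""
    return subprocess.run(shlex.split(command_line)).returncode


def execute(line, basic_handler, os_handler=run_host_command):
    """Route one input line: a '*' prefix goes to the host OS, the rest to BASIC."""
    stripped = line.lstrip()
    if stripped.startswith("*"):
        return os_handler(stripped[1:].strip())  # pass the raw command line along
    return basic_handler(stripped)


# Usage with stand-in handlers, so nothing is actually launched:
print(execute("*CAT", basic_handler=str.upper, os_handler=lambda cmd: f"OS got {cmd!r}"))
print(execute('print "hi"', basic_handler=str.upper, os_handler=lambda cmd: 0))
```

The interesting design question is what comes back: a Unix shell gives you little more than an exit status, whereas a BASIC-flavored shell could hand back typed results.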
And before you try to dismiss my thinking because some shrivelled up shell of a
man once wrote in an inflammatory memo that, "It is practically impossible to
teach good programming to students that have had a prior exposure to BASIC: as
potential programmers they are mentally mutilated beyond hope of regeneration,"
remember that the BASIC of his day isn't the same as the BASIC of today.
Remember, too, that I can counter his couple-page, inflammatory memo with a
360-page book about Unix's shortcomings.
Maybe, just maybe, we should be using BASIC, rather than Bash, as a
command-line interface, and not as a programming language in which complex
applications are written. I can think of several benefits for doing so:
1. By evolving BASIC into a shell, you are forced to confront the problem of
how to pass variables to commands, and how to receive results back. BASIC
already defines semantics for passing arguments to/results from subroutines
and sub-programs, so it'd make maximum sense to be compatible with this
mechanism.
2. As a by-product of the above, we get a standard calling convention for
interfacing programs written in different languages at the binary level.
Today, we use C's ABI for this purpose; but, BASIC's ABI is richer. BASIC
is memory-safe, so descriptors for things like buffers, slices, and
such would be standardized. While this would seem like overkill for C/C++
programmers today, it's natural for Rust and Go developers (and
programmers in other memory-safe languages as well).
3. A compliant interpreter for ANSI BASIC already has features that support
message passing and so forth, so BASIC becomes a natural language in which
to write IPC tooling. If you've ever written a Rexx script before,
especially an ARexx script on AmigaOS, you'll know what I'm talking about
here. But, even here, it seems richer than Rexx, since BASIC uses typed
channels like Go and Alef do, not the stringly-typed
approach that Rexx uses. Using these features will be a lot easier than
whatever COM-binding Microsoft's Visual Basic uses, since you don't need
thunks, proxies, IDL compilers, etc.
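To illustrate the difference between typed and stringly-typed message passing from the third point, here is a small Python sketch. It is emphatically not ANSI BASIC's SEND/RECEIVE semantics; the `TypedChannel` class and its methods are hypothetical stand-ins that only show where a type mismatch gets caught.

```python
import queue


class TypedChannel:
    """A channel that only accepts values of one declared type (illustrative)."""

    def __init__(self, item_type):
        self.item_type = item_type
        self._queue = queue.Queue()

    def send(self, value):
        # The mismatch is caught at the sender, before anything is queued.
        if not isinstance(value, self.item_type):
            raise TypeError(
                f"channel carries {self.item_type.__name__}, "
                f"got {type(value).__name__}"
            )
        self._queue.put(value)

    def receive(self):
        return self._queue.get()


numbers = TypedChannel(int)
numbers.send(42)
print(numbers.receive() + 1)  # arrives as an int; no parsing needed

try:
    numbers.send("42")  # a stringly-typed system would happily accept this
except TypeError as err:
    print("rejected:", err)
```

With strings as the only currency, that second message would go through, and the receiver would be left to discover whether it can be parsed as a number.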
I could be talking out of my ass here, but I think BASIC has more than a few
useful tricks up its sleeve that are relevant even today, and I think it'd be a
mistake to ignore them. I should probably spend the time trying to write my
own BASIC-inspired environment and see how things go.