# Musings on BASIC, Forth, ForthBox, and Other Things

This blog entry is going to seem like it rambles a lot.  No, correction —
this blog article is going to ramble a lot.  But, you should be used to
this by now.  The topics touched are as follows (in no particular order):

1. My abysmal ForthBox project status,
2. My desire to rapidly iterate on CPU architectures,
3. Design of BASIC compilers,
4. BASIC's relationship to the Dartmouth Time-Sharing System (DTSS),
5. Languages as shell interfaces.

I would like to thank The Blue Wizard and @mpcnat@bne.social for the
conversation which sent me down this reflective path.


On February 1, I had a bit of a melancholy conversation on Mastodon. In this conversation, I was a bit bummed that, after almost a year, I still haven't finished my Forth interpreter for my 65816-based ForthBox computer design concept. This places me a year behind schedule in achieving my personal goals with this design: I was hoping to have a completely emulated environment done by now, so that I could focus on building the hardware. With it being that far behind schedule, though, I'm tempted to just give up on it.

Besides, the AESIR project has actual hardware running, people are scarfing up the Commander X16 computers, and the SuperCPU-64 environment is now emulated on the VICE emulator, so I don't see what value I can add to the 65816 community. It looks like my forte is stack and RISC-V architecture CPUs, and I think people are happiest when I work on these sorts of projects anyway. The problem is, stack CPUs have literally zero support from anyone, anywhere. I'd be blazing my own trail just like I did with the 16-bit, stack-CPU-powered Kestrel-2. How ironic, then, that it is the Kestrel-2 which is, literally, the only homebrew computing project I've ever brought to completion. (Well, Kestrel-2DX too, but that's just a Kestrel-2 with my homebrew RISC-V processor bolted on instead. A 64-bit RISC-V running on a 48KB RAM machine was effective, even if limited in what could be done with it.)

Getting a Forth environment running is not an easy task, despite what you might read elsewhere on the Internet. What can be said is that it is an easier task than, say, getting something like Lisp or Python running. But, for me at least, it remains a daunting challenge nonetheless.

In my lamentations, though, I reported that maybe I should have stuck with the idea of writing a BASIC environment instead of going with Forth. While Forth is demonstrably more powerful given the fewer resources it consumes, BASIC is easier to get working, if for no other reason than it's easier to test and to implement missing pieces incrementally. For example, you can start out with a simple command-line interpreter, then add line editing support, then add support for numeric expressions, etc. Each revision is easily encapsulated into a self-contained and "working" (insofar as its features are completed) program that you can hand to people and ask them for testing feedback.

Forth doesn't reach that point until you complete its colon compiler. Until then, a Forth interpreter is utterly useless, unable to function even as a simple calculator until just before the colon compiler is implemented. For me, it is a bear to work on, because I am constantly lost in a sea of loose ends that all need to be resolved before I have a working tool.

Please note: by "get working," I am literally talking about from scratch, starting with bare metal. I'm not talking about porting a pre-existing interpreter that is already written in C or some such.

This qualification is important to me because of one of my hobbies: I love to homebrew my own CPU designs (here's one of my stack CPU designs, and here's my minimal RISC-V processor). From 16-bit to 64-bit, from MISC to RISC, and all points in between, I am curious about it all. What I explicitly do not care to do is port a C development toolchain over to even one of these architectures, much less to each one. Besides the compiler, that implies needing to port linkers, assemblers, the standard libraries, and on and on and on. Porting GCC from MIPS to RISC-V, arguably two of the most closely related RISC architectures this planet has ever seen, still took several years of effort before things finally stabilized, and that was with a team of interested and highly motivated individuals the world over. I have neither that kind of time nor the resources, especially if I want to rapidly iterate on CPU architecture design.

## Enter Shoehorn

When I first started out on the ForthBox project, I knew that I didn't want to sit on the 65816 processor forever. Eventually, I wanted to upgrade to a 64-bit RISC-V processor, and I knew that I also wanted to go back to exploring stack CPUs. I knew what that implied: rewriting the software stack in toto all over again. No thanks; there had to be a better way! This led me to design Shoehorn.

Shoehorn is a statically compiled subset of Forth, designed to allow one to bring up new software on a new platform with relative ease. It is explicitly not designed to produce the tightest, fastest possible code. In fact, the code it produces will be horribly slow on most architectures. Rather, its only purpose is to facilitate the bootstrapping of a software stack onto a new platform, starting from bare metal only. Make it work, make it right, make it fast, in that order.

In that sense, Shoehorn has proven to be resoundingly effective. I've been using it to implement a Forth interpreter for my ForthBox computer design. I've gotten the interpreter to the point where it'll show an OK prompt, and I can invoke words in a dictionary, complete with hashing for faster dictionary lookups. However, when I last left the project, it still had some bugs with numeric input. In particular, it did not handle double-precision integers correctly (e.g., 12345.6789, which pushes a 32-bit value onto the 16-bit stack, with 6789 in the least significant word and 12345 in the most significant word), leading to stack imbalances and such. I never did track down the cause; I really should try again soon. Like I said, it's been almost a year now!

The benefit of something like Shoehorn is that I should be able to port software to a completely new processor architecture in a matter of days, not months or years.
In theory, I should be able to port my Forth environment as it currently stands to a Z80 processor under CP/M in a matter of a few days. Alternatively, I should be able to do the same to a completely home-brew 32-bit MISC processor in more or less the same amount of time. I've never tested this, but I don't see any impediments to doing so.

Now, if I could only finish the Forth environment itself, I'd be much happier! I'd have an operating system, a systems-capable programming language, and an interactive shell environment out of the box for just about anything. But working on the environment is just such a slog!

It is that frustration which brought me to the thought that, maybe if I'd used BASIC instead of Forth for my ForthBox computer, I would be in a better position by now, because it would better facilitate development using test-driven techniques. Too bad there is no such thing as ADHD-driven development.

## What is BASIC, really?

If you're reading this article, there's a good chance that you're already familiar with the BASIC programming language. You might already know that BASIC stands for Beginner's All-purpose Symbolic Instruction Code, and that it was intended to teach students enrolled at Dartmouth College, especially those not in the STEM fields, how to use computers to accomplish various academic tasks. No joke! Syntactically, it is inspired by Fortran.

But, beyond the obvious, one might even say superficial, features, what is BASIC, really? We all know that Forth is both a language and an operating system all wrapped in one. It turns out that BASIC is these things as well. In fact, Forth and BASIC share a number of traits you wouldn't expect.

| BASIC                                        | Forth                                             |
|:---------------------------------------------|:--------------------------------------------------|
| Is a command-line interface to DTSS.         | Is a command-line interface to a virtual machine. |
| Is a programming language.                   | Is a programming language.                        |
| PEEK/POKE allow reading/writing to memory    | @/! allow reading/writing to memory               |
| USR/SYS/CALL allow invoking asm code         | EXECUTE and CODE allow invoking asm code          |
| Supports multitasking via PARACT, et al.     | Supports multitasking via PAUSE, et al.           |
| (Most) Compile their code for faster runs    | ":" compiles code for faster runs                 |

NOTE: I spent some time studying the sources to DTSS and came to the realization that the READY prompt and the "command mode" commands (like LIST, RUN, OLD or LOAD, SAVE, etc.) are actually DTSS shell commands, not intrinsic to the BASIC language per se. Yes, you read that right: if you've ever used a BASIC interpreter on almost any 8-bit home computer of the 80s, you basically have most of the knowledge you need to use a real mainframe operating system. Put another way, most 8-bit BASICs of the 80s were legit (albeit single-tasking) operating systems in their own right.

Except for BASIC interpreters found in early 8/16-bit home computers, nearly all BASIC interpreters actually compile programs before running them, making them closer to Smalltalk or Python than to something more textual like Tcl. The reason 8-bit home computer BASICs never did this, opting instead to just tokenize input while still interpreting programs in a textual manner, is limited memory. Most 8-bit BASICs had to fit in 8KB of ROM and 4KB of RAM space, give or take. There's just no place to put a compiled version of the program. The reason 16-bit BASICs didn't do this is inertia: nearly all implementations were literally just ported 8-bit BASICs, maybe with a few extra commands to support graphics or sound mixed in for good measure. It wasn't until Visual Basic 4 that we started to see Microsoft finally realize, "Huh, we have the RAM; we should use it to make our code faster!"

So, a good quality BASIC implementation will run at about the same speed as a comparable Forth implementation, give or take, assuming the same kind of compiler is used. I know it seems like heresy to say this, but it is true. If you read the ANSI specification for Full BASIC, you'll find that it even supports real-time threads and processes (complete with limited memory isolation facilities vis-a-vis the use of EXTERNAL FUNCTION and EXTERNAL SUB), multi-threading via PARACT, synchronous message passing with SEND and RECEIVE, and signal variables.

But, for as much as they are similar, BASIC and Forth also differ in some key ways:

- BASIC's syntax is traditionally fairly rigid, while Forth's is quite fluid.
- BASIC is extremely line-oriented, while Forth is word-oriented.
- BASIC is more strongly typed than Forth, supporting integers, floating point, string values, and arrays thereof, as distinct and checked entities; meanwhile, Forth has absolutely no types to speak of, choosing to expose the machine-level concepts of memory addresses and the values stored in them to the programmer.
- BASIC manages its memory completely dynamically and with garbage collection, while Forth (like C) puts the responsibility for managing memory exclusively on the programmer.
- To afford the user conveniences, BASIC requires more runtime resources to run, whereas Forth is miserly, able to compete with a Full BASIC's feature set in between 16KB and 128KB, depending on processor word size and which features are brought in.
- Features in BASIC tend to be static (though they don't always have to be, as Texas Instruments' TI-99 BASIC and Extended BASIC show), while in Forth you can select which features your program wants to use on a program-by-program basis.

Footnote: I've talked before about how full-featured Forth interpreters in 8KB are a lie. I still stand by this statement.
People often talk about how a Forth implementation can include an editor, an assembler, a reasonably complete set of core words, disk I/O, and console I/O within just 8KB of memory. My studies show this not to be true when you consider such an implementation's dependencies. They depend on a pure text-mode display, and they depend on a BIOS-like subsystem which drives that display, the disk(s), and the keyboard (itself frequently at least 4KB to 8KB of ROM on its own, depending on platform). When you add everything up, you get 8KB for the Forth environment plus another (say) 8KB for the BIOS, for a total of 16KB. This is a more realistic measure of how big a Forth system truly will turn out to be if you're building one from scratch. Also, this further assumes that you're building for an 8-bit CPU (specifically, a Z80). A 16-bit CPU will generally have binaries about 33% larger than a comparable 8-bit program, just as a 32-bit CPU will have some percentage bigger binary size again, etc. In fact, for each doubling of data path size, it's probably a good estimate to just add 33% to your program size. So, a 32-bit Forth interpreter with the same overall capabilities as an 8-bit Forth interpreter would probably come out to be closer to 29KB in size. Add another 33% again if it's for a RISC processor. (These are estimates, but they are close to my observations.)

Let's look at that line-orientation I just mentioned. Any cursory examination of how BASIC's syntax works should illustrate that BASIC is very much keyword driven: the first keyword of a line dictates the syntax and semantics of the remainder of the line. In fact, I'd wager the interpreter loop for any BASIC would look something like this:

    DO WHILE NOT outstanding error
        Find next line of source text to interpret ELSE EXIT DO.
        Extract first keyword.
        Find/compile handler for keyword.
        IF handler found THEN
            CALL handler.
        ELSE
            Raise syntax error.
        END IF
        IF NOT outstanding error AND NOT at end of line THEN
            Raise syntax error.
        END IF
    END DO

If the BASIC interpreter tokenizes text, extracting the first keyword might be as simple as reading the first byte of a line, which should be a token value. Converting that to a handler would be a simple table lookup. You get the idea.

The point, however, is that this is nearly the same algorithm used (in some flavor) by command-line interpreters found in operating system shell interfaces (meta-features like I/O redirection handling notwithstanding, of course). This explains why each BASIC command seems to have its own unique syntax (e.g., why PRINT is so different from INPUT). It's the same reason why each Unix command has its own syntax at the shell: each command handler performs its own parameter parsing. BASIC's features were built piece by piece, incrementally, over many years, and without regard to an over-arching syntax specification. This provides both its charm and, for many, its primary source of frustration.

## BASIC as a Shell?

Which leads me to question: if we can tolerate arbitrary syntax in command-line shells for process control, but not in a "proper" programming language per se, perhaps we've been using BASIC wrong all these years.

Out of necessity, most BASIC dialects end up supporting the basic DOS-like features we come to expect anyway: creating new files via SAVE, reading them in with OLD or LOAD, removing files with SCRATCH, and, for later interpreters, similar CRUD-y things with directories as well. If these higher-level commands aren't sufficient for your needs, you can always rely on lower-level I/O, like OPEN, CLOSE, file pointer manipulation statements, etc. Starting with BASIC 4, Commodore even provided statements for copying and renaming files too; I reckon other vendors did similar things. It's almost all there, really.

It would seem to me that BASIC is a natural candidate for extension to support more sophisticated process control.
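To make the shell comparison a little more concrete, here is a minimal Python sketch of the keyword-driven loop described above. Everything in it is hypothetical: the `HANDLERS` table, the `run` function, and the three toy statements (`LET`, `PRINT`, `END`) are illustrative names of my choosing, not taken from any real BASIC, and a tokenizing interpreter would key the table on a token byte rather than a string. The thing to notice is that each handler parses the rest of the line its own way, and the loop only checks that nothing is left over afterwards.

```python
# Hypothetical sketch of a keyword-driven interpreter loop.
# None of these names come from a real BASIC implementation.


class BasicSyntaxError(Exception):
    """Raised for unknown keywords or malformed statements."""


def do_let(rest, env):
    # LET has its own shape: NAME = INTEGER.
    name, sep, value = rest.partition("=")
    name = name.strip()
    if not sep or not name.isalpha():
        raise BasicSyntaxError(f"bad LET statement: {rest!r}")
    try:
        env["vars"][name] = int(value.strip())
    except ValueError:
        raise BasicSyntaxError(f"bad LET value: {value!r}") from None
    return ""  # the whole line was consumed


def do_print(rest, env):
    # PRINT has a different shape: a comma-separated list of quoted
    # strings or variable names. (Commas inside strings are not handled;
    # this is a toy parser.)
    parts = []
    for item in (piece.strip() for piece in rest.split(",")):
        if len(item) >= 2 and item.startswith('"') and item.endswith('"'):
            parts.append(item[1:-1])
        elif item in env["vars"]:
            parts.append(str(env["vars"][item]))
        else:
            raise BasicSyntaxError(f"bad PRINT item: {item!r}")
    env["output"].append(" ".join(parts))
    return ""  # the whole line was consumed


def do_end(rest, env):
    # END takes no arguments; anything left over is returned untouched
    # so the main loop can flag it.
    env["running"] = False
    return rest


# Keyword -> handler table: the "simple table lookup" step.
HANDLERS = {"LET": do_let, "PRINT": do_print, "END": do_end}


def run(lines):
    env = {"vars": {}, "output": [], "running": True}
    for line in lines:
        if not env["running"]:
            break
        keyword, _, rest = line.strip().partition(" ")
        handler = HANDLERS.get(keyword.upper())
        if handler is None:
            raise BasicSyntaxError(f"unknown keyword: {keyword!r}")
        leftover = handler(rest, env)
        if leftover.strip():
            raise BasicSyntaxError(f"unexpected text after {keyword}: {leftover!r}")
    return env["output"]
```

Swap `LET` and `PRINT` for `ls` and `cp`, and this is more or less the skeleton of a command-line shell, which is the point of the comparison.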
The only thing that doesn't seem present in BASIC's syntax is calling arbitrary programs like you can in, say, a Unix shell interface. The closest official way to do this is via the CHAIN command, but I believe this command assumes the chained program is itself written in BASIC. BBC BASIC supports a similar operation via the asterisk prefix, so-called "star commands", which just sends a raw command line to the underlying OS. This is similar to Rexx's ADDRESS SYSTEM and its use of strings to invoke arbitrary programs.

And before you try to dismiss my thinking because some shrivelled up shell of a man once wrote in an inflammatory memo that, "It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration," remember that the BASIC of his day isn't the same as the BASIC of today. Remember, too, that I can counter his couple-page, inflammatory memo with a 360-page book about Unix's shortcomings.

Maybe, just maybe, we should be using BASIC, rather than Bash, as a command-line interface, instead of treating it as a programming language in which complex applications are written. I can think of several benefits of doing so:

1. By evolving BASIC into a shell, you are forced to confront the problem of how to pass variables to commands, and how to receive results back. BASIC already defines semantics for passing arguments to, and results from, subroutines and sub-programs, so it'd make maximum sense to be compatible with this mechanism.
2. As a by-product of the above, we get a standard calling convention for interfacing programs written in different languages at the binary level. Today, we use C's ABI for this purpose; but BASIC's ABI is richer. BASIC is memory-safe, so descriptors for things like buffers, slices, and such would be standardized. While this would seem like overkill for C/C++ programmers today, it's natural for Rust and Go developers (and programmers in other memory-safe languages as well).
3. A compliant interpreter for ANSI BASIC already has features that support message passing and so forth, so BASIC becomes a natural language in which to write IPC tooling. If you've ever written a Rexx script before, especially an ARexx script on AmigaOS, you'll know what I'm talking about here. But, even here, it seems richer than Rexx, since BASIC uses typed channels like Go and Aleph do, not the stringly-typed approach that Rexx uses. Using these features will be a lot easier than whatever COM-binding Microsoft's Visual Basic uses, since you don't need thunks, proxies, IDL compilers, etc.

I could be talking out of my ass here, but I think BASIC has more than a few useful tricks up its sleeve that are relevant even today, and I think it'd be a mistake to ignore them. I should probably spend the time trying to write my own BASIC-inspired environment and see how things go.