In order to run useful programs, the VM/OS emulator needs to offer some means of accessing a file system. However, unchecked access to file storage is one of the most frequently abused ways to compromise a user’s account at best, and an entire host computer at worst. Like Uxn, VM/OS emulation services sandbox access to files; that is to say, a VM/OS application only has access to the directory or directories it’s been explicitly granted access to. Unlike Uxn, however, multiple sandboxes can be established, each granting a view onto the host’s local filesystem.

To facilitate the bootstrapping of my longer-term VM/OS vision, it would be worthwhile to implement as many of the development tools for VM/OS as possible using VM/OS itself. This lets us eat our own dog food more regularly, especially while early design choices are at their most influential. To do that, however, I need the ability to access persistent storage very early in the development of VM/OS, long before it would otherwise have a functional filesystem of its own.

Naming Conventions for Files

For the sake of implementation simplicity, I chose a multi-rooted filesystem design. Instead of having a single root directory under which all other filesystems appear, there are multiple root directories. Each root has its own unique name, and is (logically, if not in fact) managed by its own handler.

Unlike systems with crippled multi-rooted filesystems, such as MS-DOS or Windows, VM/OS path naming conventions are intended to look as though there were a single root. This allows a future version of VM/OS to migrate to a single-rooted filesystem when it’s convenient. The device or filesystem name is separated from the rest of the path in exactly the same way every other path element is.

The syntax of a filename in Windows or DOS requires a drive component to be isolated from its file path via a colon (:), while path elements relative to the root of a given drive are separated by back-slashes (\). Worse, each drive unit has its own idea of what the application’s current directory is. Thus, giving the name of a file like C:FOO.TXT assumes you know what the current directory for C: is. It is not guaranteed to be the same thing as C:\FOO.TXT!

From the perspective of software running under VM/OS, an application locates files and other useful resources using a consistent naming scheme. Such names are always absolute; VM/OS has no concept of relative versus absolute path names.

Because all filenames at the application/emulator boundary are absolute, there is no need for a prefix notation indicating an “absolute” path name. More importantly, names always include enough information to allow the emulator, if not the host OS it runs on, to determine on which storage device the file or resource may actually be found. Thus, a filename generally takes the following form:

location / rest-of-pathname

Hence, the first element of a filename is generally taken to be a device or filesystem name. The remaining text in a filename is then passed to the unique handler for the indicated device verbatim, without further interpretation by either the application or the VM/OS environment.
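To make the split concrete, here is a minimal C sketch of how an emulator might separate the location from the rest of a filename. The function name split_filename is hypothetical, and skipping leading slashes follows the first rule listed under Sandbox Security. Everything after the first remaining slash is returned untouched, so the handler sees any doubled slashes verbatim.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a VM/OS filename "location/rest-of-pathname"
 * at its first slash, after skipping any leading slashes. The location is
 * copied into loc (NUL-terminated) and a pointer to the remainder, left
 * untouched for the handler, is returned. Returns NULL if there is no
 * location component or it does not fit in loc_size bytes.
 */
const char *split_filename(const char *name, char *loc, size_t loc_size) {
    assert(name != NULL && loc != NULL);

    while (*name == '/')                 /* "/adr/x" and "adr/x" are equal */
        name++;

    const char *slash = strchr(name, '/');
    if (slash == NULL)
        return NULL;                     /* no location component at all */

    size_t len = (size_t)(slash - name);
    if (len >= loc_size)
        return NULL;                     /* location name too long */

    memcpy(loc, name, len);
    loc[len] = '\0';
    return slash + 1;                    /* passed verbatim to the handler */
}
```

For example, ///src//main.asm yields the location src and the remainder /main.asm; collapsing that extra slash is left to the src handler.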

Sandboxes

One kind of (pseudo-)device is the VM/OS sandbox. Until native DASD devices are supported, all files appear inside sandboxes and must be accessed through emulation services. A sandbox is simply a privilege grant from the operator to the VM/OS application, allowing it to access host-local files at a well-known location in the host’s filesystem hierarchy. Such grants are usually made through the VM/OS emulator’s configuration settings or passed explicitly on the command line used to launch the application.

For example, let’s say a programmer is working on a development project stored in /home/vertigo/project_1 (that is, ~/project_1), with source code under src and design documents under docs/adr.

We can launch a hypothetical VM/OS text editor to edit the main.asm source listing in a way which only grants the editor access to the src directory.

$ vmos editor.bin sandbox "src=~/project_1/src" -- src/main.asm

Similarly, if a steering council wants to work on the project’s architecture decision records (ADRs), they can launch their editor granting exclusive access to just the ADR directory:

$ vmos editor.bin sandbox "adr=$HOME/project_1/docs/adr" -- adr/adr-0004.md

If your application needs to work across multiple directories, you have several options. First, you can be precise in your permission grants by including multiple sandboxes:

$ vmos editor.bin sandbox "adr=project_1/docs/adr" "src=project_1/src" -- etc.

Or, if you are more trusting, you can just provide a single grant that covers everything:

$ vmos editor.bin sandbox "p1=/home/vertigo/project_1" -- p1/src/main.asm

You can also use overlapping sandboxes to create convenient shorthand paths. These behave much like symbolic links, although VMS and AmigaOS users may find them closer to assignments. For example, we can grant total access to project_1, but typing p1/docs/adr/blah.md all the time is repetitive and error-prone. We can create an adr sandbox to make things more convenient.

$ vmos editor.bin sandbox "p1=project_1" "adr=project_1/docs/adr" -- etc.

Now, you can access the files in the whole project, but if you’re focusing on ADR documentation, you can directly reference them with, e.g., adr/adr-0004.md.
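The following sketch shows one way an emulator might resolve a filename against a table of sandbox grants, assuming the grants have already been parsed from the command line into name/host-directory pairs. struct sandbox and sandbox_resolve are illustrative names, not part of VM/OS, and path sanitizing is deliberately left out so the lookup itself stays visible.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A sandbox grant: the VM/OS-visible name and the host directory it exposes. */
struct sandbox {
    const char *name;        /* e.g. "adr" */
    const char *host_root;   /* e.g. "/home/vertigo/project_1/docs/adr" */
};

/*
 * Translate a VM/OS filename into a host pathname by looking up its first
 * component in the sandbox table. Returns 0 on success, or -1 if no grant
 * matches (access denied) or the result does not fit in out.
 */
int sandbox_resolve(const struct sandbox *tab, size_t n,
                    const char *vmos_name, char *out, size_t out_size) {
    assert(tab != NULL && vmos_name != NULL && out != NULL);

    while (*vmos_name == '/')                  /* leading slashes are skipped */
        vmos_name++;

    const char *slash = strchr(vmos_name, '/');
    if (slash == NULL)
        return -1;
    size_t name_len = (size_t)(slash - vmos_name);

    for (size_t i = 0; i < n; i++) {
        if (strlen(tab[i].name) == name_len &&
            strncmp(tab[i].name, vmos_name, name_len) == 0) {
            int len = snprintf(out, out_size, "%s/%s", tab[i].host_root, slash + 1);
            return (len >= 0 && (size_t)len < out_size) ? 0 : -1;
        }
    }
    return -1;                                 /* no such sandbox */
}
```

With p1 and adr granted as in the overlapping example (absolute host paths are used here for clarity), p1/docs/adr/adr-0004.md and adr/adr-0004.md translate to the same host pathname, while a name that matches no grant, such as etc/passwd, is refused.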

Sandbox Security

Remember that file name parsing is ultimately the responsibility of individual handlers! The rules described below are intended to document how VM/OS sandboxes work specifically, and are not intended to be more broadly applicable. For example, if a future version of VM/OS creates a 9P bridge to a remote filesystem, these rules will no longer apply, as now you’re going to be playing by 9P’s rules instead.

The following rules must be enforced to ensure that nothing can escape the sandbox.

  1. Leading slashes are always skipped. This ensures that /adr/adr-0004.md, ///adr/adr-0004.md, and adr/adr-0004.md all refer to the same file.
  2. Embedded multiple slashes are treated as a single slash. This ensures that src///main.asm refers to the same file as src/main.asm.
  3. No filename component may be used to reach the parent directory and thereby escape the sandbox. Both Unix and Windows use .. for this purpose, so /abc/../def/ghi and /def/ghi name the same file on the host. Under VM/OS, however, abc is resolved as the handler name, and that handler only sees an attempt to access a file named ../def/ghi. It’s up to the handler to resolve the remainder of the filename, and a path component that steps back a directory must not result in breaking out of the sandbox under any circumstances. One approach is to forbid .. from appearing in a filename at all; another is to canonicalize the filename before translating it to a host pathname.
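Rules 1 through 3 can be applied to the text a sandbox handler receives before it is appended to the sandbox’s host directory. Below is a hedged sketch of the first approach mentioned in rule 3 (forbidding .. entirely) rather than canonicalization. sandbox_sanitize is a hypothetical name, and dropping a trailing slash is an extra choice made by this sketch, not one of the rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Normalize a sandbox-relative name:
 *   1. leading slashes are skipped;
 *   2. runs of slashes collapse to a single slash;
 *   3. any component that is exactly ".." is rejected outright.
 * Returns 0 on success, -1 if the name is rejected or out is too small.
 */
int sandbox_sanitize(const char *in, char *out, size_t out_size) {
    assert(in != NULL && out != NULL && out_size > 0);
    size_t o = 0;

    while (*in == '/')                      /* rule 1 */
        in++;

    while (*in != '\0') {
        size_t len = strcspn(in, "/");      /* length of next component */
        if (len == 2 && in[0] == '.' && in[1] == '.')
            return -1;                      /* rule 3 */

        if (o > 0) {                        /* one separator between parts */
            if (o + 1 >= out_size)
                return -1;
            out[o++] = '/';
        }
        if (o + len >= out_size)            /* keep room for the NUL */
            return -1;
        memcpy(out + o, in, len);
        o += len;

        in += len;
        while (*in == '/')                  /* rule 2 */
            in++;
    }
    out[o] = '\0';
    return 0;
}
```

Only a component that is exactly .. is rejected; a file named ..b is still a legal name.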

Be extremely wary of symbolic links! Symbolic links should refer only to directories and files intended to be used by a VM/OS application. Beware of social engineering attempts to get you to create a symbolic link, say, src, which points to data irrelevant to the VM/OS environment (e.g., /etc).

VM/OS Emulator CLI Syntax

We’ve already seen how the new emulator CLI syntax looks in the previous section, Sandboxes. However, the formal syntax for options is repeated below for completeness.

vmos-cmd := vmos options

options := vmos-options | vmos-options -- app-options

vmos-options := app-name sb-options | FROM app-name sb-options

sb-options := SANDBOX sb-list

sb-list := sb | sb sb-list

sb := sb-name = local-path

The use of keyword arguments/options is unconventional for Unix or Windows platforms, but is conventional for Tripos, AmigaDOS, and by extension, my long-term VM/OS vision. Keywords are case insensitive unless specified otherwise.
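A parser for this grammar is short. The sketch below assumes the emulator receives the words in a conventional argv array, treats FROM and SANDBOX as ASCII case-insensitive keywords, and caps the number of sandboxes at an arbitrary 16. struct vmos_cli and parse_vmos_cli are illustrative names only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SANDBOXES 16

/* Result of parsing: vmos [FROM] app-name SANDBOX sb... [-- app-options] */
struct vmos_cli {
    const char *app_name;
    size_t n_sandboxes;
    struct {
        const char *name;      /* points into argv; ends at '=', not NUL */
        size_t name_len;
        const char *path;
    } sb[MAX_SANDBOXES];
    int app_argc;              /* arguments after "--", if any */
    char **app_argv;
};

/* Keywords compare case-insensitively (ASCII only). */
static int keyword_eq(const char *arg, const char *kw) {
    for (; *arg && *kw; arg++, kw++)
        if (tolower((unsigned char)*arg) != tolower((unsigned char)*kw))
            return 0;
    return *arg == *kw;
}

/* Returns 0 on success, -1 on a syntax error. */
int parse_vmos_cli(int argc, char **argv, struct vmos_cli *cli) {
    assert(argv != NULL && cli != NULL);
    *cli = (struct vmos_cli){0};
    int i = 1;                                  /* argv[0] is "vmos" */

    if (i < argc && keyword_eq(argv[i], "FROM"))
        i++;                                    /* optional FROM keyword */
    if (i >= argc)
        return -1;
    cli->app_name = argv[i++];

    if (i >= argc || !keyword_eq(argv[i], "SANDBOX"))
        return -1;                              /* sb-options is required */
    i++;

    for (; i < argc && strcmp(argv[i], "--") != 0; i++) {
        const char *eq = strchr(argv[i], '=');
        if (eq == NULL || eq == argv[i] || eq[1] == '\0')
            return -1;                          /* not of the form name=path */
        if (cli->n_sandboxes == MAX_SANDBOXES)
            return -1;
        cli->sb[cli->n_sandboxes].name = argv[i];
        cli->sb[cli->n_sandboxes].name_len = (size_t)(eq - argv[i]);
        cli->sb[cli->n_sandboxes].path = eq + 1;
        cli->n_sandboxes++;
    }
    if (cli->n_sandboxes == 0)
        return -1;                              /* sb-list needs one entry */

    if (i < argc) {                             /* argv[i] is "--" */
        cli->app_argc = argc - i - 1;
        cli->app_argv = argv + i + 1;
    }
    return 0;
}
```

Applied to the overlapping-sandbox example, vmos editor.bin sandbox "p1=project_1" "adr=project_1/docs/adr" -- adr/adr-0004.md yields two grants and a single application argument, adr/adr-0004.md.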

Supporting System Calls (PRELIMINARY)

To support file access under emulator control, a number of system calls are added to the emulator services API. As with other contemporary filesystem APIs, a file is treated as an unstructured sequence of bytes with no system-imposed interpretation. All file access is assumed to be in binary mode.

The calling sequence for these API entry points is illustrated in the following code snippet:

ld    a0,arg0(sp)        ;load argument from memory
addi  a1,s0,arg1         ;compute argument from another register
addi  a2,t3,0            ;transfer argument as-is from another register

addi  a7,x0,ecFunction   ;Set ECALL service number
ecall                    ;Invoke emulator service
bne   a0,x0,error_case   ;If A0 != 0, an error happened.
;Use value in A1 (if any) in subsequent computation.

To more conveniently notate these calling conventions, I use a pseudo-C-like syntax with register annotations, like so:

ecFunction(arg0, arg1, arg2) --> error, result
            a0    a1    a2 ...    a0      a1

Some functions require passing a string, vector, or a buffer instead of a scalar value. Strings and buffers both require a base address indicating the first element and a length. These are notated like so:

ecFunction(filename, buffer, flags) --> error, result
             a0/a1   a2/a3    a4 ...    a0      a1

The convention here is that the lower-numbered register of each pair (e.g., a0 or a2) contains the base address of the buffer, while the higher-numbered register (e.g., a1 or a3) contains the length of the data therein. For buffers, the length is measured in bytes. For vectors, the length is measured in elements (unless otherwise documented). Strings are considered vectors of bytes; thus, for strings, the length passed is simply the byte length of the string data.
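On the emulator side, this convention reduces to reading a register pair and writing two result registers. The sketch below assumes an RV64 emulator whose guest memory is one flat byte array; struct cpu, arg_buffer, and set_results are hypothetical names. The bounds check is the important part: a base/length pair comes from untrusted guest code and must be validated before the emulator touches host memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RISC-V ABI register numbers: a0..a7 are x10..x17. */
enum { REG_A0 = 10, REG_A1 = 11, REG_A2 = 12, REG_A3 = 13, REG_A7 = 17 };

/* Hypothetical emulator state: 32 integer registers plus guest RAM. */
struct cpu {
    uint64_t x[32];
    uint8_t *mem;
    uint64_t mem_size;
};

/*
 * Fetch a (base, length) argument: base in register reg, length in
 * register reg + 1. Returns a host pointer to the guest bytes, or NULL if
 * the range falls outside guest memory. The length is stored in *len.
 */
const uint8_t *arg_buffer(const struct cpu *c, int reg, uint64_t *len) {
    assert(c != NULL && len != NULL && reg >= REG_A0 && reg < REG_A7);
    uint64_t base = c->x[reg];
    *len = c->x[reg + 1];
    if (base > c->mem_size || *len > c->mem_size - base)
        return NULL;                  /* checked without integer overflow */
    return c->mem + base;
}

/* Store the two results the convention defines: error in a0, result in a1. */
void set_results(struct cpu *c, uint64_t error, uint64_t result) {
    assert(c != NULL);
    c->x[REG_A0] = error;
    c->x[REG_A1] = result;
}
```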

ecOpen

This opens a file identified by name. If an error occurs, a nonzero error code is returned in register A0, and the contents of A1 are undefined. Errors can happen for the following reasons:

ecOpen(filename, ioflags) --> error, handle
        a0/a1      a2          a0      a1

ecClose

This closes a previously opened file.

ecClose(handle) --> error
          a0         a0
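Because a handle is just a number passed back and forth through A0 and A1, the emulator needs some table mapping handles to host files. Here is a minimal sketch using C stdio. It assumes the filename has already been translated and checked by its sandbox, and it uses an fopen mode string (always binary) as a stand-in for the still-unspecified ioflags. All names and error codes are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_HANDLES 32

/* Hypothetical error codes; real VM/OS values are not defined here. */
enum { EC_OK = 0, EC_OPEN_FAILED = 1, EC_NO_HANDLES = 2,
       EC_BAD_HANDLE = 3, EC_IO_ERROR = 4 };

/* Handle table. Slot 0 is never handed out, so a stray zero is rejected. */
static FILE *handles[MAX_HANDLES];

/* Open a translated host path; mode is e.g. "rb" or "r+b". */
int ec_open_host(const char *host_path, const char *mode, unsigned *handle) {
    assert(host_path != NULL && mode != NULL && handle != NULL);
    for (unsigned h = 1; h < MAX_HANDLES; h++) {
        if (handles[h] != NULL)
            continue;                        /* slot in use */
        FILE *f = fopen(host_path, mode);
        if (f == NULL)
            return EC_OPEN_FAILED;
        handles[h] = f;
        *handle = h;
        return EC_OK;
    }
    return EC_NO_HANDLES;
}

/* Close a handle; the value comes from the guest, so validate it first. */
int ec_close(unsigned handle) {
    if (handle == 0 || handle >= MAX_HANDLES || handles[handle] == NULL)
        return EC_BAD_HANDLE;
    FILE *f = handles[handle];
    handles[handle] = NULL;                  /* free the slot regardless */
    return fclose(f) == 0 ? EC_OK : EC_IO_ERROR;
}
```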

ecRead

This reads a number of bytes from the file into the supplied buffer. At most the buffer’s length in bytes will be transferred; however, fewer bytes may be transferred if, e.g., VM/OS encounters the end of the file before filling the buffer.

If zero bytes are transferred, the file position is at the end of the file; there’s simply no further data to read. This is not an error condition.

ecRead(handle, buffer) --> error, actual
         a0    a1/a2         a0     a1

ecWrite

This writes a number of bytes from the supplied buffer to the file. At most the buffer’s length in bytes will be transferred; however, fewer bytes may be transferred if, e.g., VM/OS reaches a maximum size quota before the whole buffer is written.

If zero bytes are transferred, the file is at capacity; no further data can be written because there’s simply no place to put it. This is not an error condition.

ecWrite(handle, buffer) --> error, actual
          a0    a1/a2         a0     a1
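The zero-count conventions for ecRead and ecWrite map naturally onto C stdio. In this hedged sketch, ec_read and ec_write are hypothetical host-side helpers operating on an already-open FILE, the error codes are invented, and the size quota is modelled as a simple per-file byte limit supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error codes. */
enum { EC_OK = 0, EC_IO_ERROR = 5 };

/* ecRead: move at most size bytes into buf. A count of zero together
 * with EC_OK means end of file, which is not an error. */
int ec_read(FILE *f, void *buf, size_t size, size_t *actual) {
    assert(f != NULL && actual != NULL);
    *actual = fread(buf, 1, size, f);
    if (*actual < size && ferror(f)) {
        clearerr(f);
        return EC_IO_ERROR;
    }
    return EC_OK;                        /* a short count means end of file */
}

/* ecWrite: move at most size bytes from buf, never letting the file grow
 * past quota bytes. A count of zero with EC_OK means "at capacity". */
int ec_write(FILE *f, long quota, const void *buf, size_t size, size_t *actual) {
    assert(f != NULL && actual != NULL && quota >= 0);
    *actual = 0;
    long pos = ftell(f);
    if (pos < 0)
        return EC_IO_ERROR;

    size_t room = pos >= quota ? 0 : (size_t)(quota - pos);
    if (size > room)
        size = room;                     /* clamp to the quota: short write */

    *actual = fwrite(buf, 1, size, f);
    if (*actual < size) {                /* fwrite only falls short on error */
        clearerr(f);
        return EC_IO_ERROR;
    }
    return EC_OK;
}
```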

ecSeek

This relocates the current read/write position of the file. The file position can be located anywhere within the file’s current extents.

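You can determine the size of the file by seeking to position 0 relative to the end of the file, then seeking back to its original location. Because ecSeek returns the position that was current before the call, the old_pos result of the second seek is the file’s size.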

NOTE: In most OSes, you must first flush the file before seeking to avoid filesystem corruption. This seems like a silly oversight: if you’re aware enough to document a condition that so plainly violates the principle of least surprise, you should be aware enough to fix it. In VM/OS, this system call always flushes before performing a seek operation, so you can also use it to flush buffers. If all you want to do is flush a file without relocating the read/write pointer, simply seek to position 0 relative to the current position.

ecSeek(handle, position, whence) --> error, old_pos
         a0       a1        a2        a0       a1
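The following sketch implements the described ecSeek semantics on top of C stdio. It assumes whence values of 0, 1, and 2 for start, current position, and end (the actual values are not specified above) and uses invented error codes. It returns the old position, refuses targets outside the file’s current extents, and relies on POSIX fseek() writing out buffered data to obtain the always-flush behaviour.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical codes; whence 0 = start, 1 = current, 2 = end. */
enum { EC_OK = 0, EC_IO_ERROR = 5, EC_BAD_SEEK = 6 };

/*
 * Host side of ecSeek. POSIX fseek() writes out any unwritten buffered
 * data, so the unconditional fseek() to end-of-file below doubles as the
 * flush. The target must lie within the file's current extents.
 */
int ec_seek(FILE *f, long offset, int whence, long *old_pos) {
    assert(f != NULL && old_pos != NULL);

    long cur = ftell(f);
    if (cur < 0 || fseek(f, 0, SEEK_END) != 0)
        return EC_IO_ERROR;
    long end = ftell(f);
    if (end < 0)
        return EC_IO_ERROR;

    long base = whence == 0 ? 0 : whence == 1 ? cur : whence == 2 ? end : -1;
    if (base < 0 || offset < -base || offset > end - base) {
        fseek(f, cur, SEEK_SET);         /* leave the position unchanged */
        return EC_BAD_SEEK;
    }
    if (fseek(f, base + offset, SEEK_SET) != 0)
        return EC_IO_ERROR;

    *old_pos = cur;                      /* report where we were */
    return EC_OK;
}
```

Calling it with offset 0 and whence 2, then with the returned old_pos and whence 0, reproduces the file-size idiom: the second call reports the size.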

ecDelete

This function deletes a file by name, if it exists. The operation is idempotent; if the file does not exist, then nothing happens. Attempting to dispose of a directory is currently left unspecified.

ecDelete(filename) --> error
           a0/a1        a0
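Idempotent deletion is straightforward on a POSIX host: treat ENOENT as success. The sketch assumes the name has already been translated to a host path and uses invented error codes. Note that POSIX remove() also deletes empty directories, so on such a host the currently unspecified directory case would quietly succeed for empty ones.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical error codes. */
enum { EC_OK = 0, EC_IO_ERROR = 5 };

/* Host side of ecDelete: deleting a file that does not exist succeeds. */
int ec_delete(const char *host_path) {
    assert(host_path != NULL);
    errno = 0;
    if (remove(host_path) == 0 || errno == ENOENT)
        return EC_OK;                /* idempotent: "already gone" is fine */
    return EC_IO_ERROR;
}
```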

ecRename

This function renames a file to something new. The new name must reside on the same device as the old name.

ecRename(oldname, newname) --> error
          a0/a1    a2/a3        a0
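One literal reading of the same-device rule is that both names must begin with the same location component. The sketch below checks exactly that before calling the host’s rename(). The function names and error codes are hypothetical, and the host paths are assumed to have been translated and checked already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error codes. */
enum { EC_OK = 0, EC_IO_ERROR = 5, EC_CROSS_DEVICE = 7 };

/* Skip leading slashes, then return the length of the location component. */
static size_t location_len(const char **name) {
    while (**name == '/')
        (*name)++;
    return strcspn(*name, "/");
}

/* Nonzero if both VM/OS names start with the same location component. */
int same_location(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t la = location_len(&a), lb = location_len(&b);
    return la == lb && la > 0 && strncmp(a, b, la) == 0;
}

/* Host side of ecRename: refuse cross-device renames, then ask the host. */
int ec_rename(const char *old_name, const char *new_name,
              const char *old_host, const char *new_host) {
    if (!same_location(old_name, new_name))
        return EC_CROSS_DEVICE;
    return rename(old_host, new_host) == 0 ? EC_OK : EC_IO_ERROR;
}
```

Under this literal check, overlapping sandboxes such as p1 and adr count as different devices even though they share host storage; whether VM/OS should permit such renames is a design decision the check does not settle.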

Conclusion

This interface should provide the nascent VM/OS environment with the facilities to process source files and produce its own build artifacts. The interface and file naming convention are believed to be forward-compatible with a future single-rooted filesystem implementation, while supporting well-confined sandboxes that grant localized access to host filesystem resources. I regret that I could not offer this interface asynchronously, which would have taken advantage of the multitasking facilities VM/OS already provides; however, since intermediate build tools tend to be synchronous anyway, and local storage devices tend to be extremely fast these days, I didn’t think this would be a show-stopping issue.

APPENDIX: Inspiration

The inspiration for interpreting filenames the way described above comes from an early home computer which, despite its reputation for being crippled by design, actually came with a system software design which was remarkably ahead of its time. I am speaking of the Texas Instruments TI-99/4 and /4A computer.

It came as a shock to me when I learned how the TI-99 platform handles installable filesystems. For example, in TI BASIC, you could load a program from an old-fashioned 90KB floppy disk like this:

OLD "DSK1.MY-PROGRAM"

This would work great if the disk containing MY-PROGRAM resided in unit DSK1. But, what if you don’t know which disk drive the floppy is inserted into, but you do know the name of the volume you assigned when you formatted it? That’s supported too:

OLD "DSK.MY-VOL.MY-PROGRAM"

Isn’t that neat? We would not see functionality like this in a commercially available product again until the release of the Commodore-Amiga in 1985.

1> ; Access My-Program by physical device name
1> DF0:My-program
1> ; Now by volume name, not knowing which physical device it's in
1> My-Volume:My-Program

This doesn’t stop with a flat directory structure, either. To be fair, the drivers supporting 90KB, 180KB, and 360KB floppy disks only recognize a single, flat, root directory; but surprisingly, there’s nothing in the system interface which mandates this. This is because the TI software delegates filename parsing to the specific device handler; to the system software itself, a filename is completely opaque beyond the search key used to locate the specific handler.

Today, there are many third-party or even home-made devices which implement hundreds of megabytes’ worth of storage, complete with support for sub-directories. The IDEAL Project, for example, supports not only virtual floppies, but also sub-directories in a manner significantly more natural than how CMD implemented sub-directory support for Commodore 8-bit computers. E.g., .1.THROUGH.THE.FOREST.TO.GRNDMA/TXT.

This interface extends to non-block devices as well. For example, if you wanted to use a modem to connect to a BBS, you might configure your terminal to access a device RS232/1.SPEED=9600.HS=HW.FRAME=8N1. Do you want to generate a 16-character random password easily? If you have the appropriate software loaded into RAM, you could use TI BASIC like this to generate it (forgive any syntax errors; I haven’t used a TI since 1981!):

100 OPEN #1:"RANDOM.ASCII",INTERNAL,INPUT,FIXED 16
110 INPUT #1:P$
120 CLOSE #1
130 PRINT "PASSWORD IS: "&P$

Remember the previously mentioned IDEAL project? There are even virtual files for implementing Blowfish encryption of arbitrary data blocks! Write your data to one virtual file and read back the encrypted result on another.

So, how does all this work in practice?

TI-99 Design Summary

There are many web pages which describe all these mechanics better than I can here; however, I’ll briefly summarize them so that the reader doesn’t need to flip back and forth between web pages all the time.

NOTE: A word on notation before we continue; sometimes, I’ll refer to hexadecimal values using a dollar prefix, which is more or less industry standard notation for anyone not reared on Texas Instruments’ product line. For some reason, TI chose to use the greater-than symbol as its hex prefix. Therefore, >4000 and $4000 both mean exactly the same thing: 0x4000 for those of you who understand C notation.

The system software divvies up the processor’s address space into 8KB chunks: >0000->1FFF, >2000->3FFF, and so forth. Some of these regions hold special significance in that they’re hard-wired for certain types of peripherals (e.g., HexBus devices, cartridge ROMs, etc.). The figure below is from https://www.unige.ch/medecine/nouspikel/ti99/architec.htm .

>0000 ------------------+  
      | Console ROM     |  
      +                 +  
      |                 |            +------------------+ >8000  
>2000 +-----------------+            | (mirror of RAM)  |  
      | Low memory      |           /|                  |  
      +                 +          / |>8300-83FF: RAM   |  
      | expansion       |         /  +------------------+ >8400  
>4000 +-----------------+        /   |>8400: sound chip |  
      | Peripheral      |       /    |       write      |  
      + cards ROM       +      /     |                  |  
      |                 |     /      +------------------+ >8800  
>6000 +-----------------+    /       |>8800: VDP read   |  
      | Cartridge       |   /        |>8802: VDP status |  
      + ROM/RAM         +  /         |                  |  
      |                 | /          |>8C00: VDP write  |  
>8000 +-----------------+-           |>8C02: set address|  
      | scratch-pad RAM |            |                  |  
      + memory-mapped   +            |                  |  
      | devices         |            +------------------+ >9000  
>A000 +-----------------+-           |>9000: speech     |  
      | High memory     | \          | synthesizer read |   
      + expansion       +  \         |                  |  
      |                 |   \        |>9400: speech     |  
>C000 +                 +    \       | synthesizer write|  
      |                 |     \      +------------------+ >9800  
      +                 +      \     |>9800: GROM read  |      
      |                 |       \    |>9802: read addr  |  
>E000 +                 +        \   |                  |  
      |                 |         \  |>9C00: write data |  
      +                 +          \ |>9C02: set address|  
      |                 |           \|                  |  
>FFFF +-----------------+            |                  |  
                                     +------------------+ >9FFF  

When software wants to open a file, it needs to find a handler. The TI convention is to perform a “scan” of memory, starting at address >0000, then moving to >2000, then moving to >4000, and so forth, looking for a suitable handler for the filename provided. At each 8K boundary, a data structure is checked to see if it is even worth checking that 8K block. For example, it is possible that an expansion device or cartridge is not plugged in; thus, its corresponding region of the address space is missing. Attempting to resolve memory pointers and other attributes in these regions would cause the computer to crash.

This presence check structure looks something like this:

x000    byte  >AA        ;Tag indicating a handler header
x001    byte  1          ;Version of this data structure (I think)
x002    byte  0          ;Number of "programs"
x003    byte  0          ;padding
x004    word  init_list  ;procedure list for power-on initialization
x006    word  0          ;program list (we're not interested)
x008    word  dsr_list   ;"Device Service Routine" list
x00A    word  isr_list   ;list of interrupt service routines

NOTE: Although details differ, we wouldn’t see anything like this again until the Commodore-Amiga in 1985, namely the RomTag structure described in the Exec: Libraries chapter of the Amiga ROM Kernel Reference Manual, Libraries and Devices. See also the documentation for expansion.library in the same manual.

The Device Service Routine (DSR) is what we’re most interested in here, as that provides the linkage between the client software and the device driver we’re looking for. Each node on the dsr_list looks like this:

dsr_node0:
    word  dsr_node1     ;Next node in the list
    word  dsr_handler   ;Procedure to handle I/O requests
    byte  dsr_name_len  ;Length of DSR name
    byte  "...."        ;DSR name

So, for example, the handlers for the floppy disk drive might look like this:

dsk_node:
    word  dsk1_node
    word  dsk_entry
    byte  3,"DSK"

dsk1_node:
    word  dsk2_node
    word  dsk1_entry
    byte  4,"DSK1"

dsk2_node:
    word  dsk3_node
    word  dsk2_entry
    byte  4,"DSK2"

dsk3_node:
    word  dsk4_node
    word  dsk3_entry
    byte  4,"DSK3"

dsk4_node:
    word  >0000   ;End Of List
    word  dsk4_entry
    byte  4,"DSK4"

Since there are eight 8KB blocks in the processor’s 64KB address space, there are up to eight linked lists to check before locating a handler; and, assuming all of those blocks are populated, each list can hold any number of DSRs. If we were to express the search algorithm in simple C, I reckon it would look a lot like the following:

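#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void HANDLER(void);
typedef struct node_block NODE;
typedef struct header_block HEADER;
typedef union scan_block SCAN;

struct node_block {
    NODE    *next;
    HANDLER *proc;
    uint8_t  name[8]; //name[0] == length of name; name[1..] == characters
};

struct header_block {
    uint8_t signature;  //unsigned, so that the >AA comparison can succeed
    uint8_t version;
    uint8_t n_progs;
    uint8_t padding;
    NODE *init_list;
    NODE *prog_list;
    NODE *dsr_list;
    NODE *isr_list;
};

union scan_block {
    HEADER h;
    char   gap[8192];
};

//Compare a device prefix such as "DSK1" with a length-prefixed DSR name.
static int
name_matches(const char *prefix, const uint8_t *name) {
    size_t len = strlen(prefix);
    return len == name[0] && memcmp(prefix, name + 1, len) == 0;
}

HANDLER *
find_handler_in_block(const char *prefix, const SCAN *blk) {
    for(const NODE *pn = blk->h.dsr_list; pn != NULL; pn = pn->next) {
        if(name_matches(prefix, pn->name))
            return pn->proc;
    }
    return NULL;
}

//space is the 64KB address space viewed as eight 8KB blocks. On the TI
//itself this is simply address >0000; taking it as a parameter avoids
//dereferencing a null pointer and lets the search run on a simulated map.
HANDLER *
find_handler(const char *prefix, const SCAN space[8]) {
    for(int i = 0; i < 8; i++) {
        const SCAN *bp = &space[i];
        if((bp->h.signature == 0xAA) && (bp->h.version == 1)) {
            HANDLER *phnd = find_handler_in_block(prefix, bp);
            if(phnd != NULL)
                return phnd;
        }
    }
    return NULL;
}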

Once the handler portion of the filename has been resolved to a handler procedure, that procedure is invoked with a data structure called a Parameter Access Block, which basically describes the requested I/O operation, supplies required parameters, and returns various results including error conditions. Up to 256 operations are allowed (the opcode field is a byte), but usually 9 standard operations are sufficient to interoperate with everything else in the system, BASIC included:

I don’t need to go into specifics about the PAB here; but, if you’re interested in learning more, check out this document on writing your own DSRs.