# On Semantic Dictionary Encoding (2023-Jan-11)

This article isn't particularly well formed or thought out.
Rather, it's a recording of some thoughts I've had after
I spent a few days reading up on
Dr. Michael Franz's
thesis documenting what he calls
Semantic Dictionary Encoding (SDE).
This is the encoding used in the Oberon "Juice" or "Slim Binary" technology.
Unfortunately, SDE has not seen use since.

I assume the reader has already read this thesis
and is familiar with the terms used.

After reading through the thesis,
I think I can see why it hasn't gained any traction to speak of.
While it is true that it allows for the creation of *very* compact executables,
indeed, far more compact than even a CISC instruction encoding,
the loader becomes significantly more complicated.
More precisely, the loader is a compiler's code-generating back-end.

Considering a single language use-case,
SDE decoders are complex beasts.
As a decoder works through a program while it's being loaded,
it is expected to update its semantic dictionary.
This is compared to "some types of compression"
(no doubt, referring to LZW compression without actually saying it by name).
However, in actual implementation,
even Franz admits that knowing *when* to update the dictionary is left as an exercise to the implementor.
This creates a huge opportunity for an encoder and decoder to become desynchronized with each other,
which obviously would create incorrect images after loading.
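
To make that failure mode concrete, here is a toy sketch in Python.
It is *not* Franz's format:
the stream layout, the `encode`/`decode` functions, and the two update policies are all invented for illustration.
The only point is that both sides must apply the same "when do I add an entry?" rule,
or later indices silently resolve to the wrong entries.

```python
def encode(symbols, should_add):
    """Emit literals and dictionary references, growing the dictionary as we go."""
    dictionary, stream = [], []
    for sym in symbols:
        if sym in dictionary:
            stream.append(("ref", dictionary.index(sym)))
        else:
            stream.append(("lit", sym))
            if should_add(sym):
                dictionary.append(sym)
    return stream


def decode(stream, should_add):
    """Rebuild the symbols, growing a dictionary with the decoder's own policy."""
    dictionary, symbols = [], []
    for kind, value in stream:
        if kind == "ref":
            symbols.append(dictionary[value])
        else:
            symbols.append(value)
            if should_add(value):
                dictionary.append(value)
    return symbols


def add_all(sym):
    """Policy A (hypothetical): every new literal becomes a dictionary entry."""
    return True


def add_names_only(sym):
    """Policy B (hypothetical): only identifiers become dictionary entries."""
    return sym.isalpha()


program = ["x", "+", "y", "+", "x"]
stream = encode(program, add_all)

print(decode(stream, add_all))         # ['x', '+', 'y', '+', 'x']
print(decode(stream, add_names_only))  # ['x', '+', 'y', 'y', 'x']
```

The second decode raises no error at all;
it just produces a different program,
which is what makes this kind of disagreement so unpleasant to track down.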

And, SDE decoders are slow.
It is hard for me to reconcile how Franz can, in a single sentence,
document SDE decoders being anywhere from 20% to 100% slower than raw binary loaders
and yet claim that loading performance is competitive.
Franz was counting on the processor-I/O speed gap widening with time as a way to amortize this cost.

Another point of contention I have with SDE is when considering its role in multi-language software builds.
Dr. Franz hints at the use of SDE as a means of achieving language independence,
but doesn't touch on it any further.
If we examine its usefulness in a multi-language environment from first principles, though,
it *ought* to be possible with some effort.

In other words, through the use of SDE,
one may write software in Oberon which depends upon software written in
C, BASIC, or any other language
as long as they also compile to an SDE representation.
This relationship goes the opposite direction as well;
it's entirely possible that a BASIC program could depend upon some BCPL module,
which in turn could depend on Modula-2, etc.

However, for this to happen,
the loader needs to be aware of what language the module was written in.
Or, at the very least, the compiler must write out the initial semantic dictionary image in the module file,
in which case the loader must at least recognize the semantic dictionary *classes* that language depends upon.
It's *possible* to do,
but it puts the burden of cross-language interop
on the loader/compiler back-end.

Most languages will share a large subset of core primitives.
However, this won't always be the case.
If version 1 of a loader only supports C and BASIC, for example,
then it cannot be used to load in a Rust module.
At the very least, C and BASIC use 2's complement addition modulo some native word size.
Rust code, however, can be configured to *trap on overflow* (in fact, enabled by default on debug builds!),
which means it needs an overflow-aware addition operation.
And, it's not just addition;
Rust has a whole litany of core operations which support overflow detection.

    | Class | Info   | Links/Missing |
    |:------|:-------|:--------------|
    | Add   | . + .  | Left, Right   |  ( for Oberon, C, BASIC, etc. )
    | AddOv | . + .  | Left, Right   |  ( for Rust )
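
As a rough sketch of what a back-end would have to distinguish,
here are the two additions modeled in Python.
The 32-bit word size and the function names are assumptions made for the example;
a real loader would emit machine code rather than call functions like these.

```python
WORD_BITS = 32  # assumed native word size for this sketch


def add_wrapping(left, right, bits=WORD_BITS):
    """`Add`: 2's complement addition modulo 2**bits, returned as a signed value."""
    mask = (1 << bits) - 1
    total = (left + right) & mask
    # If the top bit is set, reinterpret the result as a negative number.
    if total >> (bits - 1):
        total -= 1 << bits
    return total


def add_checked(left, right, bits=WORD_BITS):
    """`AddOv`: the same addition, but trap if the result does not fit."""
    total = left + right
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if not lowest <= total <= highest:
        raise OverflowError(f"{left} + {right} overflows {bits} bits")
    return total


print(add_wrapping(2**31 - 1, 1))  # -2147483648
print(add_checked(40, 2))          # 42
```

Calling `add_checked(2**31 - 1, 1)` raises `OverflowError`,
which stands in here for the trap a Rust debug build would take.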

Then there are languages like BCPL,
which implement pointers *very* differently than Oberon, C, C++, BASIC, and even Rust itself.

    | Class        | Info   | Links/Missing |
    |:-------------|:-------|:--------------|
    | Deref        | *(.)   | Left          |  ( for Oberon, C, BASIC, etc. )
    | DerefWrdAddr | . ! .  | Left, Right   |  ( for BCPL )
    | DerefBytAddr | . % .  | Left, Right   |  ( for BCPL )
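
The difference is easier to see with a flat memory image.
The sketch below assumes 4-byte little-endian words and uses made-up function names;
it treats a BCPL pointer as a *word* index,
whereas the C-style pointer is a byte address.

```python
import struct

BYTES_PER_WORD = 4  # assumed word size; byte order is assumed little-endian


def deref(memory, byte_addr):
    """`Deref`: the pointer is already a byte address."""
    return struct.unpack_from("<i", memory, byte_addr)[0]


def deref_word_addr(memory, base, index):
    """`DerefWrdAddr` (BCPL `base ! index`): both operands count words."""
    return struct.unpack_from("<i", memory, (base + index) * BYTES_PER_WORD)[0]


def deref_byte_addr(memory, base, index):
    """`DerefBytAddr` (BCPL `base % index`): word pointer plus a byte offset."""
    return memory[base * BYTES_PER_WORD + index]


# Four words of memory: 10, 20, 30, and a word whose bytes are 01 02 03 04.
memory = bytearray(struct.pack("<4i", 10, 20, 30, 0x04030201))

print(deref(memory, 8))               # 30 (byte address 8 is word 2)
print(deref_word_addr(memory, 1, 1))  # 30 (word 1 + 1 is word 2)
print(deref_byte_addr(memory, 3, 2))  # 3  (third byte of word 3)
```

The same word is reached with address 8 in one model and address 2 in the other,
so a loader mixing the two has to know which kind of pointer each value is.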

Even variable assignments aren't immune from these kinds of differences.
For example,
suppose I want to create a loader that supports
both Oberon and Forth.
Classic Forth has no concept of L-values or R-values.
Just like in Bliss,
Forth code tends to grab the address of variables and explicitly store values into these locations.
For these to work, we'll need at least the following
semantic dictionary entry types:

    | Class | Info   | Links/Missing |
    |:------|:-------|:--------------|
    | Asgn  | . := . | Left, Right   |  ( for Oberon )
    | Fetch | . @    | Left          |  ( for Forth, Bliss )
    | Store | . . !  | Left, Right   |  ( for Forth, Bliss )

The loader will need to know and understand intimately
the difference in semantics between an Oberon-style assignment
and Forth-style raw memory accessors.
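
A minimal sketch of that difference, again in Python and again with invented names:
`asgn` is handed a variable and must resolve it to a location itself,
while `fetch` and `store` are handed plain addresses that the program computed.

```python
def asgn(memory, symbols, name, value):
    """`Asgn`: the left operand is an L-value; the loader resolves its address."""
    memory[symbols[name]] = value


def fetch(memory, address):
    """`Fetch` (Forth `@`): the operand is just a number used as an address."""
    return memory[address]


def store(memory, value, address):
    """`Store` (Forth `!`): value first, then address, matching the stack order."""
    memory[address] = value


memory = [0] * 8
symbols = {"x": 3}  # hypothetical symbol table: variable name -> cell index

asgn(memory, symbols, "x", 42)       # Oberon-style: x := 42
print(fetch(memory, symbols["x"]))   # 42

store(memory, 7, symbols["x"] + 1)   # Forth-style: 7 x 1 + !
print(memory)                        # [0, 0, 0, 42, 7, 0, 0, 0]
```

The last `store` writes to a cell that has no name at all,
which is exactly the kind of access an assignment-only class cannot express.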

It doesn't stop there.
If type checking is to be enforced by the loader
(probably a worthy goal if multi-language support is desirable!),
then we'll need a loader which can understand the different symbol tables and their respective type descriptors.

So, while it might be a bit weird to find
Add, AddOv, Deref, DerefWrdAddr, DerefBytAddr, Asgn, Fetch, *and* Store classes in a semantic dictionary for a single load module,
the loader must still understand *all* of these classes
if it wants any hope of supporting languages as diverse as C to Rust, Ada to Zig, and Forth to Fortran.
And, all of these SDE encoders must agree with all the SDE decoders on when the semantic dictionary is to be updated with additional templates.
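
In practice this amounts to a compatibility check at load time.
The sketch below is hypothetical throughout
(the class sets, `check_module`, and the idea of a module listing its required classes are assumptions for the example),
but it shows why an older loader has no choice but to refuse a newer language's modules.

```python
# Classes each (hypothetical) loader version has a code generator for.
LOADER_V1_CLASSES = {"Add", "Deref", "Asgn"}
LOADER_V2_CLASSES = LOADER_V1_CLASSES | {
    "AddOv", "DerefWrdAddr", "DerefBytAddr", "Fetch", "Store",
}


class UnsupportedClass(Exception):
    """Raised when a module needs a class the loader cannot generate code for."""


def check_module(required_classes, loader_classes):
    """Refuse to load a module whose dictionary uses classes unknown to the loader."""
    missing = sorted(set(required_classes) - set(loader_classes))
    if missing:
        raise UnsupportedClass(", ".join(missing))
    return True


rust_module = {"AddOv", "Asgn"}  # assumed class list for some Rust-built module

print(check_module(rust_module, LOADER_V2_CLASSES))  # True
try:
    check_module(rust_module, LOADER_V1_CLASSES)
except UnsupportedClass as err:
    print("cannot load, missing:", err)  # cannot load, missing: AddOv
```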

I think it's easy to see that
although SDE is a binary representation of a program,
SDE enforces a kind of *language-level* standard for interoperability,
instead of a *binary-level* standard.
The former kind of standard has lots of neat advantages, but will always have a gaping hole:
there's no escape to the raw machine level.

A binary standard side-steps *all* of these issues,
putting the complexity of interop and code generation in with the compilers themselves.
However, there is one thing a binary standard cannot guarantee:
type conformance.
It's entirely possible to write a COM object that
makes demons fly out of your nose
in response to a drag-and-drop event in a Word document, for example.

And, yet, despite the long list of disadvantages surrounding SDE,
I still find myself intrigued by it.
In a weird sort of way, SDE decoders seem to be closely related to my
Shoehorn project.
In this latter project,
I create a statically-compiled Forth subset which is intended
to bootstrap new systems easily *and quickly*.
If an SDE decoder can be brought up as quickly,
then I'd argue that the complexity of the SDE is amortized by its ease of maintenance.
I think bringing up a user environment
built around an SDE decoder/loader
is a worthy experiment to try some day.