# A Component Object Model ABI for the 65816

I don't have much time,
so I'm not able to go into the nitty-gritty details of how everything here works.  
I may come back to this blog post and refine the article as more time permits.

The problem with the 65816 is that it has no indirect form of JSL, so you cannot directly call a subroutine whose long address has been computed.
I mean, you can, but it is horrifically inefficient.
The most obvious way to invoke an indirect long-address vector is:

        _AL             ; use 16-bit accumulator
        _XS             ; use 8-bit index registers
        LDX vecptr+2    ; push bank address byte
        PHX
        LDA vecptr      ; push bank offset (assume it's pre-decremented)
        PHA
        RTL

This totals 27 cycles, and it clobbers both the accumulator and X in the process,
which is highly undesirable.
Worse, that figure assumes vecptr resides in direct page.
Add two more cycles if it's accessible via absolute addressing in the current bank,
and four more cycles if it resides in another bank altogether.

Through clever pointer representation, though,
we can at least fix the destroyed registers problem.
We can't do too much about the time cost of calling a method, unfortunately.

Let each application reserve a long (24-bit) pointer in direct page called
THIS.  Its purpose is to point to the current object whose method you are
invoking.

Referencing instance data for the object/component happens easily enough: load
Y with the offset, and use [THIS],Y addressing.
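
As a minimal sketch of such a field access, assume a hypothetical 16-bit field
called REFCNT at offset 4 (offsets 0 through 3 are taken by the component's
JML and implementation pointer, per the layout further down) and 16-bit
registers.  The field name and its offset are illustrative only, and the
constant-definition syntax may differ in your assembler:

        REFCNT = 4          ; hypothetical field offset; +0..+3 hold the JML

        REP #$30            ; 16-bit accumulator and index registers
        LDY #REFCNT         ; Y = offset of the field within the object
        LDA [THIS],Y        ; A = 16-bit field, fetched through the long pointer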

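To invoke a method on the component, load X with the method ID (0, 2, ..., 2N-2
for an interface with N methods), then JSL to THIS-1.  Let's call this address
CALLTHIS.  At CALLTHIS, there **must** be an opcode byte for JML.  Together
with the three bytes of THIS that follow it, that byte forms a complete JML
instruction whose target is the first byte of the component.

Thus, the instance data for the component must itself begin with a JML that
bounces the call to the appropriate interface implementation.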
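
A sketch of the caller's side, under a few assumptions of mine: a hypothetical
three-byte pointer variable objptr, a hypothetical method ID constant
M_ADDREF, and a direct page register of zero, so that the CALLTHIS symbol is
valid both as a direct page address and as a JSL target in bank 0.  Any
register-width hints your assembler requires are left out.  The JML opcode,
$5C, only needs to be planted once at start-up:

        M_ADDREF = 2        ; hypothetical method ID: second slot in the interface

        ; One-time setup: plant the JML opcode in front of THIS.
        SEP #$20            ; 8-bit accumulator
        LDA #$5C            ; opcode for JML (absolute long)
        STA CALLTHIS
        REP #$30            ; back to 16-bit accumulator and index registers

        ; Per call: aim THIS at the object, pick the method, and go.
        LDA objptr          ; copy bytes 0 and 1 of the pointer
        STA THIS
        LDA objptr+1        ; copy bytes 1 and 2 (overlapping 16-bit move)
        STA THIS+1
        LDX #M_ADDREF
        JSL CALLTHIS        ; comes back here when the method executes RTL

Only X is consumed by the dispatch itself, so anything loaded into A or Y
after THIS has been set up arrives at the method intact.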

So, we have a data structure layout like so:


         Direct Page

    CALLTHIS | (JML)  |
             +--------+                 Component
        THIS | THIS L | .
             +--------+  |             +--------+
             | THIS M |  |-----------> | (JML)  | +0
             +--------+  |             +--------+                  Class Implementation
             | THIS H | "              | IMPL L | +1  .
                                       +--------+      |             +-------------+
                                       | IMPL M | +2   |-----------> | JMP (*+3,X) |
                                       +--------+      |             +-------------+
                                       | IMPL H | +3  "              | my_queryIf  |---->
                                       +--------+                    +-------------+
                                       |  ....  | +4                 | my_addRef   |---->
                                                                     +-------------+
                                                                     | my_release  |---->


NOTE: Both the class implementation and the application need to agree on which
direct page address corresponds to THIS.  Otherwise, the class implementation
cannot find the instance data for the interface invoked!
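
To make the layout concrete, here is a sketch of one component instance and
its class implementation.  The my_object, my_class and my_vtable labels, the
reference-count field, and the .word directive are my assumptions (directive
spelling varies between assemblers); only the shape follows the diagram.  Keep
in mind that JMP (abs,X) reads its table from the current program bank and
jumps within that bank, so the table and the methods have to live in the same
bank as the JMP itself:

    my_object:                  ; component instance (normally built in RAM)
            JML my_class        ; +0: JML opcode, +1..+3: IMPL pointer
            .word 1             ; +4: hypothetical reference count field

    my_class:                   ; class implementation, shared by all instances
            JMP (my_vtable,X)   ; X = method ID selects the table entry
    my_vtable:
            .word my_queryIf    ; X = 0
            .word my_addRef     ; X = 2
            .word my_release    ; X = 4

    my_addRef:                  ; example method; assumes 16-bit accumulator
            LDY #4              ; offset of the reference count
            LDA [THIS],Y
            INC A
            STA [THIS],Y
            RTL                 ; returns straight to the JSL CALLTHIS site

    my_queryIf:
    my_release:
            RTL                 ; stubs; real bodies omitted

Because every hop is a jump rather than a call, the only return address on the
stack is the caller's, and a single RTL in the method unwinds the whole chain.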

This method is slow, as the table below illustrates.  However, among the
approaches to polymorphism on the 65816 platform, this is one of the fastest
methods of dispatch that works across the entire 16MB address space and with
the full set of native-mode capabilities.

| Instruction  | Cycles |
|:-------------|:------:|
| LDX #$nnnn   |  3     |
| JSL CALLTHIS |  8     |
| JML instance |  4     |
| JML vtable   |  4     |
| JMP (x,X)    |  6     |
| RTL          |  6     |
| **TOTAL**    | **31** |

For functional programming, things are actually a *little* bit faster, because
the component indirection can be skipped altogether.  "Rich pointers" consist
of both a record pointer _and_ an interface pointer (which the compiler always
keeps in sync), so the JML can point straight at the v-table.  That drops one
4-cycle JML from the table above, for a total of 27 cycles.

         | (JML)  |
         +--------+
         | VTAB L | .
         +--------+  |                            +-------------+
         | VTAB M |  |--------------------------> | JMP (*+3,X) |
         +--------+  |                            +-------------+
         | VTAB H | "                             | func_1      |
         +--------+                               +-------------+
         | DATA L | .                             | func_2      |
         +--------+  |       +------------+       +-------------+
         | DATA M |  |-----> | ...data... |       | func_3      |
         +--------+  |       |            |       +-------------+
         | DATA H | "        +------------+       | ...         |
         +--------+
         | $00    |

With this mechanism, you need only store the v-table pointer in an absolute
location for easy JSL-ing.  The data pointer can remain anywhere else
convenient, as long as the function called and the caller agree on where to
find the instance data pointer.  In the example above, I co-locate the
pointers in direct page, but this is not strictly necessary.
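
A sketch of this variant, using hypothetical labels of my own: RICHCALL for
the JML opcode byte, VTAB for the table pointer right behind it, and DATA for
the record pointer.  As before, it assumes 16-bit registers, and it assumes
the JML opcode ($5C) is already in place at RICHCALL:

        ; Caller: VTAB and DATA have been filled in from the rich pointer.
        LDX #2              ; select func_2 (IDs go 0, 2, 4, ...)
        JSL RICHCALL        ; JML through VTAB, then JMP (table,X)

        ; Callee: reaches its record through the agreed-upon DATA pointer.
    func_2:
        LDY #0              ; offset into the record
        LDA [DATA],Y        ; first 16-bit word of the instance data
        RTL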

OK, I gotta go.  Run with this if y'all want; I only ask that you kindly give me credit if you decide to use it.