# A Possible STEbus Interface for the 65816 Processor

The 65816's local bus is known to be very simple;
indeed, one might even say too simple.
It is generally synchronous, although I [documented](https://falvotech.com/tmp-blog/20230608.html)
how to implement a 68000-like asynchronous interface
with only a small amount of logic.

Unlike the 68000's bus, the STEbus isn't pseudo-asynchronous;
it is truly asynchronous, although it is perfectly
legal to implement it in a pseudo-asynchronous way.  There
is a 16MHz clock provided on each expansion slot for this
purpose; however, none of the signals on the
remaining pins are in any way referenced to this clock.

So, how does one interface a fully synchronous processor
to a fully asynchronous expansion backplane?

State machines.  And registers.  Lots of registers.

I'm going to focus on the asynchronous handshake
that happens between the bus master (65816) and slave peripherals.
**Note:** From here on out, I'm going to use the
terms **initiator** and **peripheral**, respectively, instead.

## Writes to Memory or I/O

The simplest possible transaction to support is a write into memory or I/O space.
Writing requires registering the address bus, the data bus, and R/W to generate the
corresponding signals on the STEbus side.

                            +----------+
                            |          |+
        A0-A19 (cpu) ======>|D  '373  Q|=====> A0-A19 (ste)
                            |          ||
               CPUWR o------|LE        ||
             _STEDRA o-----o|/OE       ||
                            |          ||
                            +----------+|
                            |          |+
                R/W  o------|D        Q|-----> CM0
                A20  o------|D  '373  Q|-----> CM1
                +Vcc o------|D        Q|-----> CM2
                            |          |
               CPUWR o------|LE        |
             _STEDRA o-----o|/OE       |
                            |          |
                            +----------+
                            |          |
         D0-D7 (cpu) ======>|D  '373  Q|=====> D0-D7 (ste)
                            |          |
               CPUWR o------|LE        |
             _STEDRD o-----o|/OE       |
                            |          |
                            +----------+

This circuit will latch the desired address and the data from the initiator.
We don't actually drive the bus until _STEDRA and _STEDRD (STE drive address
and data, respectively) are asserted.  How those signals are generated is the
topic of the state machine below.
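
To make the latch-then-drive behavior concrete, here is a minimal behavioral sketch of one 8-bit '373-style latch. The class name `Latch373` and its `update` method are made up for illustration; the only facts relied on are that the part is transparent while LE is high and that its outputs drive only while /OE is low.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Latch373:
    """Hypothetical model of one 8-bit transparent latch with tri-state outputs."""
    stored: int = 0

    def update(self, d: int, le: bool, oe_n: bool) -> Optional[int]:
        """Apply the inputs and return the value on Q, or None when Q floats.

        le   -- latch enable, active high: the latch follows D while le is true.
        oe_n -- output enable, active low: Q drives only while oe_n is false.
        """
        if le:
            self.stored = d & 0xFF  # transparent: track whatever is on D
        return None if oe_n else self.stored


data_latch = Latch373()

# CPUWR high, _STEDRD negated (high): the latch tracks the CPU, but Q floats.
assert data_latch.update(0x5A, le=True, oe_n=True) is None
# CPUWR low: the byte is held even though the CPU data bus has moved on.
assert data_latch.update(0xFF, le=False, oe_n=True) is None
# _STEDRD asserted (low): the held byte finally appears on D0-D7 (ste).
assert data_latch.update(0xFF, le=False, oe_n=False) == 0x5A
```

The same model applies to the address and command latches, with CPUWR on LE and _STEDRA on /OE.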

The data path is fairly straightforward, and is directed by the following signals:

| Signal | Description                                                                                             |
|:------:|:--------------------------------------------------------------------------------------------------------|
| CPUWR  | Asserted to latch the 65816's address and data buses.                                                   |
| phi2   | The 65816's synchronous bus clock.  For the ForthBox design, it'll run between 4MHz and 8MHz.           |
| R/W    | 1 if the 65816 is reading from memory, 0 if writing.                                                    |
| STECS  | Asserted if the 65816's address bus indicates a byte somewhere in STE address space (memory or I/O).    |
| STEDRA | Asserted to drive the STEbus address and command modifier buses with the latched values from the 65816. |
| STEDRD | Asserted to drive the STEbus data bus with the latched value of the 65816's data bus.                   |

While the data path is really simple, the control logic is where things start to get intricate.
The state machine we want to implement is as follows:

        T0      Assert CPURDY to let the CPU continue.
                Wait for CPUWR high.

        T1      Wait for CPUWR low.

        T2      Drive the address bus and data bus.
                Negate CPURDY to prevent an accidental back-to-back write while the STEbus is busy.

        T3      Assert _ADRSTB and _DATSTB.
                If _DATACK and _TFRERR are negated, wait; else goto T4.

        T4      Release the address bus and data bus.
                Negate _ADRSTB and _DATSTB.
                If either _DATACK or _TFRERR is asserted, wait; else goto T0.

or, put more formally,

| T | Transition Rule                                                                 | Signals Asserted               | Description                                                             |
|:-:|:--------------------------------------------------------------------------------|:-------------------------------|:------------------------------------------------------------------------|
| 0 | If (phi2, STECS, R/W) = (1, 1, 0), goto T1; else T0.                            | CPURDY                         | Wait for a write to STE space.                                          |
| 1 | If phi2 = 1, goto T1; else T2.                                                  | CPURDY, CPUWR                  | Wait for write-cycle to complete; latch signals while we wait.          |
| 2 | Goto T3.                                                                        | STEDRA, STEDRD                 | Drive address/command bus with latched signals.  (Provide set-up time.) |
| 3 | If xxxACK = 0, goto T3; else T4.                                                | STEDRA, STEDRD, xxxSTB         | Drive strobes; wait for DATACK or TFRERR.                               |
| 4 | If xxxACK = 1, goto T4; else T0.                                                |                                | Clear address, command, data; wait for cycle to end.                    |

**NOTE:** In the table above, I treat every signal as 1 if it's asserted or 0 if it's negated.
In a real circuit, some signals will be active-high and some will be active-low, as dictated by the needs of various interface chips.
This is why the table references, e.g., STEDRA, while the datapath schematic references _STEDRA instead.
This might seem awkward if you're not used to this convention;
however, when working with programmable logic (especially FPGAs!), this is usually the norm.

This assumes we have a state machine synchronously clocked at a rate much faster than the host CPU.
In our case, since ADRSTB*/DATSTB* need a setup time of 35ns, 28.5MHz seems a natural fit.
This results in a timing sequence like the following, which shows a back-to-back write into STEbus space:

                        |          write          |          wait           |           write          |
                                      ____________              ____________               ____________
        phi2            \____________/            \____________/            \_____________/            \
                        _____
        R/W                  \__________________________________________________________________________
                        _______________
        _STECS                     \\\\\________________________________________________________________
                                          _______________                                   ____________
        CPUWR           _________________/               \_________________________________/
                        _________________________________                            ___________________
        CPURDY                                           \__________________________/
                        ________________ ________________ _____ _________ _________ _________ __________
        T               _______0________X_______1________X__2__X____3____X____4____X____0____X_____1____
                        __________________________________                 _____________________________
        _STEDRx                                           \_______________/
                                                           _______________
        A0-A19 (ste)    ----------------------------------<_______________>-----------------------------
                                                           _______________
        D0-D7 (ste)     ----------------------------------<_______________>-----------------------------
                        ________________________________________           _____________________________
        xxxSTB*                                                 \_________/
                        _______________________________________________           ______________________
        DATACK*                                                        \_________/

The state machine guarantees that the CPU's RDY signal is negated to prevent mutual interference.
(We assume the address decoding logic routes the CPURDY signal back to the CPU's RDY input for as long as _STECS is asserted.
If the CPU were to address another bus resource after kicking off an STEbus write, the wait state would be avoided entirely.)

While the state machine is in motion, it starts by driving the STEbus with the latched value of the address, data, and command buses.
35ns later (thanks to the 28.5MHz clocking of the state machine), ADRSTB* and DATSTB* are asserted.
While neither DATACK* nor TFRERR* is asserted, the machine waits in its current state.
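
As a quick sanity check on the 35ns figure, the arithmetic can be written out. The constant names are illustrative, and the 4MHz and 8MHz values are simply the phi2 range quoted in the signal table above.

```python
SETUP_NS = 35.0    # strobe set-up time quoted in the text
CLOCK_HZ = 28.5e6  # proposed state-machine clock

# One state-machine clock period, in nanoseconds.
period_ns = 1e9 / CLOCK_HZ
print(f"one state lasts {period_ns:.2f} ns")

# Spending a single state in T2 is enough only if one period covers the set-up time.
assert period_ns >= SETUP_NS

# How many state-machine ticks fit in one phi2 cycle at each end of the range.
for phi2_hz in (4e6, 8e6):
    ticks = CLOCK_HZ / phi2_hz
    print(f"phi2 = {phi2_hz / 1e6:.0f} MHz: {ticks:.2f} ticks per CPU cycle")
```

One period works out to about 35.09ns, which just clears the 35ns requirement.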

Once one of the acknowledges is asserted, it progresses to the next state, which removes the address, data, and commands from the bus,
and sits and waits for the acknowledge to be released.
Only once this is done will the CPU's RDY signal assert again, and the state machine is ready for another transfer.
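
The write sequence described above can be sketched as a small closed-loop simulation. Everything here is a hypothetical model rather than a real design: `next_state` and `ASSERTED` transcribe the write table, and the "peripheral" is assumed to raise its acknowledge one tick after it sees a strobe and drop it one tick after the strobe goes away.

```python
# Signals asserted in each state, per the write table (1 = asserted).
ASSERTED = {
    0: {"CPURDY"},
    1: {"CPURDY", "CPUWR"},
    2: {"STEDRA", "STEDRD"},
    3: {"STEDRA", "STEDRD", "xxxSTB"},
    4: set(),
}


def next_state(t: int, phi2: int, stecs: int, rw: int, ack: int) -> int:
    """Transition rules for T0-T4; rw is 1 for a read and 0 for a write."""
    if t == 0:
        return 1 if (phi2, stecs, rw) == (1, 1, 0) else 0
    if t == 1:
        return 1 if phi2 == 1 else 2
    if t == 2:
        return 3
    if t == 3:
        return 3 if ack == 0 else 4
    if t == 4:
        return 4 if ack == 1 else 0
    raise ValueError(f"unknown state T{t}")


def simulate(phi2_samples, stecs=1, rw=0):
    """Run one tick per phi2 sample and return every state visited."""
    state, ack = 0, 0
    visited = [state]
    for phi2 in phi2_samples:
        strobe = "xxxSTB" in ASSERTED[state]
        state = next_state(state, phi2, stecs, rw, ack)
        ack = 1 if strobe else 0  # assumed peripheral: ACK trails the strobe by one tick
        visited.append(state)
    return visited


# phi2 is high for two ticks (the CPU's write), then low while the bridge works.
trace = simulate([1, 1, 0, 0, 0, 0, 0, 0])
assert trace == [0, 1, 1, 2, 3, 3, 4, 4, 0]

# CPURDY stays negated from T2 through T4, so a second write has to wait.
assert all("CPURDY" not in ASSERTED[t] for t in (2, 3, 4))
```

A slower peripheral would just add more repeats of T3 and T4 to the trace; the order of states would not change.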

## Reading

Memory writes look complicated, and compared to typical peripheral interfaces, they kind of are.
But reading is more complex still, because we cannot rely on the parallel operation of the CPU and the bus bridge.
Instead, the CPU really does have to wait for the data to arrive off the STEbus before it can continue.
In other words, CPURDY must negate immediately after a read from STE-space is detected,
not just after the CPU's write cycle is complete.

Instead of the CPU writing a byte which we then ferry to STE-space,
the state machine must first request the byte from STE-space and then ferry it to the 65816.

ASIDE: If you've ever wondered why writes to video cards on PCs were always so much faster than reads from the video frame buffer,
this is exactly why: with writes, you can fire-and-forget, and let the bus bridge deal with the responsibility of ferrying data while the CPU is off doing other things.
When reading, the CPU has no choice but to wait for the bridge to accomplish its goal first.

We still latch the address and command like before (and can reuse that circuitry).
However, rather than driving the STEbus data bus,
we let it feed a second transparent latch whose outputs face the CPU.
So, our data path now looks something like this:

                                            +----------+
                                            |          |+
                        A0-A19 (cpu) ======>|D  '373  Q|=====> A0-A19 (ste)
                                            |          ||
                               CPUWR o------|LE        ||
                             _STEDRA o-----o|/OE       ||
                                            |          ||
                                            +----------+|
                                            |          |+
                                R/W  o------|D        Q|-----> CM0
                                A20  o------|D  '373  Q|-----> CM1
                                +Vcc o------|D        Q|-----> CM2
                                            |          |
                               CPUWR o------|LE        |
                             _STEDRA o-----o|/OE       |
                                            |          |
                                            +----------+
                                            |          |
                                     #=====>|D  '373  Q|=====#
                                     #      |          |     #
                         CPUWR o------------|LE        |     #
                       _STEDRD o-----------o|/OE       |     #
                                     #      |          |     #=====> D0-D7 (ste)
                   D0-D7 (cpu) >=====#      +----------+     #
                                     #      +----------+     #
                                     #      |          |     #
                                     #======|Q  '373  D|<====#
                                            |          |
                               STERD o------|LE        |
                             _CPUDRD o-----o|/OE       |
                                            |          |
                                            +----------+

And our timing diagram can be refactored into sections that show reads distinctly from writes.
Two different sets of states will cover writes (T0-T4, as before) and reads (T0, T5-T9), respectively.

                                      ____________              ____________               ____________
        phi2            \____________/            \____________/            \_____________/            \
                        _______________
        _STECS                     \\\\\________________________________________________________________
                        ___ _________ __ ____ ___________ ______ __ _____________________ __ ___________
        D0-D7 (wr)      |||X_________X__X||||X____W1_____X||||||X__X_________W2__________X__X_______W2__
                        ___ _________ __ _______________________ __ ______ ______ _______ __ ___________
        D0-D7 (rd)      |||X_________X__X|||||||||||||||||||||||X__X||||||X__R1__X|||||||X__X___________

                        . . . . . . . . . . . . . . . . . . . writes . . . . . . . . . . . . . . . . . .

                                          _______________                                   ____________
        CPUWR           _________________/               \_________________________________/
                        _________________________________                            ___________________
        CPURDY                                           \__________________________/
                        ________________ ________________ _____ _________ _________ _________ __________
        T               _______0________X_______1________X__2__X____3____X____4____X____0____X_____1____
                        __________________________________                 _____________________________
        _STEDRx                                           \_______________/
                                                           _______________
        A0-A19 (ste)    ----------------------------------<_______________>-----------------------------
                                                           _______________
        D0-D7 (ste)     ----------------------------------<_______________>-----------------------------
                        ________________________________________           _____________________________
        xxxSTB*                                                 \_________/
                        _______________________________________________           ______________________
        DATACK*                                                        \_________/

                        . . . . . . . . . . . . . . . . . . . reads . . . . . . . . . . . . . . . . . .

                        ________________ _____ ______________________ _____ _____ ______________________
        T               _______0________X__5__X___________6__________X__7__X__9__X___________0__________
                                      ____________              ____________               ____________
        phi2            \____________/            \____________/            \_____________/            \
                        _______________                                           ______________________
        _STECS                     \\\\\_________________________________________///////////
                        ___ _________ __ _______________________ __ ______ ______ _______ __ ___________
        D0-D7 (rd)      |||X_________X__X|||||||||||||||||||||||X__X||||||X__R1__X|||||||X__X___________

                        __________________                                  ____________________________
        CPURDY (rd)                       \________________________________/             
                                           ____________________________
        A0-A19 (ste)    ------------------<____________________________>--------------------------------
                                                  ___________________ ______
        D0-D7 (ste)     -------------------------<___________________X__R1__>---------------------------
                        _______________________                        _________________________________
        xxxSTB* (rd)                           \______________________/
                        _____________________________________________       ____________________________
        DATACK*                                                      \_____/
                        __________________                              ________________________________
        _STEDRA                           \____________________________/
                                           ____________________________
        STERD           __________________/                            \________________________________

The T-state table is modified as follows (note how T0 is adjusted, and T5-T9 are added):

| T | Transition Rule                                                                 | Signals Asserted               | Description                                                             |
|:-:|:--------------------------------------------------------------------------------|:-------------------------------|:------------------------------------------------------------------------|
| 0 | If (phi2, STECS, R/W) = (1, 1, 0), goto T1; else if (1, 1, 1) goto T5; else T0. | CPURDY                         | Wait for a read or write to STE space.                                  |
| 1 | If phi2 = 1, goto T1; else T2.                                                  | CPURDY, CPUWR                  | Wait for write-cycle to complete; latch signals.                        |
| 2 | Goto T3.                                                                        | STEDRA, STEDRD                 | Drive address/command bus with latched signals.  (Provide set-up time.) |
| 3 | If xxxACK = 0, goto T3; else T4.                                                | STEDRA, STEDRD, xxxSTB         | Drive strobes; wait for DATACK or TFRERR.                               |
| 4 | If xxxACK = 1, goto T4; else T0.                                                |                                | Clear address, command, data; wait for cycle to end.                    |
| 5 | Goto T6.                                                                        | CPUWR, STEDRA                  | Halt CPU right away.  Latch address and command from CPU.               |
| 6 | If xxxACK = 0, goto T6; else T7.                                                | STEDRA, xxxSTB, STERD          | Latch data bus from STEbus; wait for acknowledgement.                   |
| 7 | If (xxxACK, phi2) = (0, 0), goto T8; else if (0, 1), goto T9; else T7.          |                                | Wait for acknowledge to clear.                                          |
| 8 | If phi2 = 0, goto T8; else T9.                                                  |                                | Wait for phi2 high (synchronize against 65816 bus cycle)                |
| 9 | If phi2 = 1, goto T9; else T0.                                                  | CPURDY, CPUDRD                 | Wait for end of host CPU cycle.  Drive data for host CPU.               |
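
For readers who think better in code, the combined table can be sketched as a pair of lookups plus a tiny trace function. The names `next_state`, `ASSERTED`, and `trace` are illustrative, and the stimulus is a made-up sequence of samples, not measured timing. One assumption worth flagging: T7 is coded to leave only once the acknowledge has been released, following that row's description and the read timing diagram.

```python
# Signals asserted in each state (1 = asserted), per the combined table.
ASSERTED = {
    0: {"CPURDY"},
    1: {"CPURDY", "CPUWR"},
    2: {"STEDRA", "STEDRD"},
    3: {"STEDRA", "STEDRD", "xxxSTB"},
    4: set(),
    5: {"CPUWR", "STEDRA"},
    6: {"STEDRA", "xxxSTB", "STERD"},
    7: set(),
    8: set(),
    9: {"CPURDY", "CPUDRD"},
}


def next_state(t: int, phi2: int, stecs: int, rw: int, ack: int) -> int:
    """Transition rules for T0-T9; rw is 1 for a read and 0 for a write."""
    if t == 0:
        if (phi2, stecs) == (1, 1):
            return 5 if rw else 1
        return 0
    if t == 1:
        return 1 if phi2 else 2
    if t in (2, 5):
        return t + 1                  # unconditional: T2 -> T3, T5 -> T6
    if t in (3, 6):
        return t + 1 if ack else t    # wait for DATACK or TFRERR
    if t == 4:
        return 4 if ack else 0        # wait for the acknowledge to clear
    if t == 7:
        if ack:
            return 7                  # assumed: hold until the acknowledge clears
        return 9 if phi2 else 8
    if t == 8:
        return 9 if phi2 else 8       # resynchronize against phi2
    if t == 9:
        return 9 if phi2 else 0       # hold data until the CPU cycle ends
    raise ValueError(f"unknown state T{t}")


def trace(stimulus, stecs=1, rw=1):
    """Feed (phi2, ack) samples to the machine and return the states visited."""
    state = 0
    visited = [state]
    for phi2, ack in stimulus:
        state = next_state(state, phi2, stecs, rw, ack)
        visited.append(state)
    return visited


# A hypothetical read whose acknowledge arrives and clears while phi2 is low.
read = trace([(1, 0), (1, 0), (0, 0), (0, 1), (0, 1), (0, 0), (1, 0), (1, 0), (0, 0)])
assert read == [0, 5, 6, 6, 7, 7, 8, 9, 9, 0]

# The CPU is held off for the entire read, from T5 through T8.
assert all("CPURDY" not in ASSERTED[t] for t in (5, 6, 7, 8))
```

If the acknowledge clears while phi2 is already high, the machine skips T8 and goes from T7 straight to T9.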

As you can imagine, the logic for implementing this table isn't particularly hard to design;
however, it is laborious and error-prone to build out of discrete components.
This is why most (all?) STEbus initiators use some flavor of programmable logic.