Working with Mantle Directly

In this tutorial, you will learn how to invoke Mantle services and see how the pieces fit together. This tutorial isn’t intended to be used as a reference, as it will not cover every Mantle system call.

Our mission is to create a simple stop-watch application. No fancy graphics; just enough to put something up on the screen. No pointing devices; just a keyboard interface. Again, just enough to illustrate the basics of using Mantle.

Upon start-up, the timer on the display will read all zeros. If you tap the space bar, the timer will start counting up, one tenth of a second at a time. If you tap the space bar again, the timer will stop where it is. Repeatedly tapping the space bar will switch between these two modes. Tapping R (capital!) will reset the timer; you can do this at any time. Tapping Q (also capital!) will cause the application as a whole to exit.

The PRG Header

Each loadable module intended for the Kestrel-2EX environment must start with a PRG header. This informs Mantle how much space in RAM your program takes up. Note that all programs which use the PRG header must be position-independent, for the header does not include any relocation information.

For our stop-watch, the header looks as follows:

; ================================================================
; PRG a.out header

        jal     x0,_text        ; a.out header magic identifier
        word    textsize        ; TEXT size
        word    datasize        ; DATA size
        word    0               ; BSS size
        word    4096            ; default stack size
        word    0,0,0           ; reserved; zero for future compatibility

_text:

The header’s jal instruction must be the first instruction in the binary. Following that are four words providing Mantle with information on how big the program is.

textsize indicates how much program code and read-only data exists. This must be a multiple of four bytes in size; a multiple of eight is preferred.
datasize does the same for data that can be read or written. This also must be a multiple of four bytes in size; a multiple of eight is preferred.
Next is the BSS size. This tells the loader how much uninitialized data space is needed. We don’t use any for this tutorial, so we set its size to 0. This must be a multiple of four bytes; eight is preferred.
Next is the desired program stack size. Because the stack is used to temporarily hold CPU register state and program addresses, this field must be a multiple of eight bytes, at the least.

The sum of all four of these numbers determines how much raw memory is allocated when the program is loaded.

NOTE: Mantle currently loads the text and data sections into memory exactly as found in the PRG file, sans header. The BSS and stack memory immediately follow, although these are not included in the PRG file. The lowest address is the first byte of the text section, while the highest address is the last byte of the stack. Assuming a program is loaded at address LA, then memory looks like this:

+--------------+------+-------------+-------+
| ... text ... | data | ... BSS ... | stack |
+--------------+------+-------------+-------+
|              |      |             |       |
LA+0           |      |             |       default SP
     LA+textsize      |             |
   LA+textsize+datasize             |
         LA+textsize+datasize+bsssize

The remaining three words are reserved and currently unused. Set these to zero for compatibility with newer versions of Mantle.
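To make the layout arithmetic concrete, here is a small Python model of it. This is only a sketch: the function name prg_layout and the example load address are made up for illustration, and treating the default SP as the address just past the stack is an assumption read off the diagram above, not something Mantle documents here.

```python
def prg_layout(load_addr, textsize, datasize, bsssize, stacksize):
    """Return the start address of each region plus the default SP."""
    # Size rules from the header description: text, data, and BSS must be
    # multiples of 4 bytes; the stack must be a multiple of 8 bytes.
    for name, size in (("text", textsize), ("data", datasize), ("bss", bsssize)):
        if size % 4 != 0:
            raise ValueError(f"{name} size must be a multiple of 4 bytes")
    if stacksize % 8 != 0:
        raise ValueError("stack size must be a multiple of 8 bytes")

    text = load_addr
    data = text + textsize
    bss = data + datasize
    stack = bss + bsssize
    default_sp = stack + stacksize  # assumed: one past the last stack byte
    return {
        "text": text,
        "data": data,
        "bss": bss,
        "stack": stack,
        "default_sp": default_sp,
        "total": default_sp - load_addr,
    }


# Hypothetical sizes and load address, chosen only for the example.
layout = prg_layout(0x10000, textsize=1024, datasize=256, bsssize=0, stacksize=4096)
for region, value in layout.items():
    print(f"{region:10s} {value:#x}")
```

With a BSS size of zero, as in our stop-watch, the bss and stack entries coincide.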

The Default Event Handler Procedure

Early operating systems treated applications as if they owned the whole computer. They worked hard to provide a nice abstraction that worked well for batch-oriented program design in time-shared systems, for it was easy to keep system details in mind this way. You didn’t have to worry about competing with other programs because either they were scheduled to run after your program finished, or the OS used multi-programming techniques to give your program the illusion that it ran in isolation from everyone else (to varying degrees of success).

Batch-oriented means a program was designed with an explicit start and end in mind. I/O was always performed at the behest of the program, or so it was led to believe. Once all I/O had completed, there was no more work for the program to perform, and so it would terminate, allowing another program to run. If you’ve ever written shell scripts on Unix or JCL for IBM mainframes, you are quite familiar with batch-oriented programming. In fact, on MS-DOS, OS/2, and Windows systems, such scripts are literally called batch files for this very reason.

Even modern, event-driven OS environments work hard to preserve this batch-mode behavior. Windows applications still have a WinMain procedure, from which a thread’s control starts and is expected to finish once the application’s event loop quits. Classic Mac applications would actually take over the hardware so thoroughly that they were expected to periodically call a special system call, SystemTask, which basically let the rest of the OS function on behalf of the application. But this isn’t just a Windows- or Macintosh-specific thing: modern Gtk applications on Linux have the same architecture, requiring the main thread to manually kick off the GUI event loop. On AmigaOS, the main thread is responsible for opening and configuring its windows and GUI resources, and then, again, sitting in an explicitly coded event loop. Classic MacOS and GS/OS both share this same architecture as well.

Only a small number of operating systems break this fundamental assumption:

GEOS for the Commodore and Apple II 8-bit family,
the Uxn virtual machine, and last but not least,
Mantle.

All programs for Mantle must contain at least one event handler. The first instruction after the PRG header, which corresponds to the first instruction in the loaded binary image, is assumed to be the program’s default event handler, or default evproc.

How Evprocs are Called

Evprocs are called with (currently) three parameters:

Register	Purpose
A0	Indicates the kind of message the event represents.
A1	Provides a relative indication of when the event happened.
A2	Provides a message-specific bit of additional information.

Note that while an event is being processed, no additional events may be processed. This is what we mean when we say that the application is not interruptible. The only time additional events may be handled is after an event handler terminates and returns to the Mantle event loop. It does this via the ecNextEvent system call. This system call accepts parameters in registers A0-A2, but the meaning of these parameters depends on which event is being handled. We’ll illustrate this shortly.

For this reason, we place the following code immediately after the PRG header.

; ================================================================
; Program Code ("text")

; Default event handler procedure (aka "evproc").  Event handlers are
; called with (as of this writing) three parameters in the following
; registers:
;
; A0 - message type
; A1 - timestamp, allowing a handler to detect double-click events, etc.
; A2 - "P2" parameter, which further customizes the message.
;
; The default evproc always resides starting at the first instruction of
; the program, which happens to reside immediately following the PRG header.
; The sp register will point to the bottom of the default stack, so subroutine
; calls are safe.
;
; Note that evprocs are always called with a *fresh* stack.
;
; We terminate the evproc with an ecall to ecNextEvent.


_default_evproc:
        jal     ra,_data                ; set GP to base address of data
        jal     ra,dispatch             ; dispatch based on message type
        addi    a7,x0,ecNextEvent       ; return to event loop
        ecall

The default evproc’s first task upon being called is to establish where we can find our program data. For now, we’ll skip over this detail; suffice it to say that we load the GP register with a pointer to our program’s data. Afterwards, we dispatch to one of several different event handlers. This will decode the meaning of the event by looking at the A0 and A2 registers, and will decide what specific part of the program is responsible for handling that event. We call our dispatcher using jal ra, so that our event handlers can return like any other subroutine. Then, once the handler returns, we can invoke the ecNextEvent system call to tell Mantle that we’re done working on this event.

NOTE: Just like Uxn, Mantle does not interrupt a program while it is executing an event handler. Therefore, it is the event handler’s responsibility to not only be quick about things, but also to let the system know when it’s finished.

Uxn uses a distinguished opcode for this (BRK, as distinct from RET for returning from subroutines). Similarly, Mantle requires we make a system call, ecNextEvent, to “return” from the currently running event handler.

Note that the ecNextEvent system call does not return. We also assume that registers A0-A2 are preloaded as needed for the event handler.
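The run-to-completion rule can be modeled in a few lines of Python. This is a toy model, not Mantle's implementation: the function run_event_loop and the post callback are hypothetical names invented for the sketch, and returning from the Python handler stands in for calling ecNextEvent. Only the message type numbers are taken from the program listing.

```python
from collections import deque

MT_INIT, MT_KEY_DOWN, MT_TIMER_TICK = 0, 1, 3  # message types from the listing


def run_event_loop(evproc, events):
    """Deliver events one at a time; each handler runs to completion."""
    queue = deque(events)
    log = []
    while queue:
        msg_type, timestamp, p2 = queue.popleft()
        # The handler may cause more events to be queued, but none of them
        # is delivered until it returns (our stand-in for ecNextEvent).
        result = evproc(msg_type, timestamp, p2, post=queue.append)
        log.append((msg_type, result))
    return log


def evproc(msg_type, timestamp, p2, post):
    if msg_type == MT_INIT:
        post((MT_KEY_DOWN, timestamp + 1, 0x20))  # queued, not handled yet
        return "init done"
    if msg_type == MT_KEY_DOWN:
        return f"key {p2:#x}"
    return "ignored"


print(run_event_loop(evproc, [(MT_INIT, 0, 0), (MT_TIMER_TICK, 5, 0)]))
```

The key event queued during initialization is handled last, after the timer tick that was already waiting, because nothing interrupts a handler that is still running.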

Evproc Stack

One more thing before we move on from the default evproc: what stack is it using when processing an event? Recall that after the PRG is loaded into memory, a default stack is allocated (whose size is indicated in the PRG header). When the default evproc is called, the SP register will point to the bottom of this default stack. This guarantees that every event handler starts with a known-good stack.

+--------------+------+-------------+-------+
| ... text ... | data | ... BSS ... | stack |
+--------------+------+-------------+-------+
|                                           |
default evproc              default evproc SP
called here

Most applications, even sophisticated applications, can get away with having only one evproc and stack. However, Mantle does provide system calls that let you use as many event handlers and stacks as your application needs.

The ecSetEvProcPC system call is used to select a current event handler, while ecSetEvProcSP selects a stack pointer to use upon entry. These can be used to very quickly select different modes of application behavior, without complecting the design of an event dispatcher.

NOTE: If you’re changing evprocs, do be careful; getting the details wrong can lead to a crash.

The Stop Watch Dispatcher

Let’s now take a look at the actual dispatch logic.

dispatch:
        beq     a0,x0,on_init           ; mtInit -> initialize program for 1st time

        addi    t0,x0,mtKeyDown
        beq     a0,t0,on_key_down       ; mtKeyDown -> handle key press

        addi    t0,x0,mtTimerTick
        beq     a0,t0,on_timer_tick     ; mtTimerTick -> handle timer tick

        addi    a0,x0,0                 ; Assume safe return value
        addi    a1,x0,0
        addi    a2,x0,0
        jalr    x0,0(ra)                ; if unrecognized event, just ignore it

The stop-watch application depends on three kinds of event sources:

program initialization (which we only need to handle one time),
timer ticks (since it’s a stop watch), and,
key presses (since we operate it with the keyboard).
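For readers more comfortable with a high-level language, the dispatcher's shape is roughly that of a lookup table. The Python below is only an analogy under that assumption: the handler bodies are placeholders, and the (0, 0, 0) result mirrors the way the assembly zeroes A0-A2 for an event it does not recognize.

```python
MT_INIT, MT_KEY_DOWN, MT_KEY_UP, MT_TIMER_TICK = 0, 1, 2, 3  # from the listing


def on_init(timestamp, p2):
    return "init"  # placeholder body


def on_key_down(timestamp, p2):
    return f"key {p2 & 0xFF:#x}"  # placeholder body


def on_timer_tick(timestamp, p2):
    return "tick"  # placeholder body


HANDLERS = {
    MT_INIT: on_init,
    MT_KEY_DOWN: on_key_down,
    MT_TIMER_TICK: on_timer_tick,
}


def dispatch(msg_type, timestamp, p2):
    handler = HANDLERS.get(msg_type)
    if handler is None:
        return (0, 0, 0)  # unrecognized event: ignore it with a safe result
    return handler(timestamp, p2)


print(dispatch(MT_KEY_DOWN, 0, 0x52))
print(dispatch(MT_KEY_UP, 0, 0x52))
```

The assembly version uses a chain of compares instead of a table, which is perfectly adequate for three message types.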

We now examine how the stop watch handles these kinds of events.

Program Initialization

All applications are required to process initialization events, which are identified by Mantle setting register A0 to zero. We inquire about the current screen size and where its frame buffer resides in memory, since we need this to draw the screen output. We then tell Mantle that we are interested in key-down events; timer events are enabled later, when the stop watch is started. Finally, we paint our initial screen display, then return to the event loop.

NOTE: When a new application starts up, by default, it does not have any “interest” in system events. The application has to tell Mantle which events it wants to receive via the ecSetEventsDesired system call.

; Invoked during program startup.  Render the display as we wish, configure
; desired set of events we're interested in, and return to event loop.

on_init:
        addi    sp,sp,-16
        sd      ra,0(sp)

        jal     ra,TimerReset

        addi    a7,x0,ecGetScreenConfig
        ecall
        sh      a0,g_scrWidth-_data(gp)
        sh      a1,g_scrHeight-_data(gp)
        sd      a2,g_scrBase-_data(gp)

        addi    a0,x0,mfKeyDown
        sd      a0,g_eventMask-_data(gp)        ; we'll need this for toggling modes later
        addi    a7,x0,ecSetEventsDesired
        ecall
        sd      x0,g_tmrHandle-_data(gp)

        addi    a7,x0,ecBeginPaint
        ecall

        jal     ra,TimerPaint

        addi    a7,x0,ecEndPaint
        ecall

        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)

Processing Key Presses

Handling key down events requires us to inspect the key code and react accordingly. We define the following key presses and their behaviors:

R will reset the current timer.
Q will quit the application.
Space will toggle the stop-watch, between active and inactive.

Any other key will be ignored.

For the purposes of handling key down events, register A2 holds the virtual key code in the lower 32 bits. We mask off everything but the lowest 8 bits to obtain the equivalent ASCII key code.

; Invoked when a key is pressed.  A2 holds the key code in bits 0..31.

on_key_down:
        addi    t0,x0,$52               ; "R"
        andi    a2,a2,$FF
        beq     a2,t0,on_R_pressed

        addi    t0,x0,$51               ; "Q"
        beq     a2,t0,on_Q_pressed

        addi    t0,x0,$20               ; " " (space)
        beq     a2,t0,on_spc_pressed

        addi    a0,x0,0                 ; unknown key; ignore it
        jalr    x0,0(ra)

on_Q_pressed:
        addi    a0,x0,0
        addi    a7,x0,ecQuitVM
        ecall
        jalr    x0,0(ra)

on_R_pressed:
        addi    sp,sp,-16
        sd      ra,0(sp)

        jal     ra,TimerReset

        addi    a7,x0,ecBeginPaint
        ecall

        jal     ra,TimerPaint

        addi    a7,x0,ecEndPaint
        ecall

        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)

on_spc_pressed:
        addi    sp,sp,-16
        sd      ra,0(sp)

        ; If event mask has timer tick enabled...
        ld      a0,g_eventMask-_data(gp)
        andi    a0,a0,mfTimerTick
        beq     a0,x0,osp2

        ; then turn it off.
        ld      a0,g_tmrHandle-_data(gp)
        addi    a7,x0,ecStopTimer
        ecall
        ld      a0,g_eventMask-_data(gp)
        xori    a0,a0,mfTimerTick
        sd      a0,g_eventMask-_data(gp)
        addi    a7,x0,ecSetEventsDesired
        ecall

        addi    a0,x0,1
        bne     a0,x0,osp1

        ; Otherwise, turn it (back) on.
osp2:   ld      a0,g_eventMask-_data(gp)
        ori     a0,a0,mfTimerTick
        sd      a0,g_eventMask-_data(gp)
        addi    a7,x0,ecSetEventsDesired
        ecall

        addi    a0,x0,100
        addi    a7,x0,ecStartTimer
        ecall
        sd      a0,g_tmrHandle-_data(gp)

osp1:   ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)
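The space-bar logic is easier to follow with the bookkeeping stripped away. The Python sketch below models only the event mask and the order of system calls; the StopWatchMode class, its calls list, and the handle value 1 are all invented for illustration. The flag values come from the program listing, and 100 is the argument our code passes to ecStartTimer (presumably a period in milliseconds, given ten ticks per second).

```python
MF_KEY_DOWN = 0x2    # flag values from the listing
MF_TIMER_TICK = 0x8


class StopWatchMode:
    def __init__(self):
        self.event_mask = MF_KEY_DOWN  # what on_init asks for
        self.timer_handle = 0
        self.calls = []                # simulated system calls, in order

    def on_space(self):
        if self.event_mask & MF_TIMER_TICK:
            # Running: stop the timer, then drop interest in ticks.
            self.calls.append(("ecStopTimer", self.timer_handle))
            self.event_mask ^= MF_TIMER_TICK
            self.calls.append(("ecSetEventsDesired", self.event_mask))
        else:
            # Stopped: register interest first, then start the timer.
            self.event_mask |= MF_TIMER_TICK
            self.calls.append(("ecSetEventsDesired", self.event_mask))
            self.calls.append(("ecStartTimer", 100))
            self.timer_handle = 1      # hypothetical handle value


mode = StopWatchMode()
mode.on_space()
mode.on_space()
for call in mode.calls:
    print(call)
```

Note the asymmetry: interest in timer ticks is registered before the timer is started, and withdrawn only after the timer has been stopped.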

Processing Timer Events

Handling the timer is relatively simple, since our application only uses a single timer. We basically account for the tick, then update the display again. Note that we “return 1” to the Mantle kernel to let it know that we wish the timer to keep ticking, that this is not just a one-shot event.

; Invoked approximately 10 times every second

on_timer_tick:
        addi    sp,sp,-16
        sd      ra,0(sp)

        jal     ra,TimerTick

        addi    a7,x0,ecBeginPaint
        ecall

        jal     ra,TimerPaint

        addi    a7,x0,ecEndPaint
        ecall

        addi    a0,x0,1

        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)

Timer Behavior

The timer’s behavior is contained in three procedures:

TimerReset — zeroes the timer
TimerTick — increments the timer
TimerPaint — paints the current state of the timer to the screen.

TimerReset is simple enough to implement:

TimerReset:
        addi    t0,x0,0
        sb      t0,g_tmrHH-_data(gp)    ; zero all timer counters
        sb      t0,g_tmrHL-_data(gp)
        sb      t0,g_tmrMH-_data(gp)
        sb      t0,g_tmrML-_data(gp)
        sb      t0,g_tmrSH-_data(gp)
        sb      t0,g_tmrSL-_data(gp)
        sb      t0,g_tmrT-_data(gp)
        jalr    x0,0(ra)

Incrementing the timer is done using binary-coded decimal, mainly because I’m too lazy to include libraries to convert from binary to decimal.

We start by incrementing the tenths-of-a-second place, and carrying over to more significant digits only if needed. The pseudo-code for this is embedded in the comments, but here it is again for easier reading:

TO TimerTick
    Start at the deciseconds field.
    BEGIN
        Add 1 to BCD digit.
        IF BCD digit does not exceed corresponding limit THEN
            RETURN; we're done, for there is no carry to propagate.
        END
        Set BCD digit to zero.
        Move to the next digit.
    UNTIL there are no more digits left.
END

Expressed as assembly language, we have the following procedure:

TimerTick:
        ; Start at the deciseconds field.
        addi    t0,gp,g_tmrT-_data
        addi    t1,gp,g_tmrDelta-_data
        addi    t2,gp,g_tmrBCDLimits-_data
        addi    t3,x0,7

        ; BEGIN
TT1:
        ;       Add 1 to BCD digit.
        lb      t4,0(t0)
        addi    t4,t4,1
        sb      t4,0(t0)

        ;       IF BCD digit does not exceed corresponding limit THEN
        lb      t5,0(t2)
        beq     t4,t5,TT2

        ;               RETURN; we're done, for there is no carry to propagate.
        jalr    x0,0(ra)

        ;       END
TT2:
        ;       Set BCD digit to zero.
        sb      x0,0(t0)

        ;       Move to next digit.
        lb      t5,0(t1)
        add     t0,t0,t5
        addi    t1,t1,1
        addi    t2,t2,1
        addi    t3,t3,-1

        ; UNTIL there are no more digits left.
        bne     t3,x0,TT1

        ; All seven digits rolled over, so the timer has wrapped to zero.
        jalr    x0,0(ra)
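Here is the same carry chain as a Python model, which makes it easy to check by hand. The contents of g_tmrBCDLimits are not shown in this tutorial, so the LIMITS list is an assumption: ordinary clock limits, with the digits ordered the way the carry propagates (tenths first, tens of hours last).

```python
# Assumed limits for T, SL, SH, ML, MH, HL, HH, in carry order.
LIMITS = [10, 10, 6, 10, 6, 10, 10]


def timer_tick(digits):
    """Add one tenth of a second, propagating carries only as needed."""
    for i, limit in enumerate(LIMITS):
        digits[i] += 1
        if digits[i] != limit:
            return       # no carry to propagate; we're done
        digits[i] = 0    # digit rolled over; carry into the next one


def fmt(digits):
    t, sl, sh, ml, mh, hl, hh = digits
    return f"{hh}{hl}:{mh}{ml}:{sh}{sl}.{t}"


digits = [0] * 7
for _ in range(601):     # 60.1 seconds' worth of ticks
    timer_tick(digits)
print(fmt(digits))
```

As in the assembly, the comparison is for equality with the limit, so a digit never holds the limit value once the procedure returns.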

The TimerPaint procedure is called to redraw the timer to the screen. Earlier, during program initialization, we called upon Mantle to tell us where the screen resides in memory. We also learned how big the screen was, in pixels, along each axis. This is enough information to calculate how to plot 8x8 pixel characters (“glyphs”) to the screen.

Unfortunately, there is a technicality. If we were running on raw hardware, then any changes to the frame buffer would appear instantly on the next monitor refresh. However, for reasons of efficiency, emulators may not implement this cycle-accurate behavior. Thus, we must collaborate with the emulator to ensure our changes appear on the screen in a timely manner.

We do this with the ecBeginPaint and ecEndPaint system calls.

Call	Purpose
`ecBeginPaint`	Tells the host environment that we are about to start writing to a video resource that could affect the display.
`ecEndPaint`	Tells the host environment that we have completed our changes, and it is now eligible for redisplay.

NOTE: Since I’m still learning about what is required to provide display updates efficiently, these system calls might be altered or augmented in a future release of Mantle.

The basic rule is simple: for every call to ecBeginPaint, there must be a corresponding call to ecEndPaint. Only when they perfectly balance is the display guaranteed to be updated in an emulator. For non-emulated environments, behavior will be hardware specific; however, it must always be compatible with an emulated environment. The k2 emulator running under Linux is defined to be the source of truth for the behavior of any Kestrel-2EX hardware platform and implementation of the Mantle kernel interface.
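One way to picture the balancing rule is as a nesting counter. The Python below is a sketch of that mental model only; whether the k2 emulator actually tracks paint calls this way is an assumption, and the PaintTracker class is invented for illustration.

```python
class PaintTracker:
    def __init__(self):
        self.depth = 0       # ecBeginPaint calls not yet matched
        self.refreshes = 0   # times the display became eligible for redisplay

    def begin_paint(self):
        self.depth += 1

    def end_paint(self):
        if self.depth == 0:
            raise RuntimeError("ecEndPaint without a matching ecBeginPaint")
        self.depth -= 1
        if self.depth == 0:
            self.refreshes += 1  # calls balance: display may be updated


tracker = PaintTracker()
tracker.begin_paint()        # outer call, as in on_init
tracker.begin_paint()        # inner call, as in TimerPaint
tracker.end_paint()          # inner call finishes
inner = tracker.refreshes
tracker.end_paint()          # outer call finishes
print(inner, tracker.refreshes)
```

Under this model, the nested calls made by on_init and TimerPaint are harmless, because the display only becomes eligible for update once the outermost pair closes.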

The actual drawing is performed with two procedures which we’ll discuss in the next section.

TimerPaint:
        addi    sp,sp,-16
        sd      ra,0(sp)

        addi    a7,x0,ecBeginPaint
        ecall

        jal     ra,BlackScreen

        addi    a0,gp,g_tmrHH-_data
        addi    a1,gp,g_fontTab-_data
        addi    a2,x0,1
        ld      a3,g_scrBase-_data(gp)
        lh      a4,g_scrWidth-_data(gp)
        srli    a4,a4,3
        addi    a5,x0,8
        addi    a6,x0,10
        jal     ra,StringPlot

        addi    a7,x0,ecEndPaint
        ecall
        
        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)

The Graphics Output Procedures

For our purposes, we need only two graphics primitives:

BlackScreen — clears the screen to black.
StringPlot — low-level printer to output a string to the display.

The simpler of the two procedures is BlackScreen, which just fills the display bitmap with zeros.

This procedure is implemented as a nested set of loops, one which covers the vertical axis, and one which covers the horizontal axis. This allows us to dynamically handle different screen resolutions without needing any math libraries.

; Paint the whole screen black.

BlackScreen:
        ld      t0,g_scrBase-_data(gp)  ; T0 = screen base address
        lh      t1,g_scrHeight-_data(gp); T1 = height in pixels
BS1:    lh      t2,g_scrWidth-_data(gp) ; T2 = width in pixels
        srli    t2,t2,6                 ; T2 = width in dwords
BS2:    sd      x0,0(t0)
        addi    t0,t0,8
        addi    t2,t2,-1
        bne     t2,x0,BS2
        addi    t1,t1,-1
        bne     t1,x0,BS1
        jalr    x0,0(ra)
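The loop structure is worth a quick sanity check in a high-level language. The Python model below assumes one bit per pixel (which is what the shift by 6 implies: 64 pixels per 64-bit store) and a width that is a nonzero multiple of 64; the 640x480 resolution is just an example, not a statement about the real hardware.

```python
def black_screen(framebuffer, width_px, height_px):
    """Zero the framebuffer row by row; return the number of bytes cleared."""
    dwords_per_row = width_px >> 6      # 64 one-bit pixels per 64-bit store
    offset = 0
    for _ in range(height_px):          # outer loop: vertical axis
        for _ in range(dwords_per_row): # inner loop: horizontal axis
            framebuffer[offset:offset + 8] = bytes(8)
            offset += 8
    return offset


fb = bytearray(b"\xff" * (640 * 480 // 8))  # hypothetical 640x480 display
cleared = black_screen(fb, 640, 480)
print(cleared, any(fb))
```

One difference from the assembly: the inner loop there tests its counter after the first store, so a width below 64 pixels would not simply be skipped as it is in this model.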

The StringPlot procedure is a simple one as well, but it has a complex setup. This is required because we aim to support different screen resolutions without needing math libraries or more sophisticated video driver modules.

NOTE: These things are definitely required in any “real world” software environment; however, we are focusing exclusively on the Mantle/userspace interface, and choosing to ignore such niceties for now.

To print a string, we really only need A0 and A6 to be set to the string’s address and length. Register A3 determines where on the screen we will print the string; it points to the first of a vertical stack of bytes corresponding to a character cell in more conventional display technologies.

The other registers can be initialized to fixed constants if desired:

A1 points to the first byte of a bitmap containing a font. For our purposes, the font is only 12 characters wide, since we only need the numeric digits, a colon (:), and a period (.). However, the maximum width of the font is 2048 pixels, corresponding to 256 8-bit wide glyphs.
A2 contains the width (in bytes, not pixels) of the font bitmap.
A4 equals the width of the screen in bytes, not in pixels.
A5 is currently unused here, but will eventually be used for the font bitmap height.

These unusual register conventions are designed to minimize loading and storing while printing text to the screen. Any temporary storage that happens is done with other CPU registers. Note that when we restore values, we actually update the values at the same time, further streamlining the printing process.

; Place a whole string onto the screen.
;
; A0 = string pointer
; A1 = address of font bitmap
; A2 = width of font bitmap in bytes
; A3 = address of screen bitmap where to put it
; A4 = width of screen bitmap in bytes
; A5 = unused; preserved
; A6 = # of characters in string to place.
;
; A0, A1, A3, A5, A6 destroyed.

StringPlot:
        addi    sp,sp,-48
        sd      ra,0(sp)
        sd      s1,8(sp)
        sd      s3,16(sp)
        sd      s5,24(sp)
        sd      s6,32(sp)
        sd      s0,40(sp)

SP1:    addi    s0,a0,0
        addi    s1,a1,0
        addi    s3,a3,0
        addi    s5,a5,0
        addi    s6,a6,0

        lb      a0,0(a0)
        jal     ra,CharPlot

        addi    a0,s0,1
        addi    a1,s1,0
        addi    a3,s3,1
        addi    a5,s5,0
        addi    a6,s6,-1
        bne     a6,x0,SP1

        ld      s0,40(sp)
        ld      s6,32(sp)
        ld      s5,24(sp)
        ld      s3,16(sp)
        ld      s1,8(sp)
        ld      ra,0(sp)
        addi    sp,sp,48
        jalr    x0,0(ra)

To print each individual character, StringPlot depends on a subroutine called CharPlot. This routine sets registers up for a call to GlyphPlot or GlyphPlot2H, depending on your personal preferences for how tall you want the string to appear.

; Place a single character onto the screen.
;
; A0 = character code
; A1 = address of font bitmap
; A2 = width of font bitmap in bytes
; A3 = address of screen bitmap where to put it
; A4 = width of screen bitmap in bytes
; A5 = ignored; CharPlot always copies 8 rows
;
; A0, A1, A3, A5, A6 destroyed.

CharPlot:
        addi    sp,sp,-16
        sd      ra,0(sp)

        slli    a0,a0,3                 ; calculate addr of glyph to show
        add     a1,a1,a0

        addi    a5,x0,8                 ; 2^3 rows to copy

        jal     ra,GlyphPlot2H          ; Everything else is right

        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)

The two remaining procedures are shown below, GlyphPlot and GlyphPlot2H. Each does the same thing: copies a character image from the font bitmap onto the screen. The difference between them is that the latter is a double-height variation, making the text output easier to read.

; Place a single 8xN glyph onto the screen.
;
; A0 = unused; preserved.
; A1 = address of character glyph data
; A2 = width of font bitmap in bytes
; A3 = address on screen bitmap to put it
; A4 = width of screen bitmap in bytes
; A5 = #rows to copy
;
; A1, A3, A5, A6 destroyed.

GlyphPlot:
        lb      a6,0(a1)
        sb      a6,0(a3)
        add     a1,a1,a2
        add     a3,a3,a4
        addi    a5,a5,-1
        bne     a5,x0,GlyphPlot
        jalr    x0,0(ra)

; Double-height version of GlyphPlot, making it easier to read on high-res
; displays.  Same parameters, same registers destroyed.
;
; Note that this is different from calling GlyphPlot with twice as many rows
; to copy.  If you have a real 8x16 font, use GlyphPlot with 16 rows to copy,
; and not GlyphPlot2H (the latter would result in a 32px tall glyph).
;
; Typically used for things like headings, window titles, etc.

GlyphPlot2H:
        lb      a6,0(a1)
        sb      a6,0(a3)
        add     a1,a1,a2
        add     a3,a3,a4
        sb      a6,0(a3)
        add     a3,a3,a4
        addi    a5,a5,-1
        bne     a5,x0,GlyphPlot2H
        jalr    x0,0(ra)
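The address arithmetic in these routines can be traced with a small Python model. It follows the register usage as called from TimerPaint (a font stride of 1 byte and 8 bytes per glyph), uses a bytearray in place of the frame buffer, and folds CharPlot into string_plot. The 16-byte test font is made up so that each copied byte is easy to recognize.

```python
def glyph_plot(font, src, screen, dst, font_stride, screen_stride, rows,
               double_height=False):
    """Copy one glyph, one byte per row, optionally writing each row twice."""
    for _ in range(rows):
        row = font[src]
        screen[dst] = row
        src += font_stride
        dst += screen_stride
        if double_height:          # GlyphPlot2H: repeat the row below
            screen[dst] = row
            dst += screen_stride


def string_plot(codes, font, screen, dst, screen_stride):
    for code in codes:
        # CharPlot: glyph address = font base + code * 8, always 8 rows.
        glyph_plot(font, code * 8, screen, dst, 1, screen_stride, 8,
                   double_height=True)
        dst += 1                   # next character cell, one byte to the right


font = bytes(range(16))            # made-up font: glyph 0 = 0..7, glyph 1 = 8..15
screen = bytearray(2 * 16)         # 2 bytes per row, 16 rows
string_plot([1, 0], font, screen, 0, 2)
print(list(screen[:8]))
```

Each font row lands on two consecutive screen rows, which is all that double height amounts to.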

Font information is located at g_fontTab, with each glyph taking up eight bytes. Each 8x8 glyph is recorded in a kind of backwards format from what you might be thinking, though. Let’s look at an example, the number four:

byte    $30   ; 00110000
byte    $38   ; 00111000
byte    $3C   ; 00111100
byte    $36   ; 00110110
byte    $7E   ; 01111110
byte    $30   ; 00110000
byte    $30   ; 00110000
byte    $00   ; 00000000

As you can see from the binary representation, the 4 is rendered backwards. This is because the MGIA controller draws the screen from bit 0 on the left to bit 7 on the right. This “little endian” approach allows little-endian processors, such as RISC-V, to manipulate bulk quantities of pixels with 16-, 32-, or even 64-bit instructions.

NOTE: This stands in sharp contrast to other platforms, like the Apple II or Commodore 64/128, where bit 7 corresponds to the left-most pixel and bit 0 to the right-most pixel. These platforms work well with this “big-endian” serialization of video data largely because the 6502 lacks 16-, 32-, or 64-bit data path operations.
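To see the glyph the way the display shows it, we can print each byte with bit 0 on the left. The short Python script below does this for the eight bytes of the number four given above; the render function is just a helper for this illustration.

```python
FOUR = [0x30, 0x38, 0x3C, 0x36, 0x7E, 0x30, 0x30, 0x00]


def render(glyph):
    """Return one text line per row, with bit 0 as the left-most pixel."""
    lines = []
    for row in glyph:
        lines.append("".join("#" if (row >> bit) & 1 else "."
                             for bit in range(8)))
    return lines


for line in render(FOUR):
    print(line)
```

Read this way, the diagonal stroke rises to the right and the stem sits right of center, so the digit appears the right way around on screen.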

The Full Program

You’ve seen all the major components of the stop watch application, and learned how its interface to Mantle works. Below, we put all the pieces together to produce the full program listing. It can be built using the a.py assembler with the following command: a.py from default.prg.s to default.prg. This should yield a binary PRG file which can be run in the k2 emulator as-is.

; This is a very simple demonstration program that puts up a simple
; stopwatch application.
;
; Press R (capital) to reset the stopwatch back to 00:00:00.0.
; Press space bar to toggle the stopwatch.
;
; This application is intended to showcase how to communicate with the
; k2 emulator API using the ECALL instruction.
;
; This program is built with the included `a.py` assembler, as follows:
;
;     a.py from default.prg.s to default.prg


        include "registers.inc"


; ECALL operation codes

ecNextEvent = 0
ecGetScreenConfig = 1
ecDumpRegs = 2
ecSetEventsDesired = 3
ecSetEvProcPC = 4
ecSetEvProcSP = 5
ecBeginPaint = 6
ecEndPaint = 7
ecStartTimer = 8
ecStopTimer = 9
ecQuitVM = 10


; Message Types

mtInit = 0
mtKeyDown = 1
mtKeyUp = 2
mtTimerTick = 3

mfInit = $0000000000000001
mfKeyDown = $0000000000000002
mfKeyUp = $0000000000000004
mfTimerTick = $0000000000000008


; ================================================================
; PRG a.out header

        jal     x0,_text        ; a.out header magic identifier
        word    textsize        ; TEXT size
        word    datasize        ; DATA size
        word    0               ; BSS size
        word    4096            ; default stack size
        word    0,0,0           ; reserved; zero for future compatibility

_text:

; ================================================================
; Program Code ("text")

; Default event handler procedure (aka "evproc").  Event handlers are
; called with (as of this writing) three parameters in the following
; registers:
;
; A0 - message type
; A1 - timestamp, allowing a handler to detect double-click events, etc.
; A2 - "P2" parameter, which further customizes the message.
;
; The default evproc always resides starting at the first instruction of
; the program, which happens to reside immediately following the PRG header.
; The sp register will point to the bottom of the default stack, so subroutine
; calls are safe.
;
; Note that evprocs are always called with a *fresh* stack.
;
; We terminate the evproc with an ecall to ecNextEvent.


_default_evproc:
        jal     ra,_data                ; set GP to base address of data
        jal     ra,dispatch             ; dispatch based on message type
        addi    a7,x0,ecNextEvent       ; return to event loop
        ecall

dispatch:
        beq     a0,x0,on_init           ; mtInit -> initialize program for 1st time

        addi    t0,x0,mtKeyDown
        beq     a0,t0,on_key_down       ; mtKeyDown -> handle key press

        addi    t0,x0,mtTimerTick
        beq     a0,t0,on_timer_tick     ; mtTimerTick -> handle timer tick

        addi    a0,x0,0                 ; Assume safe return value
        addi    a1,x0,0
        addi    a2,x0,0
        jalr    x0,0(ra)                ; if unrecognized event, just ignore it


; Invoked during program startup.  Render the display as we wish, configure
; desired set of events we're interested in, and return to event loop.

on_init:
        addi    sp,sp,-16
        sd      ra,0(sp)

        jal     ra,TimerReset

        addi    a7,x0,ecGetScreenConfig
        ecall
        sh      a0,g_scrWidth-_data(gp)
        sh      a1,g_scrHeight-_data(gp)
        sd      a2,g_scrBase-_data(gp)

        ; NOTE: You MUST enable the timer tick BEFORE adding the timer.
        ; If you do not, the timer may fire before you have a chance to handle
        ; it, and since the interest "isn't there" from Mantle's POV, it'll just
        ; turn the timer tick off again.

        addi    a0,x0,mfKeyDown
        sd      a0,g_eventMask-_data(gp)        ; we'll need this for later
        addi    a7,x0,ecSetEventsDesired
        ecall

        sd      x0,g_tmrHandle-_data(gp)

        addi    a7,x0,ecBeginPaint
        ecall

        jal     ra,TimerPaint

        addi    a7,x0,ecEndPaint
        ecall

        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)
        

; Invoked when a key is pressed.  A2 holds the key code in bits 0..31.

on_key_down:
        addi    t0,x0,$52               ; "R"
        andi    a2,a2,$FF
        beq     a2,t0,on_R_pressed

        addi    t0,x0,$51               ; "Q"
        beq     a2,t0,on_Q_pressed

        addi    t0,x0,$20               ; " " (space)
        beq     a2,t0,on_spc_pressed

        addi    a0,x0,0                 ; unknown key; ignore it
        jalr    x0,0(ra)

on_Q_pressed:
        addi    a0,x0,0
        addi    a7,x0,ecQuitVM
        ecall
        jalr    x0,0(ra)

on_R_pressed:
        addi    sp,sp,-16
        sd      ra,0(sp)

        jal     ra,TimerReset

        addi    a7,x0,ecBeginPaint
        ecall

        jal     ra,TimerPaint

        addi    a7,x0,ecEndPaint
        ecall

        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)

on_spc_pressed:
        addi    sp,sp,-16
        sd      ra,0(sp)

        ; If event mask has timer tick enabled...
        ld      a0,g_eventMask-_data(gp)
        andi    a0,a0,mfTimerTick
        beq     a0,x0,osp2

        ; then turn it off.
        ld      a0,g_tmrHandle-_data(gp)
        addi    a7,x0,ecStopTimer
        ecall
        ld      a0,g_eventMask-_data(gp)
        xori    a0,a0,mfTimerTick
        sd      a0,g_eventMask-_data(gp)
        addi    a7,x0,ecSetEventsDesired
        ecall

        addi    a0,x0,1                 ; A0 is non-zero, so the branch below
        bne     a0,x0,osp1              ; is always taken; skip the "on" path

        ; Otherwise, turn it (back) on.
osp2:   ld      a0,g_eventMask-_data(gp)
        ori     a0,a0,mfTimerTick
        sd      a0,g_eventMask-_data(gp)
        addi    a7,x0,ecSetEventsDesired
        ecall

        addi    a0,x0,100
        addi    a7,x0,ecStartTimer
        ecall
        sd      a0,g_tmrHandle-_data(gp)

osp1:   ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)

; Invoked approximately 10 times every second

on_timer_tick:
        addi    sp,sp,-16
        sd      ra,0(sp)

        jal     ra,TimerTick

        addi    a7,x0,ecBeginPaint
        ecall

        jal     ra,TimerPaint

        addi    a7,x0,ecEndPaint
        ecall

        addi    a0,x0,1

        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)


; ================================================================
; Timer Routines

TimerReset:
        addi    t0,x0,0
        sb      t0,g_tmrHH-_data(gp)    ; zero all timer counters
        sb      t0,g_tmrHL-_data(gp)
        sb      t0,g_tmrMH-_data(gp)
        sb      t0,g_tmrML-_data(gp)
        sb      t0,g_tmrSH-_data(gp)
        sb      t0,g_tmrSL-_data(gp)
        sb      t0,g_tmrT-_data(gp)
        jalr    x0,0(ra)

TimerPaint:
        addi    sp,sp,-16
        sd      ra,0(sp)

        addi    a7,x0,ecBeginPaint
        ecall

        jal     ra,BlackScreen

        addi    a0,gp,g_tmrHH-_data
        addi    a1,gp,g_fontTab-_data
        addi    a2,x0,1
        ld      a3,g_scrBase-_data(gp)
        lh      a4,g_scrWidth-_data(gp)
        srli    a4,a4,3
        addi    a5,x0,8
        addi    a6,x0,10
        jal     ra,StringPlot

        addi    a7,x0,ecEndPaint
        ecall
        
        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)

TimerTick:
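        ; Start at the deciseconds (tenths of a second) field.
        addi    t0,gp,g_tmrT-_data              ; T0 -> BCD digit to increment
        addi    t1,gp,g_tmrDelta-_data          ; T1 -> displacement to next digit
        addi    t2,gp,g_tmrBCDLimits-_data      ; T2 -> limit for current digit
        addi    t3,x0,7                         ; T3 = number of digits remaining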

        ; BEGIN
TT1:
        ;       Add 1 to BCD digit.
        lb      t4,0(t0)
        addi    t4,t4,1
        sb      t4,0(t0)

        ;       IF BCD digit has not reached its corresponding limit THEN
        lb      t5,0(t2)
        beq     t4,t5,TT2

        ;               RETURN; we're done, for there is no carry to propagate.
        jalr    x0,0(ra)

        ;       END
TT2:
        ;       Set BCD digit to zero.
        sb      x0,0(t0)

        ;       Move to next digit.
        lb      t5,0(t1)
        add     t0,t0,t5
        addi    t1,t1,1
        addi    t2,t2,1
        addi    t3,t3,-1

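        ; UNTIL there are no more digits left.
        bne     t3,x0,TT1

        ; Every digit wrapped around (99:59:59.9 -> 00:00:00.0).  Return
        ; explicitly rather than falling through into BlackScreen.
        jalr    x0,0(ra)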


; ================================================================
; Graphics Routines
;
; Normally, these would be placed in a library.


; Paint the whole screen black.

BlackScreen:
        ld      t0,g_scrBase-_data(gp)  ; T0 = screen base address
        lh      t1,g_scrHeight-_data(gp) ; T1 = height in pixels
BS1:    lh      t2,g_scrWidth-_data(gp) ; T2 = width in pixels
        srli    t2,t2,6                 ; T2 = width in dwords (64 pixels each)
BS2:    sd      x0,0(t0)
        addi    t0,t0,8
        addi    t2,t2,-1
        bne     t2,x0,BS2
        addi    t1,t1,-1
        bne     t1,x0,BS1
        jalr    x0,0(ra)


; Place a single 8xN glyph onto the screen.
;
; A0 = unused; preserved.
; A1 = address of character glyph data
; A2 = width of font bitmap in bytes
; A3 = address on screen bitmap to put it
; A4 = width of screen bitmap in bytes
; A5 = #rows to copy
;
; A1, A3, A5, A6 destroyed.

GlyphPlot:
        lb      a6,0(a1)
        sb      a6,0(a3)
        add     a1,a1,a2
        add     a3,a3,a4
        addi    a5,a5,-1
        bne     a5,x0,GlyphPlot
        jalr    x0,0(ra)

; Double-height version of GlyphPlot, making it easier to read on high-res
; displays.  Same parameters, same registers destroyed.
;
; Note that this is different from calling GlyphPlot with twice as many rows
; to copy.  If you have a real 8x16 font, use GlyphPlot with 16 rows to copy,
; and not GlyphPlot2H (the latter would result in a 32px tall glyph).
;
; Typically used for things like headings, window titles, etc.

GlyphPlot2H:
        lb      a6,0(a1)
        sb      a6,0(a3)
        add     a1,a1,a2
        add     a3,a3,a4
        sb      a6,0(a3)
        add     a3,a3,a4
        addi    a5,a5,-1
        bne     a5,x0,GlyphPlot2H
        jalr    x0,0(ra)

; Place a single character onto the screen.
;
; A0 = character code
; A1 = address of font bitmap
; A2 = width of font bitmap in bytes
; A3 = address of screen bitmap where to put it
; A4 = width of screen bitmap in bytes
; A5 = ignored; CharPlot always copies 8 font rows, drawn double height
;
; A0, A1, A3, A5, A6 destroyed.

CharPlot:
        addi    sp,sp,-16
        sd      ra,0(sp)

        slli    a0,a0,3                 ; calculate addr of glyph to show
        add     a1,a1,a0

        addi    a5,x0,8                 ; 2^3 rows per glyph, overriding caller's A5

        jal     ra,GlyphPlot2H          ; A1..A4 are already set up for it

        ld      ra,0(sp)
        addi    sp,sp,16
        jalr    x0,0(ra)

; Place a whole string onto the screen.
;
; A0 = string pointer
; A1 = address of font bitmap
; A2 = width of font bitmap in bytes
; A3 = address of screen bitmap where to put it
; A4 = width of screen bitmap in bytes
; A5 = unused; preserved
; A6 = # of characters in string to place.
;
; A0, A3, A6 destroyed; A1 and A5 are restored before returning.

StringPlot:
        addi    sp,sp,-48
        sd      ra,0(sp)
        sd      s1,8(sp)
        sd      s3,16(sp)
        sd      s5,24(sp)
        sd      s6,32(sp)
        sd      s0,40(sp)

SP1:    addi    s0,a0,0
        addi    s1,a1,0
        addi    s3,a3,0
        addi    s5,a5,0
        addi    s6,a6,0

        lb      a0,0(a0)
        jal     ra,CharPlot

        addi    a0,s0,1
        addi    a1,s1,0
        addi    a3,s3,1
        addi    a5,s5,0
        addi    a6,s6,-1
        bne     a6,x0,SP1

        ld      s0,40(sp)
        ld      s6,32(sp)
        ld      s5,24(sp)
        ld      s3,16(sp)
        ld      s1,8(sp)
        ld      ra,0(sp)
        addi    sp,sp,48
        jalr    x0,0(ra)

; ================================================================
; Read-only data tends to be addressed at negative offsets of the _text_end
; symbol.  Start with dword-aligned data, then word-aligned data, then hword-
; aligned, and finish with byte-aligned data to minimize gaps.
;
; Load and store displacements are signed 12-bit values, so this affords the
; developer around 2KiB of read-only data accessible from the GP register.
; More data can exist, but will require more work to access.

; Here sits our "font", a set of glyphs one byte wide and 96 rows tall
; (12 glyphs of 8 rows each: the digits 0 through 9, a colon, and a period).
; The bits in the bitmap might appear to be backwards; this is because
; the display system is little-endian (meaning bit 0, the _least_
; significant bit, is displayed on the left, while bit 7 is on the right).
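;
; As a worked example, the first three rows of the "1" glyph below are drawn
; like this (leftmost pixel is bit 0; "X" marks a set bit):
;
;       $18 = binary 00011000  ->  ...XX...
;       $1C = binary 00011100  ->  ..XXX...
;       $18 = binary 00011000  ->  ...XX...
;
; Note how $1C places its extra pixel (bit 2) to the LEFT of the stem, which
; gives the "1" its flag.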

g_fontTab:
                byte    $3C     ; 0
                byte    $66
                byte    $76
                byte    $66
                byte    $6E
                byte    $66
                byte    $3C
                byte    $00

                byte    $18
                byte    $1C
                byte    $18
                byte    $18
                byte    $18
                byte    $18
                byte    $3C
                byte    $00

                byte    $3C
                byte    $66
                byte    $60
                byte    $30
                byte    $18
                byte    $0C
                byte    $7E
                byte    $00

                byte    $3C
                byte    $66
                byte    $60
                byte    $38
                byte    $60
                byte    $66
                byte    $3C
                byte    $00

                byte    $30
                byte    $38
                byte    $3C
                byte    $36
                byte    $7E
                byte    $30
                byte    $30
                byte    $00

                byte    $7E     ; 5 (top bar and middle bar drawn without gaps)
                byte    $06
                byte    $3E
                byte    $60
                byte    $60
                byte    $66
                byte    $3C
                byte    $00

                byte    $3C
                byte    $66
                byte    $06
                byte    $3E
                byte    $66
                byte    $66
                byte    $3C
                byte    $00

                byte    $7E
                byte    $60
                byte    $30
                byte    $30
                byte    $18
                byte    $18
                byte    $18
                byte    $00

                byte    $3C
                byte    $66
                byte    $66
                byte    $3C
                byte    $66
                byte    $66
                byte    $3C
                byte    $00

                byte    $3C     ; 9
                byte    $66
                byte    $66
                byte    $7C
                byte    $60
                byte    $30
                byte    $1C
                byte    $00

                byte    $00     ; :
                byte    $18
                byte    $18
                byte    $00
                byte    $00
                byte    $18
                byte    $18
                byte    $00

                byte    $00     ; .
                byte    $00
                byte    $00
                byte    $00
                byte    $00
                byte    $18
                byte    $18
                byte    $00

; Field displacement table for calculating the new BCD value of the timer.
; Each byte indicates the relative displacement from one BCD digit to the
; next more significant digit, starting at g_tmrT.

g_tmrDelta:     byte    -2              ; g_tmrT, skip over period
                byte    -1              ; g_tmrSL
                byte    -2              ; g_tmrSH
                byte    -1              ; g_tmrML
                byte    -2              ; g_tmrMH
                byte    -1              ; g_tmrHL
                byte    0               ; g_tmrHH

; Value at which each BCD digit wraps back to zero and carries into the next.

g_tmrBCDLimits: byte    10              ; g_tmrT
                byte    10              ; g_tmrSL
                byte    6               ; g_tmrSH
                byte    10              ; g_tmrML
                byte    6               ; g_tmrMH
                byte    10              ; g_tmrHL
                byte    10              ; g_tmrHH

                
; ================================================================
; Read-write data tends to be addressed at positive offsets of the _text_end
; symbol.  Start with dword-aligned data, then word-aligned data, then hword-
; aligned, and finish with byte-aligned data to minimize gaps.
;
; This affords the developer around 2KiB of read-write data.
; Combined with read-only data, that amounts to about 4KiB of space for
; global variables.

        align   8                       ; _data MUST be 8-byte aligned, not 4-byte!

_text_end:
_data:
        auipc   gp,0                    ; Load GP with address of _data
        jalr    x0,0(ra)

                align   8

g_scrBase:      dword   0
g_tmrHandle:    dword   0
g_eventMask:    dword   0


g_scrWidth:     hword   0
g_scrHeight:    hword   0

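; The ten bytes from g_tmrHH through g_tmrT double as the string of glyph
; indices that TimerPaint hands to StringPlot.

g_tmrHH:        byte    0               ; hours, high BCD digit
g_tmrHL:        byte    0               ; hours, low BCD digit
                byte    $0A             ; glyph index of ":"
g_tmrMH:        byte    0               ; minutes, high BCD digit
g_tmrML:        byte    0               ; minutes, low BCD digit
                byte    $0A             ; glyph index of ":"
g_tmrSH:        byte    0               ; seconds, high BCD digit
g_tmrSL:        byte    0               ; seconds, low BCD digit
                byte    $0B             ; glyph index of "."
g_tmrT:         byte    0               ; tenths of a second
                byte    0               ; alignment padding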

                align   8               ; total size of data MUST be 8-byte aligned!
_data_end:

textsize = _text_end - _text
datasize = _data_end - _data
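
If the table-driven carry logic in TimerTick is hard to follow in assembly, it may help to see the same idea in a higher-level language. The sketch below is only a model written for this explanation and is not part of the stop-watch program. The two tables copy the values of g_tmrDelta and g_tmrBCDLimits, and the buffer layout mirrors g_tmrHH through g_tmrT; the Python names (timer_reset, timer_tick, as_text) are made up for illustration.

```python
# Model of the stop-watch display buffer and TimerTick's carry propagation.
# Names mirror labels in the assembly listing, but this Python layout is an
# illustration only; it is not how Mantle or the assembler see the data.

COLON, PERIOD = 0x0A, 0x0B      # glyph indices used as separators

# Offset of the tenths digit inside the 10-byte buffer "HH:MM:SS.T".
T = 9

# Same values as g_tmrDelta and g_tmrBCDLimits, least significant digit first.
TMR_DELTA = [-2, -1, -2, -1, -2, -1, 0]
TMR_BCD_LIMITS = [10, 10, 6, 10, 6, 10, 10]


def timer_reset():
    """Return a fresh buffer reading 00:00:00.0, stored as glyph indices."""
    return [0, 0, COLON, 0, 0, COLON, 0, 0, PERIOD, 0]


def timer_tick(buf):
    """Add one tenth of a second to buf in place, rippling carries leftward."""
    pos = T
    for delta, limit in zip(TMR_DELTA, TMR_BCD_LIMITS):
        buf[pos] += 1
        if buf[pos] != limit:
            return              # no carry to propagate; we're done
        buf[pos] = 0            # digit wrapped; carry into the next digit
        pos += delta            # step over any ":" or "." separator
    # Leaving the loop means every digit wrapped: back to 00:00:00.0.


def as_text(buf):
    """Render glyph indices as characters, in the font table's order."""
    return "".join("0123456789:."[g] for g in buf)


if __name__ == "__main__":
    buf = timer_reset()
    for _ in range(605):        # 605 tenths of a second = 1 minute, 0.5 s
        timer_tick(buf)
    print(as_text(buf))         # prints 00:01:00.5
```

Keeping the separators inside the buffer is what lets TimerPaint treat the counters as a ready-made string; the price is the displacement table, which tells the carry loop how far to step to reach the next digit.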