# Sketch of Next Gen Stack CPU ISA

A. Instruction Reference

The instructions documented below assume a 64-bit cell width.  However, the ISA
is designed to be extensible, with cell widths ranging from 16 bits (the
minimum supported) up to 1024 bits.  Note that some CPUs may re-allocate
opcodes intended for unsupported widths to other purposes.  Such opcodes are
machine-specific, and are not guaranteed to be upward compatible with future
revisions of the ISA.

A.1.  Group 0 Instructions

Instructions in group 0 are hard to categorize elsewhere.

A.1.1.  BRK ( -- )

	IF COND THEN
		Trap(BREAKPOINT)
	END

Performs a breakpoint trap.

A.1.2.  SC ( ... -- ... )

	IF COND THEN
		Trap(SYSCALL)
	END

Performs a system call trap.  Generally speaking, a service number is placed
onto the top of the data stack, indicating which service the operating system
is to perform.  The input and output stack effects are, much like any
subroutine, defined by the service performed.

A.1.3.  POP ( -- x )  (R: x -- )

Move a cell from the return stack to the data stack.

A.1.4.  PUSH ( x -- )  (R: -- x )

Move a cell from the data stack to the return stack.

A.1.5.  JMPDI ( a -- )

	address := POP(D)
	IF COND=TRUE THEN
		PC := address
	END

Jump to the absolute address on the data stack.  This instruction may jump
conditionally if prefixed with the COND prefix.

A.1.6.  CALLDI ( a -- ) (R: -- pc+1 )

	IF COND=TRUE THEN
		address := POP(D)
		PUSH(R, PC+1)
		PC := address
	END

Call the subroutine whose address is on the data stack.  This instruction may
jump conditionally if prefixed with the COND prefix.

A.1.7.  RET (aka JMPRI) ( -- ) (R: a -- )

	address := POP(R)
	IF COND=TRUE THEN
		PC := address
	END

Return from the current subroutine by jumping to the address at the top of the
return stack.  This instruction may jump conditionally if prefixed with the
COND prefix.

A.1.8.  SWITCH (aka CALLRI) ( -- ) (R: a -- pc+1 )

	IF COND=TRUE THEN
		address := POP(R)
		PUSH(R, PC+1)
		PC := address
	END


Switch co-routines by swapping the next instruction's address and the address
at the top of the return stack.  This instruction may jump conditionally if
prefixed with the COND prefix.
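
The four indirect control-flow instructions above can be modeled with a short
Python sketch.  The Machine class, its field names, and the use of a plain
boolean argument in place of the COND prefix are assumptions made purely for
illustration, as is the fall-through to PC+1 when a transfer is not taken.
Whether the address is popped before or inside the condition test follows the
pseudocode given for each instruction.

```python
class Machine:
    """Toy model of PC plus the data (D) and return (R) stacks."""

    def __init__(self, pc=0):
        self.pc = pc
        self.d = []  # data stack; top of stack is the last element
        self.r = []  # return stack; top of stack is the last element

    def jmpdi(self, cond=True):
        address = self.d.pop()           # popped even if the jump is not taken
        self.pc = address if cond else self.pc + 1

    def calldi(self, cond=True):
        if cond:
            address = self.d.pop()
            self.r.append(self.pc + 1)   # return address
            self.pc = address
        else:
            self.pc += 1                 # assumed fall-through

    def ret(self, cond=True):
        address = self.r.pop()           # popped even if the return is not taken
        self.pc = address if cond else self.pc + 1

    def switch(self, cond=True):
        if cond:
            address = self.r.pop()
            self.r.append(self.pc + 1)   # resume point for the other co-routine
            self.pc = address
        else:
            self.pc += 1                 # assumed fall-through


m = Machine(pc=100)
m.d.append(200)
m.calldi()   # enter the subroutine at 200; R now holds 101
m.switch()   # swap: resume at 101, leave 201 on R
m.ret()      # return to 201; R is empty again
```

The last three calls show why SWITCH suits co-routines: each SWITCH leaves
behind the address at which the other party will later resume.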

A.1.9.  CR! ( x1 x2 -- )

Stores x1 into control register x2.  The side effects of the store depend on
the control register.

NOTE: On some processor variants and/or control registers, this instruction may
trap for emulation in software.

A.1.10.  CR@ ( x -- x )

Reads a control register's current value and places it onto the data stack.
This might have side-effects; refer to the control register's documentation for
more details.

NOTE: On some processor variants and/or control registers, this instruction may
trap for emulation in software.

A.1.11.  DI ( -- )

Disable interrupts.  This is typically a faster and more atomic shortcut for
regNum CR@ mask BIC regNum CR! .

NOTE: On some processor variants, this instruction may trap for emulation in
software.  Some processor variants may not support interrupts.

A.1.12.  EI ( -- )

Enable interrupts.  This is typically a faster and more atomic shortcut for
regNum CR@ mask OR regNum CR! .

NOTE: On some processor variants, this instruction may trap for emulation in
software.  Some processor variants may not support interrupts.

A.1.13.  SEC ( -- )

The COND prefix normally performs its comparison against the top of the data
stack (T).  The SEC prefix alters COND so that it works with the second top of
stack (S) instead.  It has no other effects.

A.1.14.  R@ ( -- x ) (R: x -- x )

Fetches the current top of the return stack and places it onto the data stack.
It does NOT pop the return stack.  This is a faster and more atomic equivalent
of POP DUP PUSH.

A.2.  Group 1 Instructions

Instructions in group 1 push a signed or unsigned literal onto the data stack.

	.-----------+---+---------------.
	|    siz    | S | 0   0   0   1 |
	`-----------+---+---------------'

The siz field indicates the size of the datum to push onto the stack.  The S
bit is set if the value is to be sign-extended, and clear if it is to be
zero-extended.

At a minimum, 8- and 16-bit literals must be supported.
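
As a rough illustration of the siz and S fields, the following Python sketch
computes the cell value a group 1 instruction would push.  The function name,
the convention that siz=000 means 8 bits with each step doubling the size (read
off the opcode map in C.2), and the little-endian order of the literal bytes
are assumptions; this document does not specify how the literal is laid out in
the instruction stream.

```python
def decode_lit(opcode, operand, cell_bits=64):
    """Return the cell value pushed by a group 1 (LIT) opcode.

    opcode  -- 8-bit opcode laid out as siz(3) | S(1) | 0001
    operand -- the literal bytes following the opcode (little-endian assumed)
    """
    if opcode & 0x0F != 0x01:
        raise ValueError("not a group 1 opcode")
    siz = opcode >> 5              # 0 -> 8 bits, 1 -> 16 bits, 2 -> 32 bits, ...
    signed = bool(opcode & 0x10)   # S bit: sign-extend when set
    nbytes = 1 << siz
    if len(operand) != nbytes:
        raise ValueError("operand length does not match the siz field")
    value = int.from_bytes(operand, "little", signed=signed)
    return value & ((1 << cell_bits) - 1)   # wrap to the cell width


unsigned = decode_lit(0x01, b"\x88")   # ULIT8 $88
signed = decode_lit(0x11, b"\x88")     # SLIT8 $88
```

Here unsigned is $88 zero-extended, while signed has every bit above bit 7 set.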

A.3.  BOOL2 ( x1 x2 -- x ) and BOOL2 ( x1 x2 -- x1 x2 x )

Instructions in groups 2 and 3 comprise the BOOL2 instruction.  These two
groups differ in whether the input parameters are first popped off the stack
(group 3) or not (group 2).

The BOOL2 instruction computes a boolean function given two parameters from the
data stack.  The upper four bits of the opcode form a look-up table which
determines the operation to perform.

	.---------------+-----------+---.
	| a   b   c   d | 0   0   1 | D |
	`---------------+-----------+---'

Given two bits (one each from x1 and x2), use abcd above to calculate
the result according to this table:

	x1 x2 || r	AND	NAND	OR	NOR	XOR	XNOR	BIC
	-----------
	 0  0    a	0	1	0	1	0	1	0
	 0  1    b	0	1	1	0	1	0	0
	 1  0    c	0	1	1	0	1	0	1
	 1  1    d	1	0	1	0	0	1	0

You can also use BOOL2 to push zero and negative-1 constants onto the stack
more quickly and compactly than you can with any of the LIT instructions.
Setting abcd=0000 pushes zero, while setting abcd=1111 pushes negative one.

Many stack manipulation operations are implemented using BOOL2 as well.

	DROP	NIP	OVER	DUP
	D=1	D=1	D=0	D=0

	0	0	0	0
	0	1	0	1
	1	0	1	0
	1	1	1	1

Some common operations are encoded as follows:

	00000010	0			00000011	2DROP 0
	00010010	2DUP AND		00010011	AND
	00100010	2DUP BIC		00100011	BIC
	00110010	OVER			00110011	DROP
	01000010	2DUP SWAP BIC		01000011	SWAP BIC
	01010010	DUP			01010011	NIP
	01100010	2DUP XOR		01100011	XOR
	01110010	2DUP OR			01110011	OR
	10000010	2DUP NOR		10000011	NOR
	10010010	2DUP XNOR		10010011	XNOR
	10100010	DUP INVERT		10100011	INVERT NIP
	10110010	2DUP INVERT OR		10110011	INVERT OR
	11000010	OVER INVERT		11000011	DROP INVERT
	11010010	2DUP SWAP INVERT OR	11010011	SWAP INVERT OR
	11100010	2DUP NAND		11100011	NAND
	11110010	-1			11110011	2DROP -1
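
The look-up table can be evaluated across a whole cell at once.  The sketch
below is one possible software model in Python, not a description of the
hardware: the function name and the list-based stack are assumptions, and it
requires at least two cells on the stack even for the constant-pushing
encodings.

```python
def bool2(opcode, stack, cell_bits=64):
    """Apply a BOOL2 opcode (abcd | 001 | D) to a data stack held as a list."""
    if (opcode >> 1) & 0b111 != 0b001:
        raise ValueError("not a BOOL2 opcode")
    a, b, c, d = ((opcode >> n) & 1 for n in (7, 6, 5, 4))
    mask = (1 << cell_bits) - 1
    x1, x2 = stack[-2] & mask, stack[-1] & mask
    n1, n2 = ~x1 & mask, ~x2 & mask
    # Each enabled table row contributes the bit positions where it applies.
    result = 0
    if a:
        result |= n1 & n2   # x1=0, x2=0
    if b:
        result |= n1 & x2   # x1=0, x2=1
    if c:
        result |= x1 & n2   # x1=1, x2=0
    if d:
        result |= x1 & x2   # x1=1, x2=1
    if opcode & 1:          # D=1: pop both inputs before pushing the result
        del stack[-2:]
    stack.append(result)
    return stack


after_and = bool2(0b00010011, [0b1100, 0b1010])   # AND
after_over = bool2(0b00110010, [5, 7])            # OVER
```

With these inputs, after_and holds the single cell 0b1000 and after_over holds
5, 7, 5, matching the AND and OVER rows of the encoding table.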

A.4.  BOOL1 ( x1 -- x ) and BOOL1 ( x1 -- x1 x )

Instructions in groups 4 and 5 comprise the BOOL1 instruction.  These two
groups differ in whether the input parameter is first popped off the stack
(group 5) or not (group 4).

The BOOL1 instruction computes a boolean function given one parameter from the
data stack.  The upper four bits of the opcode form a look-up table which
determines the operation to perform.

	.---------------+-----------+---.
	| 0   0   c   d | 0   1   0 | D |
	`---------------+-----------+---'

For each bit in x1, use cd above to calculate the result according to
this table:

	x1 || r		ZERO	NEG1	INVERT	NOP
	-------
	 0    c		0	1	1	0
	 1    d		0	1	0	1

You can also use BOOL1 to push zero and negative-1 constants onto the stack
more quickly and compactly than you can with any of the LIT instructions.
Setting cd=00 pushes zero, while setting cd=11 pushes negative one.  While this
overlaps with BOOL2 instruction encodings, it's useful to have these
instructions for those cases where you only want to encode DROP 0 or DROP -1
instead of 2DROP 0 or 2DROP -1.

Some common operations are encoded as follows:

	00000100	0			00000101	DROP 0
	00010100	DUP 			00010101	NOP
	00100100	DUP INVERT		00100101	INVERT
	00110100	-1			00110101	DROP -1
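
The one-input table can be modeled the same way.  This Python sketch is
illustrative only; the function name and list-based stack are assumptions, and
it expects at least one cell on the stack.

```python
def bool1(opcode, stack, cell_bits=64):
    """Apply a BOOL1 opcode (00cd | 010 | D) to a data stack held as a list."""
    if opcode & 0b11001110 != 0b00000100:
        raise ValueError("not a BOOL1 opcode")
    c = (opcode >> 5) & 1   # result bit wherever x1 has a 0
    d = (opcode >> 4) & 1   # result bit wherever x1 has a 1
    mask = (1 << cell_bits) - 1
    x1 = stack[-1] & mask
    result = 0
    if c:
        result |= ~x1 & mask
    if d:
        result |= x1
    if opcode & 1:          # D=1: pop the input before pushing the result
        stack.pop()
    stack.append(result)
    return stack


inverted = bool1(0b00100101, [0x0F])   # INVERT
```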

A.4.  ADDSUB2 ( x1 x2 -- x ) and ADDSUB2 ( x1 x2 -- x1 x2 x )

Instructions in groups 6 and 7 comprise the ADDSUB2 instruction.  These two
groups differ in whether the input parameters are first popped off the stack
(group 7) or not (group 6).

The ADDSUB2 instruction computes a 2's-complement sum given two parameters from
the data stack.  The upper bits of the opcode control the precise data path
through the ALU used to calculate this sum.  The result can be used for
addition or subtraction, depending on configuration.

	.---+---+-------+-----------+---.
	| 0 | b |  cc   | 0   1   1 | D |
	`---+---+-------+-----------+---'

The b bit inverts the x2 operand if set; otherwise, it leaves the value
unchanged.  The cc field offers input carry control:

	00	Ignore carry flag; assume carry is clear.
	01	Ignore carry flag; assume carry is set.
	10	Use carry flag as-is.
	11	Use inverted carry flag.

Note that this instruction always updates carry.

Some common operations are encoded as follows:

 00000110	2DUP ADD	00000111	ADD
 00010110	2DUP ADD 1+	00010111	ADD 1+
 00100110	2DUP ADC	00100111	ADC
 00110110	2DUP ADC 1-	00110111	ADC 1-
 01000110	2DUP SUB 1-	01000111	SUB 1-
 01010110	2DUP SUB	01010111	SUB
 01100110	2DUP SBC	01100111	SBC
 01110110	2DUP SBC 1-	01110111	SBC 1-
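
The b and cc fields can be read as steering a single adder.  The Python sketch
below models that reading; the function name is hypothetical, stack handling
for the D bit is omitted, and taking the carry flag to be the adder's raw
carry-out (rather than a borrow) is an assumption.

```python
def addsub2(opcode, x1, x2, carry, cell_bits=64):
    """Return (result, carry_out) for an ADDSUB2 opcode (0 | b | cc | 011 | D)."""
    if opcode & 0b10001110 != 0b00000110:
        raise ValueError("not an ADDSUB2 opcode")
    mask = (1 << cell_bits) - 1
    b = (opcode >> 6) & 1
    cc = (opcode >> 4) & 0b11
    # cc selects the carry fed into the adder: 0, 1, flag, or inverted flag.
    carry_in = (0, 1, carry & 1, (carry & 1) ^ 1)[cc]
    operand = (~x2 & mask) if b else (x2 & mask)   # b=1 inverts x2
    total = (x1 & mask) + operand + carry_in
    return total & mask, total >> cell_bits        # carry-out is the bit above the cell


difference, carry_out = addsub2(0b01010111, 5, 3, 0)   # SUB
```

Subtraction falls out of the identity x1 - x2 = x1 + ~x2 + 1, which is why the
SUB encoding sets b and forces the input carry to 1.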

A.5.  COND ( x2 -- ) and SEC COND ( x1 x2 -- x2 )

	IF not prefixed with SEC THEN
		value := POP(D)
	ELSE
		value := POP(S)
	END
	COND := (((value < 0) & a) | ((value = 0) & b) | (carry & c)) == y

The COND prefix alters the behavior of a subsequent control flow instruction,
such as the instructions in the JUMPI or BRANCHES groups.  The COND prefix has
no effect on instructions which do not transfer control.

	.---+---+---+---+-----------+---.
	| 0 | a | b | c | 1   0   0 | y |
	`---+---+---+---+-----------+---'

Assuming the c bit is set to 0, the encoding of a, b, and y can give the
following checks:

		Signed		Unsigned
	a b y	Check		Check
	---------------------------------
	0 0 0	always		always
	0 0 1	never		never
	0 1 0	value != 0	value > 0
	0 1 1	value = 0	value = 0
	1 0 0	value >= 0	
	1 0 1	value < 0
	1 1 0	value > 0
	1 1 1	value <= 0

With the c bit set, there is an additional check on the carry flag.

SEC is a prefix that modifies the COND prefix to work with the second top of
stack instead of the direct top of stack.  This is most often used for
emulating S16X4A control flow instructions.
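
The check table above follows directly from the COND pseudocode.  A small
Python sketch makes this easy to verify; the function name is hypothetical,
and the choice between T and S (the SEC prefix) is left to the caller, who
simply passes in the value to be tested.

```python
def cond_flag(opcode, value, carry, cell_bits=64):
    """Evaluate a COND opcode (0 | a | b | c | 100 | y) against one cell."""
    if opcode & 0b10001110 != 0b00001000:
        raise ValueError("not a COND opcode")
    a = bool(opcode & 0x40)
    b = bool(opcode & 0x20)
    c = bool(opcode & 0x10)
    y = bool(opcode & 0x01)
    mask = (1 << cell_bits) - 1
    negative = bool((value >> (cell_bits - 1)) & 1)   # sign bit of the cell
    zero = (value & mask) == 0
    hit = (negative and a) or (zero and b) or (bool(carry) and c)
    return hit == y


is_zero = cond_flag(0b00101001, 0, 0)       # a b y = 0 1 1: value = 0
is_positive = cond_flag(0b01101000, 5, 0)   # a b y = 1 1 0: value > 0
```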

A.6.  Direct Jumps and Calls

Instructions in group 10 are responsible for direct transfer of program
control.  Without the COND prefix, the control flow transfers are
unconditional; with the COND prefix, they are conditional.

	.---+---+---+---+---------------.
	|    sss    | C | 1   0   1   0 |
	`---+---+---+---+---------------'

The size (sss) field indicates how big the displacement to the program counter
is.  The C bit is set for a subroutine call, clear for a simple jump.

A.7.  Shifts and Rotations

The instructions in group 11 perform bitwise rotations and shifts.

	.---+-----------+---------------.
	| 0 |    fff    | 1   0   1   1 |
	`---+-----------+---------------'

The function (fff) field selects the precise operation to perform, according to
the opcode encodings below:

 00001011	LSL     ( n cnt -- n' )
 00011011	LSR     ( n cnt -- n' )
 00101011	ASR     ( n cnt -- n' )
 00111011	PERMUTE	( n idx -- n' )
 01001011	RL      ( n cnt -- n' )
 01011011	RLC     ( n cnt -- n' )
 01101011	RR      ( n cnt -- n' )
 01111011	RRC     ( n cnt -- n' )

The LSL and LSR operations perform logical shifts (left and right,
respectively).  ASR also performs a right shift, but does so arithmetically.
RL and RR perform rotations left and right without cycling through the carry
flag.  RLC and RRC do so including the carry flag as an additional bit.  For
example, if we execute the instructions SLIT8 $88 ULIT8 $01, then the following
instructions will produce the following results:

	LSL

         N-1      7   6   5   4   3   2   1   0       C
	.---+-/-+---+---+---+---+---+---+---+---.   .---.
	| 1 |...| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |   | 1 |
	`---+-/-+---+---+---+---+---+---+---+---'   `---'
          |                                           ^
	  |                                           |
	  `-------------------------------------------'

	LSR

         N-1      7   6   5   4   3   2   1   0       C
	.---+-/-+---+---+---+---+---+---+---+---.   .---.
	| 0 |...| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |-->| 0 |
	`---+-/-+---+---+---+---+---+---+---+---'   `---'

	ASR

         N-1      7   6   5   4   3   2   1   0       C
	.---+-/-+---+---+---+---+---+---+---+---.   .---.
	| 1 |...| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |-->| 0 |
	`---+-/-+---+---+---+---+---+---+---+---'   `---'

	RL

         N-1      7   6   5   4   3   2   1   0       C
	.---+-/-+---+---+---+---+---+---+---+---.   .---.
	| 1 |...| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |   | 1 |
	`---+-/-+---+---+---+---+---+---+---+---'   `---'
          |                                   ^       ^
	  |                                   |       |
	  `-----------------------------------'-------'

	RLC

         N-1      7   6   5   4   3   2   1   0       C
	.---+-/-+---+---+---+---+---+---+---+---.   .---.
	| 1 |...| 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |<--| 1 |
	`---+-/-+---+---+---+---+---+---+---+---'   `---'
          |                                           ^
	  |                                           |
	  `-------------------------------------------'

	RR

         N-1      7   6   5   4   3   2   1   0       C
	.---+-/-+---+---+---+---+---+---+---+---.   .---.
	| 0 |...| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |-->| 0 |
	`---+-/-+---+---+---+---+---+---+---+---'   `---'
	  ^                                   |
	  |                                   |
	  `-----------------------------------'

	RRC

         N-1      7   6   5   4   3   2   1   0       C
	.---+-/-+---+---+---+---+---+---+---+---.   .---.
	| 0 |...| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |-->| 0 |
	`---+-/-+---+---+---+---+---+---+---+---'   `---'
	  ^                                           |
	  |                                           |
	  `-------------------------------------------'
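
The data movement drawn in the diagrams can be written down as a one-bit step
for each operation.  This Python sketch is a reading of the arrows above rather
than a specification: the function name is hypothetical, only a count of 1 is
modeled, and the carry-out of each operation is inferred from where the arrows
point.

```python
def shift1(op, n, carry, cell_bits=64):
    """Shift or rotate n by one bit position; return (result, carry_out)."""
    mask = (1 << cell_bits) - 1
    n &= mask
    carry &= 1
    top = cell_bits - 1
    msb = n >> top
    lsb = n & 1
    if op == "LSL":
        return (n << 1) & mask, msb
    if op == "LSR":
        return n >> 1, lsb
    if op == "ASR":
        return (n >> 1) | (msb << top), lsb       # sign bit is replicated
    if op == "RL":
        return ((n << 1) & mask) | msb, msb       # old MSB wraps to bit 0
    if op == "RLC":
        return ((n << 1) & mask) | carry, msb     # old carry enters bit 0
    if op == "RR":
        return (n >> 1) | (lsb << top), lsb       # old bit 0 wraps to the MSB
    if op == "RRC":
        return (n >> 1) | (carry << top), lsb     # old carry enters the MSB
    raise ValueError("unknown operation: " + op)


value = 0xFFFFFFFFFFFFFF88        # SLIT8 $88 on a 64-bit cell
shifted, carry_out = shift1("LSL", value, 0)
```

For the LSL case, the low byte of shifted is $10 and carry_out is 1.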

TODO: Look into ways of unifying rotations and shifts using option bits.  It
might not be possible due to having too few bits available; but, if it can be
done, we should use that approach instead of function decodes.

The PERMUTE instruction is useful for re-arranging the bytes within a
multi-byte cell.  You have direct control of which nybbles in an input cell
land in an output cell.  If multiple source nybbles are routed to the same
destination nybble, they are logically-ORed.

For example, on a 32-bit processor, a do-nothing permutation would look like
this: ULIT32 $12345678 ULIT32 $76543210 PERMUTE.  To reverse the bytes: ULIT32
$12345678 ULIT32 $10325476 PERMUTE.

This instruction is quite useful for implementing conversions to/from
big-endian representation.
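
One reading that is consistent with both examples and with the OR rule is that
nybble i of idx names the destination position of source nybble i.  The Python
sketch below models that reading for a 32-bit cell.  The function name, the
interpretation itself, and the choice to ignore destinations beyond the cell
width are assumptions.

```python
def permute(n, idx, cell_bits=32):
    """Route each nybble of n to the position named by the same nybble of idx."""
    nybbles = cell_bits // 4
    result = 0
    for src in range(nybbles):
        value = (n >> (4 * src)) & 0xF
        dst = (idx >> (4 * src)) & 0xF
        if dst < nybbles:                  # out-of-range destinations are dropped
            result |= value << (4 * dst)   # colliding sources are ORed together
    return result


same = permute(0x12345678, 0x76543210)      # do-nothing permutation
swapped = permute(0x12345678, 0x10325476)   # byte reversal
```

Under this reading, same is $12345678 and swapped is $78563412.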

A.8. PC-Relative Effective Addresses

The group 12 instructions push PC-relative addresses onto the stack for
subsequent access by loads and stores (group 13 and 14 instructions).

	.---+---+---+---+---------------.
	|    sss    | S | 1   1   0   0 |
	`---+---+---+---+---------------'

The size (sss) field indicates how big the PC-relative displacement is (8 bits,
16 bits, etc.).  The S bit indicates which stack to push the effective address
onto: the return stack if clear, the data stack if set.


A.9. Stores and Loads

The group 13 instructions allow data to be stored into memory.  Group 14
instructions retrieve this data via a set of signed and unsigned loads.

Note that loads and stores may cause a trap.  For example, some CPUs offer
memory protection, while others require loads and stores to occur only on
naturally aligned fields in memory.

A.9.1.  Stores

	.---+---+---+-------------------.
	|    sss    | 0   1   1   0   1 |
	`---+---+---+-------------------'

The size field (sss) indicates the size of the data to store into memory.  As
indicated below, the data stored into memory comes from the lowest set of bits.

         6               3       1
         3               1       5   7 0
	.---------------------------+---.
	|///////////////////////////|   |  sss=000  Byte
	+-----------------------+---+---+
	|///////////////////////|       |  sss=001  Half-word
	+---------------+-------+-------+
	|///////////////|               |  sss=010  Word
	+---------------+---------------+
	|                               |  sss=011  Double-word
	`-------------------------------'

A.9.2.  Loads

	.---+---+---+---+---------------.
	|    sss    | S | 1   1   1   0 |
	`---+---+---+---+---------------'

The size field (sss) indicates the size of the data to load from memory.  The S
bit indicates if the data retrieved is interpreted as an unsigned (0;
zero-extended) or signed (1; sign-extended) quantity.  A load always affects
the full cell width in the data stack.

         6               3       1
         3               1       5   7 0
	.---------------------------+---.
	|///////////////////////////|   |  sss=000  Byte
	+-----------------------+---+---+
	|///////////////////////|       |  sss=001  Half-word
	+---------------+-------+-------+
	|///////////////|               |  sss=010  Word
	+---------------+---------------+
	|                               |  sss=011  Double-word
	`-------------------------------'

	.---.
	|///| Sign-extension or zero-extension, depending on S bit.
	`---'
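
The truncation on store and the extension on load can be sketched in Python as
follows.  The function names, the bytearray used as memory, and the
little-endian byte order are assumptions; traps for protection or alignment
are not modeled.

```python
def store(memory, address, value, sss):
    """Store the low 8 << sss bits of value at address (little-endian assumed)."""
    nbytes = 1 << sss
    low = value & ((1 << (8 * nbytes)) - 1)   # upper bits are discarded
    memory[address:address + nbytes] = low.to_bytes(nbytes, "little")


def load(memory, address, sss, signed, cell_bits=64):
    """Load 8 << sss bits from address and extend them to a full cell."""
    nbytes = 1 << sss
    raw = bytes(memory[address:address + nbytes])
    value = int.from_bytes(raw, "little", signed=signed)
    return value & ((1 << cell_bits) - 1)     # sign extension wraps to the cell


memory = bytearray(16)
store(memory, 0, 0x1234567890ABCDEF, 1)       # half-word store keeps $CDEF
zero_extended = load(memory, 0, 1, signed=False)
sign_extended = load(memory, 0, 1, signed=True)
```

Only $CDEF reaches memory; the unsigned load returns it zero-extended, while
the signed load fills the upper 48 bits with ones because bit 15 is set.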

B.  S16X4(A) to ISA/NG Migration

Just as the 8086 was intended to be source-code compatible with the 8008 and
8080, so too is ISA/NG intended to be as source-code compatible with the
S16X4(A) MISC processors as possible.  However, they are not binary compatible.

Here's the complete mapping of instruction sequences from the S16X4 to the
ISA/NG.

	S16X4A		ISA/NG
	======		======
	NOP		NOP
	LIT		ULIT16 or PEAD16 depending on context
	FWM		ULD16
	SWM		ST16
	ADD		ADD
	AND		AND
	XOR		XOR
	LIT/ZGO		COND(T=0)/JMP  or  ULIT16/SEC/COND/JMPDI
	ZGO		SEC/COND/JMPDI
	FBM		ULD8
	SBM		ST8
	LCALL		CALL
	ICALL		CALLDI  or  SWITCH (depending on context)
	LIT/NZGO	COND(T<>0)/JMP  or  ULIT16/SEC/COND/JMPDI
	NZGO		SEC/COND/JMPDI
	LIT/GO		JMP  or  ULIT16/JMPDI
	GO		JMPDI or RET (depending on context)

C.  Instruction Mapping

C.1.  Binary Encoding

00000000	BRK		EEPROM Patch Breakpoint
00010000	SC		System Call
001f0000	PUSHPOP		PUSH and POP instructions
01sc0000	JUMPI		Indirect Control Flow
100f0000	CRLDST  	Control Register accessors
101e0000	EIDI		Enable/Disable interrupts
11000000	SEC		Prefix for COND
11010000	R@		Return stack accessor
sssS0001 ...	LIT		Literal load group
ffff001d	BOOL2
00ff010d	BOOL1
0bcc011d	ADDSUB2
0abc100y	COND
sssc1010 ...	BRANCHES	Direct control flow
0fff1011	MULSHF2
sssy1100 ...	PEA		Support for PC-relative programs
sss01101	STORES
sssS1110	LOADS

Illegal Opcode Encodings (These will trap)
 111x0000
 01xx010x
 1xxx01xx
 1xxx1011
 xxx11101
 xxxx1111

JUMPI Group
 01000000	JMPDI	PC=T
 01010000	CALLDI	R=PC+1; PC=T
 01100000	RET	PC=R
 01110000	SWITCH	R=PC+1; PC=R

BRANCHES
 sss01010	JMP PC+ea
 sss11010	CALL PC+ea

BOOL2
 00000010	0
 00010010	2DUP AND
 00100010	2DUP BIC
 00110010	OVER
 01000010	2DUP SWAP BIC
 01010010	DUP
 01100010	2DUP XOR
 01110010	2DUP OR
 10000010	2DUP NOR
 10010010	2DUP XNOR
 10100010	DUP INVERT
 10110010	2DUP INVERT OR
 11000010	OVER INVERT
 11010010	2DUP SWAP INVERT OR
 11100010	2DUP NAND
 11110010	-1
 00000011	2DROP 0
 00010011	AND
 00100011	BIC
 00110011	DROP
 01000011	SWAP BIC
 01010011	NIP
 01100011	XOR
 01110011	OR
 10000011	NOR
 10010011	XNOR
 10100011	INVERT NIP
 10110011	INVERT OR
 11000011	DROP INVERT
 11010011	SWAP INVERT OR
 11100011	NAND
 11110011	2DROP -1

BOOL1
 00000100	0
 00010100	DUP
 00100100	DUP INVERT
 00110100	-1
 00000101	DROP 0
 00010101	NOP
 00100101	INVERT
 00110101	DROP -1

ADDSUB2
 00000110	2DUP ADD
 00010110	2DUP ADD 1+
 00100110	2DUP ADC
 00110110	2DUP ADC 1-
 01000110	2DUP SUB 1-
 01010110	2DUP SUB
 01100110	2DUP SBC
 01110110	2DUP SBC 1-
 00000111	ADD
 00010111	ADD 1+
 00100111	ADC
 00110111	ADC 1-
 01000111	SUB 1-
 01010111	SUB
 01100111	SBC
 01110111	SBC 1-

MULSHF2
 00001011	LSL
 00011011	LSR
 00101011	ASR
 00111011	PERMUTE
 01001011	RL
 01011011	RLC
 01101011	RR
 01111011	RRC
 1xxx1011	illegal
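
The encoding patterns in this section can be matched mechanically.  The Python
sketch below transcribes the group patterns and the illegal-opcode list,
treating every letter as a wildcard bit.  The names PATTERNS, ILLEGAL, matches,
and classify are hypothetical, and reporting "unassigned" for opcodes that
match neither list is an assumption about how to treat encodings this section
leaves open.

```python
PATTERNS = [
    ("00000000", "BRK"), ("00010000", "SC"), ("001f0000", "PUSHPOP"),
    ("01sc0000", "JUMPI"), ("100f0000", "CRLDST"), ("101e0000", "EIDI"),
    ("11000000", "SEC"), ("11010000", "R@"), ("sssS0001", "LIT"),
    ("ffff001d", "BOOL2"), ("00ff010d", "BOOL1"), ("0bcc011d", "ADDSUB2"),
    ("0abc100y", "COND"), ("sssc1010", "BRANCHES"),
    ("0fff1011", "MULSHF2"),   # top bit is 0 per the group 11 encoding in A.7
    ("sssy1100", "PEA"), ("sss01101", "STORES"), ("sssS1110", "LOADS"),
]

ILLEGAL = ["111x0000", "01xx010x", "1xxx01xx", "1xxx1011", "xxx11101", "xxxx1111"]


def matches(pattern, opcode):
    """True if every literal 0 or 1 in pattern equals the matching opcode bit."""
    bits = format(opcode, "08b")
    return all(p not in "01" or p == b for p, b in zip(pattern, bits))


def classify(opcode):
    """Return the group name for an 8-bit opcode, "illegal", or "unassigned"."""
    if any(matches(p, opcode) for p in ILLEGAL):
        return "illegal"
    for pattern, name in PATTERNS:
        if matches(pattern, opcode):
            return name
    return "unassigned"


group = classify(0b01010111)   # the SUB encoding from the ADDSUB2 list
```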

C.2.  Opcode Map

  0		1		2		3		4		5		6		7
0 BRK		ULIT8		BOOL2		BOOL2		BOOL1		BOOL1		ADDSUB2		ADDSUB2
1 SC		SLIT8		BOOL2		BOOL2		BOOL1		BOOL1		ADDSUB2		ADDSUB2
2 POP		ULIT16		BOOL2		BOOL2		BOOL1		BOOL1		ADDSUB2		ADDSUB2
3 PUSH		SLIT16		BOOL2		BOOL2		BOOL1		BOOL1		ADDSUB2		ADDSUB2
4 JMPDI	(JUMPI)	ULIT32		BOOL2		BOOL2		---		---		ADDSUB2		ADDSUB2
5 CALLDI(JUMPI)	SLIT32		BOOL2		BOOL2		---		---		ADDSUB2		ADDSUB2
6 RET	(JUMPI)	ULIT64		BOOL2		BOOL2		---		---		ADDSUB2		ADDSUB2
7 SWITCH(JUMPI)	SLIT64		BOOL2		BOOL2		---		---		ADDSUB2		ADDSUB2
8 CR!		---		BOOL2		BOOL2		---		---		---		---
9 CR@		---		BOOL2		BOOL2		---		---		---		---
A DI		---		BOOL2		BOOL2		---		---		---		---
B EI		---		BOOL2		BOOL2		---		---		---		---
C SEC[1]	---		BOOL2		BOOL2		---		---		---		---
D R@		---		BOOL2		BOOL2		---		---		---		---
E ---		---		BOOL2		BOOL2		---		---		---		---
F ---		---		BOOL2		BOOL2		---		---		---		---

  8		9		A		B		C		D		E		F
0 COND[2]	COND[2]		JMP8		LSL		PEAR8		ST8		ULD8		---
1 COND[2]	COND[2]		CALL8		LSR		PEAD8		---		SLD8		---
2 COND[2]	COND[2]		JMP16		ASR		PEAR16		ST16		ULD16		---
3 COND[2]	COND[2]		CALL16		PERMUTE		PEAD16		---		SLD16		---
4 COND[2]	COND[2]		JMP32		RL		PEAR32		ST32		ULD32		---
5 COND[2]	COND[2]		CALL32		RLC		PEAD32		---		SLD32		---
6 COND[2]	COND[2]		JMP64		RR		PEAR64		ST64		ULD64		---
7 COND[2]	COND[2]		CALL64		RRC		PEAD64		---		SLD64		---
8 ---		---		---		---		---		---		---		---
9 ---		---		---		---		---		---		---		---
A ---		---		---		---		---		---		---		---
B ---		---		---		---		---		---		---		---
C ---		---		---		---		---		---		---		---
D ---		---		---		---		---		---		---		---
E ---		---		---		---		---		---		---		---
F ---		---		---		---		---		---		---		---

[1] - Instruction Prefix.  Modifies behavior of COND.
[2] - Instruction Prefix.  Modifies behavior of control flow instructions.