The KCP53000 and KCP53020 ALU consists of a number of functional units wired in parallel. Each functional unit is implemented in a straightforward manner; little to no clever optimization has been performed. While this limits the operating speed of the ALU, it keeps the ALU understandable to even a novice hardware engineer.
The following data inputs and outputs exist:
All of these inputs are routed to each functional unit, as shown below:

    inA o-----*-------*-------*---------*---------*---------*--------.
    inB o---*-|-----*-|-----*-|-------*-|-------*-|-------*-|------. |
    Cin o-*-|-|-----|-|-----|-|-----*-|-|-----. | |       | |      | |
          | | |     | |     | |     | | |     | | |       | |      | |
        .-------. .-----. .-----. .-------. .-------. .-------. .-------.
        | Adder | | AND | | XOR | |  SHL  | |  SHR  | |  SLT  | | SLTU  |
        `-------' `-----' `-----' `-------' `-------' `-------' `-------'
            |        |       |        |         |         |          |
    sum o-. |        |       |        |         |         |          |
    and o-| |------. |       |        |         |         |          |
    xor o-| |------| |-----. |        |         |         |          |
    lsh o-| |------| |-----| |------. |         |         |          |
    rsh o-| |------| |-----| |------| |-------. |         |          |
    lts o-| |------| |-----| |------| |-------| |-------. |          |
    ltu o-| |------| |-----| |------| |-------| |-------| |--------. |
          | |      | |     | |      | |       | |       | |        | |
          | |      | |     | |      | |       | |       | |        | |
        .-------.-------.-------.---------.---------.---------.---------.
        |  AND  |  AND  |  AND  |   AND   |   AND   |   AND   |   AND   |
        +---|---'---|---'---|---'----|----'----|----'----|----'----|----+
        |                              OR                               |
        `---------------------------------------------------------------'
                                        |
                                        V
                                       OUT

To select which function (OUT, Carry) = F(A, B, Carry-In) to apply, we need a number of control inputs.
- `sum_en`: Enable the adder.
- `and_en`: Enable the bitwise AND generator.
- `xor_en`: Enable the bitwise XOR generator.
- `invB_en`: Logically invert the B input (0 <–> 1, all bits).
- `lsh_en`: Enable left-shifter.
- `rsh_en`: Enable right-shifter.
- `ltu_en`: Enable “set less-than”, unsigned inputs.
- `lts_en`: Enable “set less-than”, signed inputs.

Typically, only one of these control inputs will be set at a time. However, under certain circumstances, multiple enables can be asserted to achieve a synthetic function. For example, if you wished to produce the bitwise-OR operation, you would turn both `and_en` and `xor_en` on at the same time. This works because the final output (OUT) is synthesized by bitwise-ORing the outputs of all functional units. (This more-or-less faithfully emulates the NMOS 6502’s open-drain “wire-OR” circuit behavior.) Although not shown in the diagram above, the final output flags (C, V, and Z) are generated, in part, based on the value of OUT.
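The enable-and-merge structure can be modelled in a few lines of Python. This is only a behavioural sketch, not the actual hardware description: the 64-bit width, the function name `alu_out`, and its keyword-argument interface are assumptions made for illustration. Only the adder, AND, and XOR units are modelled, and flag generation is left out.

```python
MASK = (1 << 64) - 1  # assumed 64-bit datapath


def alu_out(a, b, cin=0, *, sum_en=False, and_en=False, xor_en=False):
    """Hypothetical model: every unit computes, enables gate, a final OR merges."""
    # Each functional unit always produces a result from the shared inputs.
    units = [
        ((a + b + cin) & MASK, sum_en),  # adder
        (a & b, and_en),                 # bitwise AND generator
        (a ^ b, xor_en),                 # bitwise XOR generator
    ]
    out = 0
    for value, enabled in units:
        if enabled:          # the per-unit AND gate: pass the value or force zero
            out |= value     # the shared OR block
    return out
```

With no enables asserted the model returns zero, and asserting `and_en` together with `xor_en` merges both unit outputs into one result.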
So, how does asserting `and_en` and `xor_en` yield a bitwise OR operation? Consider the truth tables of the AND and XOR functions:

    .---.---.-----.
    | A | B | Out |  AND
    +---+---+-----+
    | 0 | 0 |  0  |
    | 0 | 1 |  0  |
    | 1 | 0 |  0  |
    | 1 | 1 |  1  |
    `---+---+-----'

    .---.---.-----.
    | A | B | Out |  XOR
    +---+---+-----+
    | 0 | 0 |  0  |
    | 0 | 1 |  1  |
    | 1 | 0 |  1  |
    | 1 | 1 |  0  |
    `---+---+-----'

Notice that AND and XOR are never both 1 for the same inputs, and that at least one of them is 1 exactly when A or B is 1. Thus, if you bitwise OR the results of an AND and an XOR operation on the same inputs, that’s the same function as just bitwise ORing the original operands.
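The identity is small enough to check exhaustively. The sketch below walks all four input combinations; the helper name `or_via_and_xor` is made up for this illustration.

```python
from itertools import product


def or_via_and_xor(a, b):
    # Merge the AND and XOR results the way the final OR block would.
    return (a & b) | (a ^ b)


for a, b in product((0, 1), repeat=2):
    # The two units never drive a 1 for the same inputs...
    assert (a & b) & (a ^ b) == 0
    # ...and together they cover exactly the cases where OR is 1.
    assert or_via_and_xor(a, b) == (a | b)
```

Because Python's `&`, `|`, and `^` operate bit by bit on integers, the same identity holds for whole words.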
NOTE: Why not just route the inputs to the final OR block at the bottom? I could have done that, I suppose. But, that would have just added more logic to the circuit without any gain in performance.
Another example of a synthetic function is subtraction. Similar to the MOS 6502, I implement subtraction by asserting `invB_en` as well as `sum_en`. This computes the function F=A+(~B)+Cin = A-B-1+Cin. Thus, to eliminate the -1 bias on the results, the instruction decoder makes sure to assert Cin as well. This yields the result F=A-B-1+1, or F=A-B.
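The arithmetic can be checked with a small model of the adder. This is a sketch under assumptions: the 64-bit width and the names `adder` and `sub` are illustrative, and the `inv_b` argument stands in for `invB_en`.

```python
WIDTH = 64                  # assumed operand width
MASK = (1 << WIDTH) - 1


def adder(a, b, cin, inv_b=False):
    """Return (sum, carry_out) of a + b + cin, optionally inverting b first."""
    if inv_b:
        b = ~b & MASK       # one's complement of B, i.e. -B-1 modulo 2**WIDTH
    total = a + b + cin
    return total & MASK, total >> WIDTH


def sub(a, b):
    # Subtraction as described: invert B, add, and assert carry-in
    # to cancel the -1 bias.
    result, _ = adder(a, b, cin=1, inv_b=True)
    return result
```

Calling `adder(10, 3, 0, inv_b=True)` returns a sum of 6, which is the biased A-B-1 result, while `sub(10, 3)` returns 7.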
NOTE: RISC-V does not use ALU flags as older 8-/16-bit CPUs did, and there is no “Add With Carry” instruction. To accomplish a similar operation, you can use the SLTU instruction to detect operand wrap-around, like so:
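
    add256bits:
            ld      t0,0(a0)
            ld      t1,0(a1)
            add     t0,t0,t1
            sd      t0,0(a2)
            sltu    t2,t0,t1        # t2=1 iff t0 < t1, indicating a carry.
            ld      t0,8(a0)
            ld      t1,8(a1)
            add     t0,t0,t1
            sltu    t3,t0,t1        # t3=1 iff the operand sum wrapped.
            add     t0,t0,t2        # add carry in here.
            sltu    t2,t0,t2        # t2=1 iff adding the carry wrapped.
            or      t2,t2,t3        # carry out if either addition wrapped.
            sd      t0,8(a2)
            ld      t0,16(a0)
            ld      t1,16(a1)
            add     t0,t0,t1
            sltu    t3,t0,t1
            add     t0,t0,t2
            sltu    t2,t0,t2
            or      t2,t2,t3
            sd      t0,16(a2)
            ld      t0,24(a0)
            ld      t1,24(a1)
            add     t0,t0,t1
            add     t0,t0,t2        # the carry out of the top word is discarded.
            sd      t0,24(a2)
            jalr    x0,0(ra)
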
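The same carry-detection idea can be modelled in Python to test it against edge cases. This is an illustrative sketch rather than a translation of the routine above: the function name `add_limbs` and the little-endian list representation are assumptions. It checks for wrap-around after each of the two additions in a word, because adding the carry-in can itself wrap (for example, when one operand word is all ones and the carry-in is 1).

```python
MASK64 = (1 << 64) - 1


def add_limbs(x, y):
    """Add two equal-length little-endian lists of 64-bit words.

    The carry out of the most significant word is discarded.
    """
    result = []
    carry = 0
    for a, b in zip(x, y):
        s = (a + b) & MASK64            # add
        c1 = 1 if s < b else 0          # sltu: wrapped iff sum < an operand
        s = (s + carry) & MASK64        # add the carry in
        c2 = 1 if s < carry else 0      # sltu: wrapped iff sum < carry-in
        result.append(s)
        carry = c1 | c2                 # at most one of the two can wrap
    return result
```

For instance, adding 1 to a value whose two low words are all ones ripples the carry through both of them into the third word.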
The Carry input also selects between arithmetic and logical shifts in the shifters, since shift operations do not otherwise use it.
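As a rough illustration of the right shifter only, the sketch below treats `cin=1` as "arithmetic" and `cin=0` as "logical". Several details are assumptions not stated in this section: the 64-bit width, taking the shift amount from the low six bits of B, and the function name `shift_right`.

```python
WIDTH = 64                  # assumed operand width
MASK = (1 << WIDTH) - 1


def shift_right(a, b, cin):
    """Hypothetical right shifter: cin selects arithmetic (1) or logical (0)."""
    amount = b & (WIDTH - 1)            # assumed: low six bits of B
    result = a >> amount                # logical shift fills with zeros
    sign = (a >> (WIDTH - 1)) & 1
    if cin and sign:
        # Arithmetic shift of a negative value: fill vacated bits with ones.
        result |= (MASK << (WIDTH - amount)) & MASK
    return result
```

A value with its top bit set therefore keeps its sign when shifted with `cin=1`, and is zero-filled with `cin=0`.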
Generally speaking, any instruction which touches memory in some way will set the ALU to add, since it computes the sum of a base register and a fixed offset (e.g., `ld t0,16(a0)`). For computational instructions, there’s usually a one-to-one correspondence between instruction and enable, but there are exceptions:
Instruction | Enables |
---|---|
`add` | `sum_en` |
`sub` | `sum_en`, `invB_en`, Cin |
`and` | `and_en` |
`or` | `and_en`, `xor_en` |
`xor` | `xor_en` |
`sla` | `lsh_en`, Cin |
`sll` | `lsh_en` |
`sra` | `rsh_en`, Cin |
`srl` | `rsh_en` |
`slt` | `lts_en` |
`sltu` | `ltu_en` |
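In software terms, the table amounts to a lookup from mnemonic to a set of asserted signals. The sketch below is illustrative only: the names `ENABLES`, `MEMORY_OPS`, and `enables_for` are invented here, and `MEMORY_OPS` lists just the two memory instructions that appear in this document rather than the full set.

```python
# Signals asserted per computational instruction, copied from the table.
ENABLES = {
    "add":  {"sum_en"},
    "sub":  {"sum_en", "invB_en", "Cin"},
    "and":  {"and_en"},
    "or":   {"and_en", "xor_en"},
    "xor":  {"xor_en"},
    "sla":  {"lsh_en", "Cin"},
    "sll":  {"lsh_en"},
    "sra":  {"rsh_en", "Cin"},
    "srl":  {"rsh_en"},
    "slt":  {"lts_en"},
    "sltu": {"ltu_en"},
}

# Illustrative subset: memory instructions use the adder for base + offset.
MEMORY_OPS = {"ld", "sd"}


def enables_for(mnemonic):
    """Return the set of ALU control signals for a mnemonic (hypothetical helper)."""
    if mnemonic in MEMORY_OPS:
        return {"sum_en"}
    return ENABLES[mnemonic]
```

Only `sub`, `or`, `sla`, and `sra` assert more than one signal; every other entry maps to a single enable.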