The KCP53000 and KCP53020 ALU consists of a number of functional units wired in parallel. Each functional unit is implemented in a straightforward manner; little to no clever optimization has been performed. While this limits the operating speed of the ALU, it keeps the design simple enough for even a novice hardware engineer to understand.

The following data inputs and outputs exist:

All of these inputs are routed to each functional unit, as shown below:

inA o-----*-------*-------*---------*---------*---------*--------.
inB o---*-|-----*-|-----*-|-------*-|-------*-|-------*-|------. |
Cin o-*-|-|-----|-|-----|-|-----*-|-|-----. | |       | |      | |
      | | |     | |     | |     | | |     | | |       | |      | |
    .-------. .-----. .-----. .-------. .-------. .-------. .-------.
    | Adder | | AND | | XOR | |  SHL  | |  SHR  | |  SLT  | |  SLTU |
    `-------' `-----' `-----' `-------' `-------' `-------' `-------'
        |        |       |        |         |         |          |
sum o-. |        |       |        |         |         |          |
and o-| |------. |       |        |         |         |          |
xor o-| |------| |-----. |        |         |         |          |
lsh o-| |------| |-----| |------. |         |         |          |
rsh o-| |------| |-----| |------| |-------. |         |          |
lts o-| |------| |-----| |------| |-------| |-------. |          |
ltu o-| |------| |-----| |------| |-------| |-------| |--------. |
      | |      | |     | |      | |       | |       | |        | |
      | |      | |     | |      | |       | |       | |        | |
    .-------.-------.-------.---------.---------.---------.---------.
    |  AND  |  AND  |  AND  |   AND   |   AND   |   AND   |   AND   |
    +---|---'---|---'---|---'----|----'----|----'----|----'----|----+ 
    |                               OR                              |
    `---------------------------------------------------------------'
                                     |
                                     V
                                    OUT

To select which function (OUT, Carry) = F(A, B, Carry-In) to apply, we need a number of control inputs.

Typically, only one of these control inputs will be set at a time. However, under certain circumstances, multiple enables can be asserted to achieve a synthetic function. For example, if you wished to produce the bitwise-OR operation, you would turn both and_en and xor_en on at the same time. This works because the final output (OUT) is synthesized by bitwise-ORing the outputs of all functional units. (This more-or-less faithfully emulates the NMOS 6502’s open-drain “wire-OR” circuit behavior.) Although not shown in the diagram above, the final output flags (C, V, and Z) are generated, in part, based on the value of OUT.
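The enable-and-OR selection described above can be sketched in a few lines of Python. This is only a behavioral illustration, not the actual hardware description: the function name `select_out`, the 64-bit width, and the dictionary representation are assumptions made for the example.

```python
MASK = (1 << 64) - 1  # assumed 64-bit datapath

def select_out(unit_outputs, enables):
    """Gate each functional unit's output with its enable, then OR the results.

    unit_outputs: dict mapping a unit name (e.g. "and") to its result word.
    enables: set of unit names whose enable input is asserted.
    """
    out = 0
    for name, value in unit_outputs.items():
        if name in enables:          # models the per-unit AND gate
            out |= value & MASK      # models the final OR block
    return out

a, b = 0b1100, 0b1010
units = {"sum": (a + b) & MASK, "and": a & b, "xor": a ^ b}

only_and = select_out(units, {"and"})         # a single enable: plain AND
and_xor = select_out(units, {"and", "xor"})   # two enables: outputs are ORed
print(bin(only_and), bin(and_xor))
```

With no enables asserted, every unit is masked off and the sketch returns zero.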

So, how does asserting and_en and xor_en yield a bitwise OR operation? Consider the truth tables of the AND and XOR functions:

.---.---.-----.
| A | B | Out |  AND
+---+---+-----+
| 0 | 0 |  0  |
| 0 | 1 |  0  |
| 1 | 0 |  0  |
| 1 | 1 |  1  |
`---+---+-----'
.---.---.-----.
| A | B | Out |  XOR
+---+---+-----+
| 0 | 0 |  0  |
| 0 | 1 |  1  |
| 1 | 0 |  1  |
| 1 | 1 |  0  |
`---+---+-----'

Notice that AND and XOR are never both 1 for the same inputs, and that between them they cover exactly the rows where OR is 1. Thus, if you bitwise-OR the results of an AND and an XOR operation on the same inputs, you get the same function as bitwise-ORing the original operands.
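The identity is easy to check by brute force. The short Python sketch below walks the four rows of the truth tables and then wraps the same expression in a helper for whole words; the name `synthetic_or` is made up for this example.

```python
from itertools import product

# Single-bit case: enumerate the four rows of the truth tables above.
for a, b in product((0, 1), repeat=2):
    and_out = a & b
    xor_out = a ^ b
    assert (and_out & xor_out) == 0          # never both 1 on the same row
    assert (and_out | xor_out) == (a | b)    # so ORing them gives plain OR

def synthetic_or(a, b):
    """OR built only from the AND and XOR results, as the ALU does it."""
    return (a & b) | (a ^ b)

print(hex(synthetic_or(0xF0F0, 0x0FF0)))
```

Because the operation is bitwise, the single-bit result carries over to operands of any width.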

NOTE: Why not just route the inputs to the final OR block at the bottom? I could have done that, I suppose. But, that would have just added more logic to the circuit without any gain in performance.

Another example of a synthetic function is subtraction. Similar to the MOS 6502, I implement subtraction by asserting invB_en as well as sum_en. This computes the function F=A+(~B)+Cin = A-B-1+Cin. Thus, to eliminate the -1 bias on the results, the instruction decoder makes sure to assert Cin as well. This yields the result F=A-B-1+1, or F=A-B.
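The arithmetic can be checked with a small Python model of the adder. The 64-bit width and the function names `adder` and `sub` are assumptions for this sketch; `invB_en` and `cin` mirror the signals named in the text.

```python
MASK = (1 << 64) - 1  # assumed 64-bit datapath

def adder(a, b, cin, invB_en=False):
    """Return (sum, carry_out) of a + (b or ~b) + cin, truncated to 64 bits."""
    if invB_en:
        b = ~b & MASK                 # one's complement of B
    total = (a & MASK) + (b & MASK) + (cin & 1)
    return total & MASK, total >> 64

def sub(a, b):
    """A - B, formed as A + ~B + 1 (invB_en and Cin both asserted)."""
    return adder(a, b, cin=1, invB_en=True)[0]

biased, _ = adder(10, 3, cin=0, invB_en=True)   # Cin left low: A - B - 1
exact = sub(10, 3)                              # Cin asserted: A - B
print(biased, exact)
```

Leaving Cin low shows the -1 bias directly; asserting it removes the bias.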

NOTE: RISC-V does not make use of ALU flags like older 8-/16-bit CPUs did. There is no “Add With Carry” instruction. To accomplish a similar operation, you can use the SLTU instruction to detect operand wrap-around, like so:

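add256bits:
        ld   t0,0(a0)
        ld   t1,0(a1)
        add  t0,t0,t1
        sd   t0,0(a2)
        sltu t2,t0,t1   # t2=1 iff t0 < t1, indicating a carry.

        ld   t0,8(a0)
        ld   t1,8(a1)
        add  t0,t0,t1
        sltu t3,t0,t1   # t3=1 iff adding the operands wrapped.
        add  t0,t0,t2   # add carry in here.
        sltu t2,t0,t2   # t2=1 iff adding the carry-in wrapped.
        or   t2,t2,t3   # carry out if either addition wrapped.
        sd   t0,8(a2)

        ld   t0,16(a0)
        ld   t1,16(a1)
        add  t0,t0,t1
        sltu t3,t0,t1
        add  t0,t0,t2
        sltu t2,t0,t2
        or   t2,t2,t3
        sd   t0,16(a2)

        ld   t0,24(a0)
        ld   t1,24(a1)
        add  t0,t0,t1
        add  t0,t0,t2   # carry out of the top limb is discarded.
        sd   t0,24(a2)

        jalr x0,0(ra)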
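The wrap-around test can be modeled in Python and compared against arbitrary-precision integers. This is a sketch, not a translation of any particular routine: `sltu`, `add_limbs`, `to_limbs`, and `from_limbs` are names invented here, and limbs are assumed to be 64 bits wide and stored least-significant first. Note that once a carry-in is involved, a limb can wrap either when the operands are added or when the carry-in is added, so the model tests both.

```python
MASK = (1 << 64) - 1

def sltu(x, y):
    """Model of RISC-V SLTU: 1 if x < y as unsigned 64-bit values, else 0."""
    return 1 if (x & MASK) < (y & MASK) else 0

def add_limbs(a_limbs, b_limbs):
    """Add two little-endian lists of 64-bit limbs; returns (limbs, carry_out)."""
    result, carry = [], 0
    for a, b in zip(a_limbs, b_limbs):
        s = (a + b) & MASK
        c1 = sltu(s, b)           # wrapped while adding the operands?
        s = (s + carry) & MASK
        c2 = sltu(s, carry)       # wrapped while adding the carry-in?
        result.append(s)
        carry = c1 | c2           # at most one of the two can be set
    return result, carry

def to_limbs(n, count=4):
    return [(n >> (64 * i)) & MASK for i in range(count)]

def from_limbs(limbs):
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

x = (1 << 256) - 1
limbs, carry = add_limbs(to_limbs(x), to_limbs(1))
# All four limbs are zero and the carry out of the top limb is 1.
```

The case worth watching is a limb where one operand is all ones and the carry-in is 1: the operand addition wraps, the carry-in addition does not, and the final sum equals the all-ones operand.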

The Carry input is also used to select between arithmetic and logical shifts in the shifters, since the shifters have no other use for it. As the table below shows, asserting Cin selects the arithmetic variant.
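A rough Python model of the right shifter illustrates the idea. The 64-bit width, the use of the low six bits of B as the shift amount, and the function name `shr` are assumptions; the polarity (Cin set means arithmetic) follows the enables listed for sra and srl.

```python
MASK = (1 << 64) - 1  # assumed 64-bit datapath

def shr(a, b, cin):
    """Right shift of a by the low 6 bits of b.

    cin=1 selects an arithmetic shift, cin=0 a logical shift.
    """
    a &= MASK
    shamt = b & 63
    if cin and (a >> 63):
        # Arithmetic: replicate the sign bit into the vacated positions.
        fill = (MASK << (64 - shamt)) & MASK
        return (a >> shamt) | fill
    return a >> shamt

top = 1 << 63
logical = shr(top, 4, cin=0)      # zero-filled from the left
arithmetic = shr(top, 4, cin=1)   # sign-filled from the left
print(hex(logical), hex(arithmetic))
```

For operands whose top bit is clear, both settings of Cin give the same result.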

Generally speaking, any instruction which touches memory in some way will set the ALU to add, since it computes the effective address as the sum of a base register and a fixed offset (e.g., ld t0,16(a0)). For computational instructions, there’s usually a one-to-one correspondence between instruction and enable, but there are exceptions:

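.-------------.----------------------.
| Instruction | Enables              |
+-------------+----------------------+
| add         | sum_en               |
| sub         | sum_en, invB_en, Cin |
| and         | and_en               |
| or          | and_en, xor_en       |
| xor         | xor_en               |
| sla         | lsh_en, Cin          |
| sll         | lsh_en               |
| sra         | rsh_en, Cin          |
| srl         | rsh_en               |
| slt         | lts_en               |
| sltu        | ltu_en               |
`-------------+----------------------'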
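The table can be transcribed into a lookup structure and exercised against a behavioral model of the whole ALU. The Python below is a sketch under several assumptions that the text does not confirm: a 64-bit datapath, a shift amount taken from the low six bits of B, invB_en affecting only the adder's B operand, and a left shifter that ignores Cin. The names `DECODE`, `alu`, `execute`, and `signed` are invented for the example.

```python
MASK = (1 << 64) - 1  # assumed 64-bit datapath

# Instruction -> (asserted enables, Cin), transcribed from the table above.
DECODE = {
    "add":  ({"sum_en"}, 0),
    "sub":  ({"sum_en", "invB_en"}, 1),
    "and":  ({"and_en"}, 0),
    "or":   ({"and_en", "xor_en"}, 0),
    "xor":  ({"xor_en"}, 0),
    "sla":  ({"lsh_en"}, 1),
    "sll":  ({"lsh_en"}, 0),
    "sra":  ({"rsh_en"}, 1),
    "srl":  ({"rsh_en"}, 0),
    "slt":  ({"lts_en"}, 0),
    "sltu": ({"ltu_en"}, 0),
}

def signed(x):
    """Reinterpret a 64-bit word as a two's-complement integer."""
    return x - (1 << 64) if x >> 63 else x

def alu(a, b, cin, en):
    a &= MASK
    b &= MASK
    b_add = b ^ MASK if "invB_en" in en else b   # assumed: adder input only
    sh = b & 63                                  # assumed shift-amount field
    units = {
        "sum_en": (a + b_add + cin) & MASK,
        "and_en": a & b,
        "xor_en": a ^ b,
        "lsh_en": (a << sh) & MASK,              # assumed to ignore Cin
        "rsh_en": (signed(a) >> sh) & MASK if cin else a >> sh,
        "lts_en": int(signed(a) < signed(b)),
        "ltu_en": int(a < b),
    }
    out = 0
    for name, value in units.items():
        if name in en:       # per-unit AND gate
            out |= value     # final OR block
    return out

def execute(insn, a, b):
    enables, cin = DECODE[insn]
    return alu(a, b, cin, enables)

print(execute("sub", 10, 3), bin(execute("or", 0b1100, 0b1010)))
```

Only the or row asserts two functional-unit enables at once; every other row selects a single unit, optionally modified by invB_en or Cin.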