[Open-graphics] Designing a CPU

Nicholas S-A nova at macintoshclub.com
Sun Mar 18 16:51:22 EDT 2007


> The ideas are
>
>   * Move cheap unary operators from stage 3 to stage 2, e.g. istead of
>     implementing {add, sub} in 3, implement only {add} and optional
>     negation on one of the operands.  The motivation is that parallel
>     functions in the ALU sits idle while one is working, and by moving
>     some computation to the previous stage we can instead exploit more
>     combinations.
>

The other advantage of this is that the ALU latency is reduced for  
single
operations (e.g. add in this case), so with some tricks the ALU can  
be synthesized
faster.

>   * Let the ALU run independent of whether we do arithemetic or memory
>     operations.  In case of memory operations, the output of the  
> ALU is
>     the address, thus giving us more powerful addressing "for free".
>     For store, the second operand is the stored value so we need to
>     weaken the addressing by forcing immediate mode for the second
>     argument to the ALU.
>

Are we going to have some sort of parallel ALU arch? If so, we could  
have
one ALU dedicated to addressing for memory operations. We can compute
address offsets using a simpler ALU, and only allow a subset for that  
one for
normal operation. This allows for commands like:
STORE ax + b
FETCH rega*regb*memc (since the multiplier is FPGA hardware)
STORE (a-b)*c
and such in ONE cycle! (maybe not the second one)

>   * Only basic one-cycle operations goes in the CPU.  If we need any
>     multi-cycle operation (div?), we could add it as IO ports, using
>     the "modularised RISC" idea,
>     http://lists.duskglow.com/open-graphics/2006-April/005307.html.
>
> Below is the sketch.  I hope you can read the Unicode.  Some
> conventions:
>
>   * I have taken the liberty to misuse logical operators as bitwise
>     logical operators.
>
>   * Conditional bitwise not:
>         (¬)^s x = | x     if s = 0
>                   | ¬ x   if s = 1

To clarify for people who find this confusing, I think that this
is raising the not operation to the sth power (so if s is zero, this
just doesn't change x). Right?

>
>  1. Instruction Fetch
>
>     Fetch one instruction of the form (qmode, qop, ix, sx, mx, iy,  
> sy, iz, c)
>     where
>
>         qmode   2 bits  0 = ARITH, 1 = FETCH, 2 = STORE, 3 = BRANCH
>         qop     3 bits  ALU function if qmode ≠ BRANCH, else  
> branch condition
>         ix      5 bits  first operand register index
>         sx      1 bit   first operand bitwise not option
>         mx      1 bit   first operand increment by 1 option
>         iy      5 bits  second operand register index
>         sy      1 bit   second operand bitwise not option
>         iz      5 bits  write-back register
>         c       9 bits  immediate value
>
>  2. Register Fetch, Operand Computation, and Branch Condition
>
>   2a. Operand Computation.
>
>     Here we compute
>
>         x := (¬)^sx r_ix + mx
>
>         y := | (¬)^sy ⌊2^c r_iy⌋  if iy ≠ 0 and qmode ≠ STORE
>              | c                  if iy = 0  or qmode = STORE

I am confused about this:
> ⌊2^c r_iy⌋
Is that the floor operation?
This just uses the immediate as a bit shift, right?

That is pretty clever if it is what I think it is.

>
>     If we expand the first on in terms of (sx, mx) we get the 4
>     operations:
>
>         x := | r_ix       if sx = 0, mx = 0
>              | ¬ r_ix     if sx = 1, mx = 0
>              | r_ix + 1   if sx = 0, mx = 1
>              | - r_ix     if sx = 1, mx = 1
>
>     I assume splitting up negation into bitwise-not and increment does
>     not add significantly to the complexity.  If that is not the case,
>     we should drop the mx from the instruction and simplify x to
>
>         x := (-1)^sx r_ix
>
>     Register r_0 is hardcoded to 0 for reads.

This might cause some issues. Based on the way that y is implemented,
one cannot have it be based on r_0 (since iy=0 signals immediate, I  
think).
So, if you want to not input, then ix = 0 and iy = register number,  
with c=0 and
sy = 1. If you want to add one, you use the ALU add operation.

Ok, I guess it doesn't cause a problem. never mind ;-)

>
>   2b. Branch Condition.
>
>         do_branch := | false    if qmode ≠ BRANCH
>                      | x = 0    if qmode = BRANCH and qop_0 = 0
>                      | x ≥ 0    if qmode = BRANCH and qop_0 = 1
>
>     The x ≥ 0 test is cheap (check upper bit).  If the x = 0 test  
> is too
>     expensive here, then we could add an equality comparison to the  
> ALU
>     which returns -1 for true and 0 for false.
>

That looks good.

>  3. ALU
>
>         z := | x + y    if qop = ADDSUB
>              | x ⋅ y    if qop = MULT
>              | x ∧ y    if qop = AND
>              | x ∨ y    if qop = OR
>              | x ⊻ y    if qop = XOR
>              | ⌊2^y x⌋  if qop = SHIFT
> 	     | ...
>
>     Note that the choice of x and y in SHIFT gives the maximum  
> flexibility
>     by not duplicating the stage 2 shift operation.
>

I like that!

>  4. Memory and IO
>
>     Optionally fetch from or store to local memory location z:
>
>         z' := z                 if qmode = ARITH or BRANCH
>         z' := M_z               if qmode = FETCH
>         M_z := r_iy             if qmode = STORE
>

so, with my ALU idea if qmode = STORE, M_z := z2. I don't know if  
that is
possible - if not, it should at least run through the stage 2 +1  
logic (by switching
  the qmode=STORE test from y to x in stage 2).
If this doesn't make much sense, ignore it.
We only have 512 instructions, so this is a matter of code
compression and speed. Since the qmode = STORE instructions are going
to pass down the pipeline like any other instruction, we might as  
well use the
logic.

>  5. Write-Back
>
>         r_iz := z'  if iz ≠ 0 and qmode ≠ 11
>
>     We could add a bitwise-not here, and drop OR from the ALU, but I'm
>     running out of instruction bits.

Yeah, either way would work.

I like the design!



More information about the Open-graphics mailing list