[Open-graphics] Designing a CPU
Nicholas S-A
nova at macintoshclub.com
Sun Mar 18 16:51:22 EDT 2007
> The ideas are
>
> * Move cheap unary operators from stage 3 to stage 2, e.g. istead of
> implementing {add, sub} in 3, implement only {add} and optional
> negation on one of the operands. The motivation is that parallel
> functions in the ALU sits idle while one is working, and by moving
> some computation to the previous stage we can instead exploit more
> combinations.
>
The other advantage of this is that the ALU latency is reduced for
single
operations (e.g. add in this case), so with some tricks the ALU can
be synthesized
faster.
> * Let the ALU run independent of whether we do arithemetic or memory
> operations. In case of memory operations, the output of the
> ALU is
> the address, thus giving us more powerful addressing "for free".
> For store, the second operand is the stored value so we need to
> weaken the addressing by forcing immediate mode for the second
> argument to the ALU.
>
Are we going to have some sort of parallel ALU arch? If so, we could
have
one ALU dedicated to addressing for memory operations. We can compute
address offsets using a simpler ALU, and only allow a subset for that
one for
normal operation. This allows for commands like:
STORE ax + b
FETCH rega*regb*memc (since the multiplier is FPGA hardware)
STORE (a-b)*c
and such in ONE cycle! (maybe not the second one)
> * Only basic one-cycle operations goes in the CPU. If we need any
> multi-cycle operation (div?), we could add it as IO ports, using
> the "modularised RISC" idea,
> http://lists.duskglow.com/open-graphics/2006-April/005307.html.
>
> Below is the sketch. I hope you can read the Unicode. Some
> conventions:
>
> * I have taken the liberty to misuse logical operators as bitwise
> logical operators.
>
> * Conditional bitwise not:
> (¬)^s x = | x if s = 0
> | ¬ x if s = 1
To clarify for people who find this confusing, I think that this
is raising the not operation to the sth power (so if s is zero, this
just doesn't change x). Right?
>
> 1. Instruction Fetch
>
> Fetch one instruction of the form (qmode, qop, ix, sx, mx, iy,
> sy, iz, c)
> where
>
> qmode 2 bits 0 = ARITH, 1 = FETCH, 2 = STORE, 3 = BRANCH
> qop 3 bits ALU function if qmode ≠ BRANCH, else
> branch condition
> ix 5 bits first operand register index
> sx 1 bit first operand bitwise not option
> mx 1 bit first operand increment by 1 option
> iy 5 bits second operand register index
> sy 1 bit second operand bitwise not option
> iz 5 bits write-back register
> c 9 bits immediate value
>
> 2. Register Fetch, Operand Computation, and Branch Condition
>
> 2a. Operand Computation.
>
> Here we compute
>
> x := (¬)^sx r_ix + mx
>
> y := | (¬)^sy ⌊2^c r_iy⌋ if iy ≠ 0 and qmode ≠ STORE
> | c if iy = 0 or qmode = STORE
I am confused about this:
> ⌊2^c r_iy⌋
Is that the floor operation?
This just uses the immediate as a bit shift, right?
That is pretty clever if it is what I think it is.
>
> If we expand the first on in terms of (sx, mx) we get the 4
> operations:
>
> x := | r_ix if sx = 0, mx = 0
> | ¬ r_ix if sx = 1, mx = 0
> | r_ix + 1 if sx = 0, mx = 1
> | - r_ix if sx = 1, mx = 1
>
> I assume splitting up negation into bitwise-not and increment does
> not add significantly to the complexity. If that is not the case,
> we should drop the mx from the instruction and simplify x to
>
> x := (-1)^sx r_ix
>
> Register r_0 is hardcoded to 0 for reads.
This might cause some issues. Based on the way that y is implemented,
one cannot have it be based on r_0 (since iy=0 signals immediate, I
think).
So, if you want to not input, then ix = 0 and iy = register number,
with c=0 and
sy = 1. If you want to add one, you use the ALU add operation.
Ok, I guess it doesn't cause a problem. never mind ;-)
>
> 2b. Branch Condition.
>
> do_branch := | false if qmode ≠ BRANCH
> | x = 0 if qmode = BRANCH and qop_0 = 0
> | x ≥ 0 if qmode = BRANCH and qop_0 = 1
>
> The x ≥ 0 test is cheap (check upper bit). If the x = 0 test
> is too
> expensive here, then we could add an equality comparison to the
> ALU
> which returns -1 for true and 0 for false.
>
That looks good.
> 3. ALU
>
> z := | x + y if qop = ADDSUB
> | x ⋅ y if qop = MULT
> | x ∧ y if qop = AND
> | x ∨ y if qop = OR
> | x ⊻ y if qop = XOR
> | ⌊2^y x⌋ if qop = SHIFT
> | ...
>
> Note that the choice of x and y in SHIFT gives the maximum
> flexibility
> by not duplicating the stage 2 shift operation.
>
I like that!
> 4. Memory and IO
>
> Optionally fetch from or store to local memory location z:
>
> z' := z if qmode = ARITH or BRANCH
> z' := M_z if qmode = FETCH
> M_z := r_iy if qmode = STORE
>
so, with my ALU idea if qmode = STORE, M_z := z2. I don't know if
that is
possible - if not, it should at least run through the stage 2 +1
logic (by switching
the qmode=STORE test from y to x in stage 2).
If this doesn't make much sense, ignore it.
We only have 512 instructions, so this is a matter of code
compression and speed. Since the qmode = STORE instructions are going
to pass down the pipeline like any other instruction, we might as
well use the
logic.
> 5. Write-Back
>
> r_iz := z' if iz ≠ 0 and qmode ≠ 11
>
> We could add a bitwise-not here, and drop OR from the ALU, but I'm
> running out of instruction bits.
Yeah, either way would work.
I like the design!
More information about the Open-graphics
mailing list