[Open-graphics] Synthesizing oga1hq
Petter Urkedal
urkedal at nbi.dk
Mon Aug 13 15:16:54 EDT 2007
On 2007-08-13, Mark wrote:
> Petter Urkedal wrote:
>> If we want a compromise, we could instead implement a 32x16->32
>> multiply. That is, two multipliers in the ALU stage, and an adder in
>> the IO stage. If again we incorporate the shifts, we are down to 4
>> instructions to compute a 32x32->32 product:
>> mul_32x32_from_32x16:
>> mul/h r0, r1, r3 ; r3 := r0 * r1[31:16]
>> mul/l r0, r1, r2 ; r2 := r0 * r1[15:0]
>> shift r3, 16, r3  ; r3 := r3 << 16
>> add r2, r3, r2    ; r2 := r2 + r3
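One way to convince oneself that this four-instruction sequence is exact modulo 2^32 is to model it in Python. This is only a sketch: mul_32x16 is a hypothetical stand-in for the proposed 32x16->32 multiplier (keeping the low 32 bits of the product), and "shift r3, 16, r3" is assumed to be a left shift.

```python
MASK16, MASK32 = 0xFFFF, 0xFFFFFFFF

def mul_32x16(a, b16):
    """Hypothetical 32x16->32 multiplier: low 32 bits of a * b16."""
    return (a * (b16 & MASK16)) & MASK32

def mul_32x32_from_32x16(r0, r1):
    """Mirror of the mul/h, mul/l, shift, add sequence (result mod 2^32)."""
    r3 = mul_32x16(r0, r1 >> 16)      # mul/h: r0 * r1[31:16]
    r2 = mul_32x16(r0, r1 & MASK16)   # mul/l: r0 * r1[15:0]
    r3 = (r3 << 16) & MASK32          # shift r3, 16, r3 (assumed left shift)
    return (r2 + r3) & MASK32         # add r2, r3, r2 (wraps like a 32-bit add)

# Spot-check against the exact product truncated to 32 bits.
for a, b in [(0, 0), (1, MASK32), (MASK32, MASK32), (0x12345678, 0x9ABCDEF0)]:
    assert mul_32x32_from_32x16(a, b) == (a * b) & MASK32
```

Truncating r0 * r1[31:16] to 32 bits before the shift loses nothing, because every bit it drops would land above bit 31 after the shift anyway.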
>> Note that register forwarding does not work fully for the mul
>> instruction in this case, since it is split over two stages. There is a
>> 1-cycle delay before we can use the result, which means this is the only
>> ordering of the instructions that avoids a stall.
>> My guess is that the 16x16->32 multiplier with shifts on both the second
>> operand and the result is much cheaper than the extra adder and
>> multiplier of the 32x16->32 solution, and we save only one instruction
>> by going to 32x16->32.
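For comparison, the 16x16->32 route rests on the identity a*b = a_hi*b_hi*2^32 + (a_hi*b_lo + a_lo*b_hi)*2^16 + a_lo*b_lo. The first term vanishes modulo 2^32, so three 16x16 products suffice. The sketch below only checks that arithmetic; the function names are illustrative, and it does not reproduce the actual instruction sequence or the operand/result shifts of the hardware.

```python
MASK16, MASK32 = 0xFFFF, 0xFFFFFFFF

def mul_16x16(a16, b16):
    """16x16->32 product; the result always fits in 32 bits."""
    return (a16 & MASK16) * (b16 & MASK16)

def mul_32x32_from_16x16(a, b):
    """Low 32 bits of a * b built from three 16x16 partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    low = mul_16x16(a_lo, b_lo)                            # weight 2^0
    cross = mul_16x16(a_lo, b_hi) + mul_16x16(a_hi, b_lo)  # weight 2^16
    # a_hi * b_hi has weight 2^32 and drops out modulo 2^32.
    return (low + (cross << 16)) & MASK32

for a, b in [(0, 0), (1, MASK32), (MASK32, MASK32), (0x12345678, 0x9ABCDEF0)]:
    assert mul_32x32_from_16x16(a, b) == (a * b) & MASK32
```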
> How about if the shift was implicit in mul/h? That should be cheap in
> terms of hardware and it would decrease the cost of the soft 32x32 multiply
> to three cycles -- wouldn't it? (Sorry -- I have yet to read up on your
> architecture in detail.)
That's what I did in the 16x16->32 case, but in this case the two-stage
mul/l instruction will not have its result ready in the slot where the
shift instruction sits, so the add would stall there and we can't save
that cycle anyway.
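The timing argument, and the quoted remark about instruction order, can be checked with a toy in-order issue model. Everything in it is an assumption rather than the real oga1hq pipeline: at most one instruction issues per cycle, the split mul delivers its result two cycles after issue (the 1-cycle delay from the quoted text), shift and add deliver after one cycle, and "mul/h<<16" is a hypothetical mul/h with the shift folded in.

```python
# Toy in-order issue model (assumed, not the actual oga1hq pipeline).
MUL, ALU = 2, 1  # assumed result latency: two-stage mul, one-stage ALU op

def schedule(program):
    """Return each instruction's issue cycle and the cycle after the last issue."""
    ready = {}      # register -> first cycle its new value can be forwarded
    issues = []
    next_slot = 0   # earliest cycle the next instruction may issue
    for name, dest, srcs, latency in program:
        # Wait for the issue slot and for every source operand.
        cycle = max([next_slot] + [ready.get(r, 0) for r in srcs])
        issues.append((name, cycle))
        ready[dest] = cycle + latency
        next_slot = cycle + 1
    return issues, next_slot

MUL_H = ("mul/h", "r3", ["r0", "r1"], MUL)
MUL_L = ("mul/l", "r2", ["r0", "r1"], MUL)
SHIFT = ("shift", "r3", ["r3"], ALU)
ADD = ("add", "r2", ["r2", "r3"], ALU)

# Every ordering that respects the data dependencies of the sequence.
orderings = [
    [MUL_H, MUL_L, SHIFT, ADD],  # 4 cycles, no stall
    [MUL_L, MUL_H, SHIFT, ADD],  # 5 cycles
    [MUL_H, SHIFT, MUL_L, ADD],  # 6 cycles
]
for prog in orderings:
    issues, total = schedule(prog)
    print(" ".join(name for name, _ in issues), "->", total, "cycles")

# Hypothetical mul/h with the 16-bit shift folded into it.
MUL_H_SHIFTED = ("mul/h<<16", "r3", ["r0", "r1"], MUL)
print(schedule([MUL_H_SHIFTED, MUL_L, ADD]))  # add issues at cycle 3; 4 cycles
```

Under these assumptions the original order is the only stall-free one, and with the shift folded into mul/h the add still has to wait until cycle 3 for mul/l, so the sequence occupies four cycles either way.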
> You can do 32x32 multiplies at nearly 200 MHz on the XC3S4000 with the
> caveat that they must be fully pipelined (4 stages -- just add the extra
> stages to the output of the inferred multiplier and XST will retime them
> back in). Can you afford to deepen the pipeline or stall on 32x32
> multiplications?
Deepening the pipeline has the cost of more complex register forwarding,
and the multiply will lose its forwarding. It may still be feasible.
Stalling the pipeline means some redesign, but is there any gain
compared to issuing multiple instructions? Well, code size, but I don't
think that's a big issue.