[Open-graphics] Multipliers in oga1hq
Farhan Mohamed Ali
farhan at cmu.edu
Sat Sep 1 17:22:56 EDT 2007
On Sat, September 1, 2007 12:34 pm, Petter Urkedal said:
> On 2007-08-31, Mark wrote:
>> I've posted a run-down of the multipliers so far (any important ones
>> missing?) at http://jarvin.net/opengraphics/. This includes photos of
>> the most critical path post-PAR.
>
> Nice overview of the syntheses; I've had some trouble getting the
> synthesis tools working, so I appreciate your effort. I suspect my
> version did not synthesise as I intended. In the attached version I made
> the LUT4s explicit by putting them in a separate module. Not sure if it
> would have made a difference. Well, I think we can go with the radix-4
> version unless there is compelling reason to optimise further *and* it is
> technically feasible to use a 4x clock for the multiplier (which I don't
> know).
>
> So, let's consider integrating Farhan's version in the nanocontroller.
> Given that the VGA code will use 16 bits, would it be better to reduce the
> multiplier to 16x16->32? Will this be insufficient for the DMA code?
> (Does DMA require multiply at all, other than powers of 2?) Conversely, is
> a 33-cycle multiply too slow for the VGA code, and would 17 cycles be fast
> enough?
>
16x16 will take 9 cycles. Perhaps it could be fast enough to be clocked
at 200 MHz if it is a dedicated 16x16 part. I'll try this out. I will also
try adding a special mode to the current 32x32 version that assumes 16-bit
inputs and takes 9 cycles to complete in that mode.

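The cycle counts quoted in this thread (33, 17 and 9) are consistent with a sequential multiplier that consumes k bits of the multiplier operand per cycle (k = 1 for radix 2, k = 2 for radix 4) and needs one extra cycle, i.e. width/k + 1 cycles. The Python sketch below models that arithmetic. The function name `sequential_multiply` and the assumption that the extra cycle is an operand-load cycle are illustrative only and are not taken from the actual design.

```python
def sequential_multiply(a: int, b: int, width: int, bits_per_step: int) -> tuple[int, int]:
    """Model an unsigned width x width -> 2*width sequential multiplier.

    Each step consumes `bits_per_step` bits of `b` (1 = radix 2, 2 = radix 4)
    and adds the selected multiple of `a` into the accumulator. Returns
    (product, cycles); the cycle count assumes one extra operand-load cycle.
    """
    if width % bits_per_step:
        raise ValueError("width must be a multiple of bits_per_step")
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    digit_mask = (1 << bits_per_step) - 1
    acc = 0
    cycles = 1  # assumed operand-load cycle
    for shift in range(0, width, bits_per_step):
        digit = (b >> shift) & digit_mask
        # Hardware would select a precomputed multiple (0, a, 2a, 3a for radix 4).
        acc += (a * digit) << shift
        cycles += 1
    return acc, cycles


for width in (32, 16):
    for radix, bits in ((2, 1), (4, 2)):
        _, cycles = sequential_multiply(3, 5, width, bits)
        print(f"{width}x{width} radix-{radix}: {cycles} cycles")
```

Running it prints 33 and 17 cycles for 32x32 (radix 2 and radix 4) and 17 and 9 cycles for 16x16. In this model, a 16-bit mode on the 32x32 unit corresponds to calling the same function with `width=16`: the loop simply stops after eight 2-bit steps.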
> I'd go with the non-blocking out-of-band approach. That is, the
> programmer will count instructions before fetching the result. One
> instruction takes a reg and a reg/imm and issues the multiply, and
> another writes back result to a reg. The ALU stage can be the point of
> transit. The issue-multiply instruction transfers the ALU operands to
> the multiplier and initiates the multiply. The multiplier holds the
> result after finishing as long as no new multiply is issued. A
> fetch-product instruction moves the result to the ALU output, thus
> allowing it to take part in register forwarding.
>
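The issue-multiply / fetch-product protocol described above can be modelled cycle by cycle. This Python sketch is hypothetical: the class and method names, the 17-cycle latency (the radix-4 32x32 figure) and keeping only the low 32 bits of the product are assumptions for illustration. There is deliberately no interlock, so fetching before the multiply has finished returns `None` here, where real hardware would hand back a partial sum.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


class OutOfBandMultiplier:
    """Hypothetical model of a non-blocking, out-of-band multiplier."""

    LATENCY = 17  # assumed: radix-4, 32x32

    def __init__(self) -> None:
        self._result: Optional[int] = 0  # None while a multiply is in flight
        self._pending = 0
        self._cycles_left = 0

    def issue(self, a: int, b: int) -> None:
        """Issue-multiply: latch the two ALU operands and start."""
        # Assumption: only the low 32 bits (one register's worth) are kept.
        self._pending = ((a & MASK32) * (b & MASK32)) & MASK32
        self._cycles_left = self.LATENCY
        self._result = None  # stands in for a meaningless partial sum

    def tick(self) -> None:
        """Advance one clock cycle; the result is held once complete."""
        if self._cycles_left:
            self._cycles_left -= 1
            if self._cycles_left == 0:
                self._result = self._pending

    def fetch(self) -> Optional[int]:
        """Fetch-product: move the held result to the ALU output, unchecked."""
        return self._result


mul = OutOfBandMultiplier()
mul.issue(1234, 5678)
for _ in range(OutOfBandMultiplier.LATENCY - 1):
    mul.tick()      # the program would run independent instructions here
print(mul.fetch())  # None: fetched one cycle too early
mul.tick()
print(mul.fetch())  # 7006652
```

Because the result is held until the next `issue()`, the programmer only has to guarantee a minimum distance between issue and fetch; fetching later than that is always safe.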
> As a slight variant, we can hard-code the multiplication result to r31
> and drop the fetch-product instruction. That's just as easy to
> implement, and it saves one cycle, since it means the product can be
> directly used as an operand to the ALU.
>
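A sketch of the r31 variant, again hypothetical: register-file reads of r31 are routed straight to the multiplier's output, so the product can feed an ALU operand without a separate fetch-product instruction. The class names, the 17-cycle latency, the low-32-bit product, rejecting writes to r31, and showing the previous product while a multiply is in flight are all assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF
PRODUCT_REG = 31  # r31 aliases the multiplier result


class Multiplier:
    """Minimal fixed-latency multiplier model (assumed 17 cycles, low 32 bits)."""

    LATENCY = 17

    def __init__(self) -> None:
        self.product = 0  # keeps the previous product while a multiply runs
        self._pending = 0
        self._cycles_left = 0

    def issue(self, a: int, b: int) -> None:
        self._pending = ((a & MASK32) * (b & MASK32)) & MASK32
        self._cycles_left = self.LATENCY

    def tick(self) -> None:
        if self._cycles_left:
            self._cycles_left -= 1
            if self._cycles_left == 0:
                self.product = self._pending


class RegisterFile:
    """32 registers; r31 reads come from the multiplier instead of storage."""

    def __init__(self, mul: Multiplier) -> None:
        self.mul = mul
        self.regs = [0] * 32

    def read(self, n: int) -> int:
        return self.mul.product if n == PRODUCT_REG else self.regs[n]

    def write(self, n: int, value: int) -> None:
        if n == PRODUCT_REG:
            raise ValueError("r31 is hard-wired to the multiplier result")
        self.regs[n] = value & MASK32


# Compute r6 = r1 * r2 + r7; the add reads r31 directly, with no fetch step.
mul = Multiplier()
rf = RegisterFile(mul)
rf.write(1, 300)
rf.write(2, 7)
rf.write(7, 5)
mul.issue(rf.read(1), rf.read(2))
for _ in range(Multiplier.LATENCY):
    mul.tick()
rf.write(6, rf.read(PRODUCT_REG) + rf.read(7))
print(rf.read(6))  # 2105
```

Compared with the fetch-product scheme, the sequence above has no instruction that copies the product into a general register before the add, which is where the saved cycle comes from.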
> The introduction of interrupts, if needed, will not cause problems as
> long as interrupt handlers don't use the multiplier. Moreover, if an
> interrupt handler needs to use the multiplier, this is also possible:
> When the interrupt handler is sure any pending multiplication is
> finished, it can save the result R. Then it can do its own
> multiplication. Before returning to normal code, it must perform a
> multiply R*1 and wait long enough for the result to be available.
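The save-and-restore sequence for an interrupt handler that needs the multiplier can be sketched as follows. Everything here is hypothetical: the multiplier model (17-cycle latency, low 32 bits kept) and the handler function are illustrative. Since the programmer is expected to count instructions, the sketch waits a fixed number of cycles instead of polling a busy flag.

```python
MASK32 = 0xFFFFFFFF


class Multiplier:
    """Minimal fixed-latency multiplier model (assumed 17 cycles, low 32 bits)."""

    LATENCY = 17

    def __init__(self) -> None:
        self.product = 0
        self._pending = 0
        self._cycles_left = 0

    def issue(self, a: int, b: int) -> None:
        self._pending = ((a & MASK32) * (b & MASK32)) & MASK32
        self._cycles_left = self.LATENCY

    def tick(self) -> None:
        if self._cycles_left:
            self._cycles_left -= 1
            if self._cycles_left == 0:
                self.product = self._pending

    def wait(self) -> None:
        """Burn LATENCY cycles: enough for any in-flight multiply to finish."""
        for _ in range(self.LATENCY):
            self.tick()


def interrupt_handler(mul: Multiplier, a: int, b: int) -> int:
    """Hypothetical handler that borrows the multiplier and puts R back."""
    mul.wait()           # 1. be sure any pending multiply has finished
    saved = mul.product  # 2. save the interrupted code's result R
    mul.issue(a, b)      # 3. the handler's own multiplication
    mul.wait()
    own = mul.product
    mul.issue(saved, 1)  # 4. restore: R * 1 leaves R in the result register
    mul.wait()           #    and wait for it before returning
    return own


# Interrupted code issued 1000 * 1000 and was 5 cycles in when the interrupt hit.
mul = Multiplier()
mul.issue(1000, 1000)
for _ in range(5):
    mul.tick()
print(interrupt_handler(mul, 6, 7), mul.product)  # 42 1000000
```

Restoring with R*1 is exact here only because the sketch keeps just the low 32 bits, so R always fits in a multiplier operand; a design that exposed a wider product would need another way to put it back.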