[Open-graphics] Multipliers in oga1hq

Mark mark at jarvin.net
Wed Aug 15 20:42:17 EDT 2007


Nicholas S-A wrote:
>> Ultimately, we may decide not to even have a multiply instruction and
>> just code it when necessary.  This would be horribly slow, but if it's
>> a rare event, it won't matter so much.  All I can think of for this is
>> where we want to multiply a 16-bit unsigned line stride by a 16-bit
>> signed Y coordinate.  In other cases, we multiply by a constant,
>> eliminating any branches (or decisions anyhow) entirely.
> 
> Not really horribly slow, actually. If you use the Russian peasants 
> algorithm [1],
> you can implement it in around 200 cycles. In pseudocode:
> z=x*y, t1 temporary.
> run 32 times:
>     mov x to t1
>     AND t1 with 0x0001 -- these two just get the last bit
>     skip if zero:
>         add y to z -- this deals with remainder
>     shift x right  1 -- half it
>     shift z left 1 -- double it
> 
> which should be 32 * 6 = 192 cycles.

There's loop control overhead, too:

     mult:  move 32 to t2
            move  0 to  z
     loop:  move  x to t1
            and t1 with 1
            beq skip
            shift x right 1         # delay slot
            add   y to z
     skip:  subtract 1 from t2
            bne loop
            shift z left 1          # delay slot

Which is 32 * (7 or 8) + 2 = 226-258 cycles, I guess.  Still around 200, 
as you said, but it's also still an order of magnitude slower than the 
same algorithm implemented in hardware (and I believe it would be 
comparatively cheap in hardware).  As for whether or not that's 
horrible, well, it's a matter of opinion. :)


More information about the Open-graphics mailing list