[Open-graphics] Synthesizing oga1hq

Michael Meeuwisse mickeymeeuw at gmail.com
Sun Aug 12 18:11:41 EDT 2007


What I mean is the following. I don't know whether I can clip everywhere
the way I'm doing right now (I toss out quite a few MSBs in the
intermediate registers). Also, y_o seems to depend on an if-like
statement (I'm not really familiar with Verilog, and can't say I
fully understand the 35x35 multiplier mentioned earlier), so I don't
know at all whether that'll work.

- In the stage 2 module definition, add m_o:
//> First stage of multiply
output[63:0] m_o;

// I don't get the Compute x / y part, but in essence I do want to use
// x_o and y_o here. Each slice keeps only the low 16 bits of its product.
assign m_o[15:0]  = x_o[15:0]  * y_o[15:0];
assign m_o[31:16] = x_o[31:16] * y_o[31:16];
assign m_o[47:32] = x_o[15:0]  * y_o[31:16];
assign m_o[63:48] = x_o[31:16] * y_o[15:0];

- In stage 3 module definition add m
//> First stage of multiply result
input[63:0] m;

`QOP_MULT: res_o <= < A bunch of additions here >;

- In the top level module, glue the two together.
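
To make the split concrete, here is a minimal sketch of the same idea
with the partial products kept at full width instead of clipped. It
assumes unsigned 32-bit operands; the module and signal names
(mult_stage_a, mult_stage_b, p_ll, p_lh, p_hl, p_hh, z) are made up for
illustration and are not taken from the hq source. It relies on
x*y = (xh*yh << 32) + ((xl*yh + xh*yl) << 16) + xl*yl, where each
16x16 product needs 32 bits.

```verilog
// Hypothetical sketch, not the hq code. Assumes unsigned operands.

// First stage: four 16x16 partial products, registered at full width.
module mult_stage_a (
    input  wire        clk,
    input  wire [31:0] x,
    input  wire [31:0] y,
    output reg  [31:0] p_ll,   // x[15:0]  * y[15:0]
    output reg  [31:0] p_lh,   // x[15:0]  * y[31:16]
    output reg  [31:0] p_hl,   // x[31:16] * y[15:0]
    output reg  [31:0] p_hh    // x[31:16] * y[31:16]
);
    always @(posedge clk) begin
        // The 32-bit left-hand side widens each multiply to 32 bits,
        // so no product bits are dropped here.
        p_ll <= x[15:0]  * y[15:0];
        p_lh <= x[15:0]  * y[31:16];
        p_hl <= x[31:16] * y[15:0];
        p_hh <= x[31:16] * y[31:16];
    end
endmodule

// Second stage: weight and add the partial products.
module mult_stage_b (
    input  wire        clk,
    input  wire [31:0] p_ll,
    input  wire [31:0] p_lh,
    input  wire [31:0] p_hl,
    input  wire [31:0] p_hh,
    output reg  [63:0] z
);
    always @(posedge clk)
        // {p_hh, p_ll} is p_hh * 2^32 + p_ll; the cross terms weigh 2^16.
        z <= {p_hh, p_ll} + ({32'd0, p_lh} << 16) + ({32'd0, p_hl} << 16);
endmodule

// Small self-checking testbench for simulation only.
module tb;
    reg         clk = 0;
    reg  [31:0] x, y;
    reg  [63:0] expected;
    wire [31:0] p_ll, p_lh, p_hl, p_hh;
    wire [63:0] z;

    mult_stage_a a (.clk(clk), .x(x), .y(y),
                    .p_ll(p_ll), .p_lh(p_lh), .p_hl(p_hl), .p_hh(p_hh));
    mult_stage_b b (.clk(clk),
                    .p_ll(p_ll), .p_lh(p_lh), .p_hl(p_hl), .p_hh(p_hh),
                    .z(z));

    always #5 clk = ~clk;

    initial begin
        x = 32'hFFFF_FFFF;
        y = 32'hFFFF_FFFF;
        expected = x * y;            // evaluated at 64 bits
        repeat (3) @(posedge clk);   // two register stages plus margin
        #1;
        if (z !== expected) $display("FAIL");
        else                $display("PASS");
        $finish;
    end
endmodule
```

Whether this actually helps timing depends on where the synthesis tool
places the registers relative to the dedicated multipliers, so treat it
as something to try rather than a known fix. Clipping the partial
products to 16 bits, as in my fragment above, would save registers but
no longer gives the exact product.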

Mike
www.wacco.mveas.com

On 12 Aug 2007, at 23:43, Michael Meeuwisse wrote:

> I assume the synthesis is automagically using the schematic on page  
> 6 of this document;
> http://www.xilinx.com/bvdocs/appnotes/xapp467.pdf
> Is there no way to do the first step (the multipliers) as extra  
> logic in stage 2? No wait, that was running at clock_2x, so maybe  
> stage 1? The final add of all intermediate results in stage 4?
>
> I have no idea how long the delay through the dedicated hardware
> multiplier is. Try clipping x and y to 17 bits and see what the
> synthesis results are then. Are they (besides being unusable) fast
> enough then?
>
> Mike
> www.wacco.mveas.com
>
> PS: SVN seems to be down, I'm looking at an old copy of hq.
>
> On 12 Aug 2007, at 21:13, Timothy Normand Miller wrote:
>
>> I've checked in some changes to hq.  There are a few bug fixes and
>> also a hack to add an input port and an output port as synthesis
>> placeholders.
>>
>> So, we have some synthesis results.  The winner is:  The multiplier.
>> To make a 32x32 multiplier, four of the 18x18's have to be bolted
>> together, and this is what we get:
>>
>> Slack:                  -12.191ns (requirement - (data path - clock path skew + uncertainty))
>>   Source:               hq/stg2/y_lookup_r_16 (FF)
>>   Destination:          hq/stg3/res_r_25 (FF)
>>   Requirement:          10.000ns
>>   Data Path Delay:      22.191ns (Levels of Logic = 15)
>>   Clock Path Skew:      0.000ns
>>   Source Clock:         clock_2x_bufg rising at 10.000ns
>>   Destination Clock:    clock_bufg rising at 20.000ns
>>   Clock Uncertainty:    0.000ns
>>   Timing Improvement Wizard
>>   Data Path: hq/stg2/y_lookup_r_16 to hq/stg3/res_r_25
>>     Delay type         Delay(ns)  Logical Resource(s)
>>     ----------------------------  -------------------
>>     Tcko                  0.626   hq/stg2/y_lookup_r_16
>>     net (fanout=1)        0.475   hq/stg2/y_lookup_r<16>
>>     Tilo                  0.529   hq/stg2/v_o<16>_SW0
>>     net (fanout=2)        0.016   N4985
>>     Tilo                  0.529   hq/stg2/y_o<16>1
>>     net (fanout=4)        3.689   hq/s2_y<16>
>>     Tmult                 3.851   hq/stg3/multiplier/Mmult_z_submult_2
>>     net (fanout=1)        4.221   hq/stg3/multiplier/Mmult_z_submult_2_25
>>     Topcyg                0.904   hq/stg3/multiplier/Mmult_z1_Madd_lut<25>
>>                                   hq/stg3/multiplier/Mmult_z1_Madd_cy<25>
>>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z1_Madd_cy<25>
>>     Tbyp                  0.111   hq/stg3/multiplier/Mmult_z1_Madd_cy<26>
>>                                   hq/stg3/multiplier/Mmult_z1_Madd_cy<27>
>>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z1_Madd_cy<29>
>>     Tciny                 0.803   hq/stg3/multiplier/Mmult_z1_Madd_cy<30>
>>                                   hq/stg3/multiplier/Mmult_z1_Madd_xor<31>
>>     net (fanout=1)        1.150   hq/stg3/multiplier/Mmult_z1_Madd_31
>>     Topcyg                0.954   hq/stg3/multiplier/Mmult_z2_Madd_lut<48>
>>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<48>
>>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<48>
>>     Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<49>
>>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<50>
>>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<50>
>>     Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<51>
>>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<52>
>>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<52>
>>     Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<53>
>>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<54>
>>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<54>
>>     Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<55>
>>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<56>
>>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<56>
>>     Tcinx                 0.786   hq/stg3/multiplier/Mmult_z2_Madd_xor<57>
>>     net (fanout=2)        1.379   hq/stg3/Mshift_mul_shift0001_Sh<121>
>>     Tilo                  0.529   hq/stg3/res_r_mux0000<25>128
>>     net (fanout=1)        0.512   hq/stg3/res_r_mux0000<25>128/O
>>     Tfck                  0.600   hq/stg3/res_r_mux0000<25>2
>>                                   hq/stg3/res_r_25
>>     ----------------------------  ---------------------------
>>     Total                22.191ns (10.749ns logic, 11.442ns route)
>>                                   (48.4% logic, 51.6% route)
>>
>>
>> Too much multiply and add logic.  We want 10ns, but we're getting
>> 22ns.  We need to think about ways to either stretch the pipeline,
>> run the multiply as a parallel pipeline, or use fewer bits in the
>> multiplier and/or multiplicand.
>>
>> -- 
>> Timothy Normand Miller
>> http://www.cse.ohio-state.edu/~millerti
>> Open Graphics Project
>