[Open-graphics] Synthesizing oga1hq
Michael Meeuwisse
mickeymeeuw at gmail.com
Sun Aug 12 18:11:41 EDT 2007
What I mean is the following. I don't know if I can clip everywhere
like I'm doing right now (I toss out quite a few MSBs in the
intermediate registers). Also, y_o seems to depend on an if-like
statement (I'm not really familiar with Verilog, and can't say I
fully understand the 35x35 multiplier mentioned earlier), so I don't
know at all if that'll work.
- In the stage 2 module definition, add m_o
//> First stage of multiply
output[63:0] m_o;
// I don't get the Compute x / y part, but I do want to use x_o and
// y_o here in essence. Each product is clipped to its low 16 bits.
assign m_o[15:0] = x_o[15:0] * y_o[15:0];
assign m_o[31:16] = x_o[31:16] * y_o[31:16];
assign m_o[47:32] = x_o[15:0] * y_o[31:16];
assign m_o[63:48] = x_o[31:16] * y_o[15:0];
- In stage 3 module definition add m
//> First stage of multiply result
input[63:0] m;
`QOP_MULT: res_o <= < A bunch of additions here >;
- In the top level module, glue the two together.
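
For comparison, here is a minimal sketch of the same split without any
clipping. It assumes unsigned 32-bit operands, and every module and signal
name in it is made up for illustration (none of it is taken from hq). Unlike
the 16-bit slices above, each partial product keeps its full 32 bits, so the
second stage can rebuild the exact 64-bit product.

```verilog
// Hypothetical sketch, not code from hq: all module and signal names are
// made up for illustration. Operands are assumed to be unsigned.

// Stage A: four 16x16 partial products, registered at full 32-bit width.
module mul_stage_a (
    input  wire        clk,
    input  wire [31:0] x,
    input  wire [31:0] y,
    output reg  [31:0] p_ll,  // x[15:0]  * y[15:0]
    output reg  [31:0] p_lh,  // x[15:0]  * y[31:16]
    output reg  [31:0] p_hl,  // x[31:16] * y[15:0]
    output reg  [31:0] p_hh   // x[31:16] * y[31:16]
);
    always @(posedge clk) begin
        p_ll <= x[15:0]  * y[15:0];
        p_lh <= x[15:0]  * y[31:16];
        p_hl <= x[31:16] * y[15:0];
        p_hh <= x[31:16] * y[31:16];
    end
endmodule

// Stage B: the "bunch of additions". Each partial product is placed at
// its weight (bit 0, 16, 16 and 32) and summed into the 64-bit result.
module mul_stage_b (
    input  wire        clk,
    input  wire [31:0] p_ll,
    input  wire [31:0] p_lh,
    input  wire [31:0] p_hl,
    input  wire [31:0] p_hh,
    output reg  [63:0] z
);
    always @(posedge clk)
        z <= {p_hh, p_ll}            // p_hh * 2^32 + p_ll
           + {16'd0, p_lh, 16'd0}    // p_lh * 2^16
           + {16'd0, p_hl, 16'd0};   // p_hl * 2^16
endmodule

// Top level: glue the two stages together.
module mul_split (
    input  wire        clk,
    input  wire [31:0] x,
    input  wire [31:0] y,
    output wire [63:0] z      // x * y, two rising clock edges later
);
    wire [31:0] p_ll, p_lh, p_hl, p_hh;

    mul_stage_a stage_a (.clk(clk), .x(x), .y(y),
                         .p_ll(p_ll), .p_lh(p_lh), .p_hl(p_hl), .p_hh(p_hh));
    mul_stage_b stage_b (.clk(clk),
                         .p_ll(p_ll), .p_lh(p_lh), .p_hl(p_hl), .p_hh(p_hh),
                         .z(z));
endmodule
```

Whether either half fits in 10ns is something only a synthesis run can
tell. The sketch only shows where the register boundary would go.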
Mike
www.wacco.mveas.com
On 12 Aug 2007, at 23:43, Michael Meeuwisse wrote:
> I assume the synthesis is automagically using the schematic on page
> 6 of this document:
> http://www.xilinx.com/bvdocs/appnotes/xapp467.pdf
> Is there no way to do the first step (the multipliers) as extra
> logic in stage 2? No wait, that was running at clock_2x, so maybe
> stage 1? And the final add of all intermediate results in stage 4?
>
> I have no idea how large the delay through the dedicated hardware
> multiplier is. Try clipping x and y to 17 bits and see what the
> synthesis results are then. Are they (besides being unusable) fast
> enough then?
>
> Mike
> www.wacco.mveas.com
>
> PS: SVN seems to be down, I'm looking at an old copy of hq.
>
> On 12 Aug 2007, at 21:13, Timothy Normand Miller wrote:
>
>> I've checked in some changes to hq. There are a few bug fixes and
>> also a hack to add an input port and an output port as synthesis
>> placeholders.
>>
>> So, we have some synthesis results. The winner is: The multiplier.
>> To make a 32x32 multiplier, four of the 18x18's have to be bolted
>> together, and this is what we get:
>>
>> Slack: -12.191ns (requirement - (data path - clock path skew + uncertainty))
>> Source: hq/stg2/y_lookup_r_16 (FF)
>> Destination: hq/stg3/res_r_25 (FF)
>> Requirement: 10.000ns
>> Data Path Delay: 22.191ns (Levels of Logic = 15)
>> Clock Path Skew: 0.000ns
>> Source Clock: clock_2x_bufg rising at 10.000ns
>> Destination Clock: clock_bufg rising at 20.000ns
>> Clock Uncertainty: 0.000ns
>> Data Path: hq/stg2/y_lookup_r_16 to hq/stg3/res_r_25
>> Delay type       Delay(ns)  Logical Resource(s)
>> ---------------  ---------  -------------------
>> Tcko                 0.626  hq/stg2/y_lookup_r_16
>> net (fanout=1)       0.475  hq/stg2/y_lookup_r<16>
>> Tilo                 0.529  hq/stg2/v_o<16>_SW0
>> net (fanout=2)       0.016  N4985
>> Tilo                 0.529  hq/stg2/y_o<16>1
>> net (fanout=4)       3.689  hq/s2_y<16>
>> Tmult                3.851  hq/stg3/multiplier/Mmult_z_submult_2
>> net (fanout=1)       4.221  hq/stg3/multiplier/Mmult_z_submult_2_25
>> Topcyg               0.904  hq/stg3/multiplier/Mmult_z1_Madd_lut<25>
>>                             hq/stg3/multiplier/Mmult_z1_Madd_cy<25>
>> net (fanout=1)       0.000  hq/stg3/multiplier/Mmult_z1_Madd_cy<25>
>> Tbyp                 0.111  hq/stg3/multiplier/Mmult_z1_Madd_cy<26>
>>                             hq/stg3/multiplier/Mmult_z1_Madd_cy<27>
>> net (fanout=1)       0.000  hq/stg3/multiplier/Mmult_z1_Madd_cy<29>
>> Tciny                0.803  hq/stg3/multiplier/Mmult_z1_Madd_cy<30>
>>                             hq/stg3/multiplier/Mmult_z1_Madd_xor<31>
>> net (fanout=1)       1.150  hq/stg3/multiplier/Mmult_z1_Madd_31
>> Topcyg               0.954  hq/stg3/multiplier/Mmult_z2_Madd_lut<48>
>>                             hq/stg3/multiplier/Mmult_z2_Madd_cy<48>
>> net (fanout=1)       0.000  hq/stg3/multiplier/Mmult_z2_Madd_cy<48>
>> Tbyp                 0.104  hq/stg3/multiplier/Mmult_z2_Madd_cy<49>
>>                             hq/stg3/multiplier/Mmult_z2_Madd_cy<50>
>> net (fanout=1)       0.000  hq/stg3/multiplier/Mmult_z2_Madd_cy<50>
>> Tbyp                 0.104  hq/stg3/multiplier/Mmult_z2_Madd_cy<51>
>>                             hq/stg3/multiplier/Mmult_z2_Madd_cy<52>
>> net (fanout=1)       0.000  hq/stg3/multiplier/Mmult_z2_Madd_cy<52>
>> Tbyp                 0.104  hq/stg3/multiplier/Mmult_z2_Madd_cy<53>
>>                             hq/stg3/multiplier/Mmult_z2_Madd_cy<54>
>> net (fanout=1)       0.000  hq/stg3/multiplier/Mmult_z2_Madd_cy<54>
>> Tbyp                 0.104  hq/stg3/multiplier/Mmult_z2_Madd_cy<55>
>>                             hq/stg3/multiplier/Mmult_z2_Madd_cy<56>
>> net (fanout=1)       0.000  hq/stg3/multiplier/Mmult_z2_Madd_cy<56>
>> Tcinx                0.786  hq/stg3/multiplier/Mmult_z2_Madd_xor<57>
>> net (fanout=2)       1.379  hq/stg3/Mshift_mul_shift0001_Sh<121>
>> Tilo                 0.529  hq/stg3/res_r_mux0000<25>128
>> net (fanout=1)       0.512  hq/stg3/res_r_mux0000<25>128/O
>> Tfck                 0.600  hq/stg3/res_r_mux0000<25>2
>>                             hq/stg3/res_r_25
>> ---------------  ---------  -------------------
>> Total             22.191ns  (10.749ns logic, 11.442ns route)
>>                             (48.4% logic, 51.6% route)
>>
>>
>> Too much multiply and add logic. We want 10ns, but we're getting
>> 22ns. We need to think about ways to either stretch the pipeline,
>> run the multiply as a parallel pipeline, or use fewer bits in the
>> multiplier and/or multiplicand.
>>
>> --
>> Timothy Normand Miller
>> http://www.cse.ohio-state.edu/~millerti
>> Open Graphics Project
>