[Open-graphics] Synthesizing oga1hq

Michael Meeuwisse mickeymeeuw at gmail.com
Sun Aug 12 17:43:05 EDT 2007


I assume the synthesis is automagically using the schematic on page 6
of this document:
http://www.xilinx.com/bvdocs/appnotes/xapp467.pdf
Is there no way to do the first step (the multipliers) as extra logic
in stage 2? No wait, that was running at clock_2x, so maybe stage 1?
And then do the final add of all the intermediate results in stage 4?

I have no idea how large the delay through the dedicated hardware
multiplier is. Try clipping x and y to 17 bits, so that a single 18x18
multiplier is enough, and see what the synthesis results are then.
Is that (unusable as it would be) fast enough?
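
To make the suggestion above concrete, here is a rough sketch in Python
of how a 32x32 product falls apart into four smaller products plus
adds, and why 17-bit operands would need only one multiplier block. The
even 16-bit split, the 17-bit unsigned capacity assumed per 18x18 signed
block, and both function names are assumptions for illustration; the
synthesizer may well pick a different split.

```python
MASK16 = (1 << 16) - 1


def mul32_from_partials(a, b):
    """Multiply two unsigned 32-bit values using four 16x16 partial products."""
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    # Four partial products, each small enough for one multiplier block.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # The shifts are free in hardware; the adds are the carry chains
    # that follow the multiplier in the critical path.
    return ll + ((lh + hl) << 16) + (hh << 32)


def partials_needed(bits, block_bits=17):
    """Multiplier blocks needed for a bits x bits unsigned product.

    block_bits=17 assumes an 18-bit signed block holds a 17-bit
    unsigned operand (an assumption of this sketch).
    """
    chunks = -(-bits // block_bits)  # ceiling division
    return chunks * chunks


if __name__ == "__main__":
    a, b = 0xDEADBEEF, 0x12345678
    assert mul32_from_partials(a, b) == a * b
    print("32-bit operands:", partials_needed(32), "blocks")
    print("17-bit operands:", partials_needed(17), "block")
```

With 17-bit operands the adder stages after the multiplier drop out
entirely, which is why that experiment should show how much of the
delay is the multiplier itself and how much is the adding.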

Mike
www.wacco.mveas.com

PS: SVN seems to be down, so I'm looking at an old copy of hq.

On 12 Aug 2007, at 21:13, Timothy Normand Miller wrote:

> I've checked in some changes to hq.  There are a few bug fixes and
> also a hack to add an input port and an output port as synthesis
> placeholders.
>
> So, we have some synthesis results.  The winner is:  The multiplier.
> To make a 32x32 multiplier, four of the 18x18's have to be bolted
> together, and this is what we get:
>
> Slack:                  -12.191ns (requirement - (data path - clock path skew + uncertainty))
>   Source:               hq/stg2/y_lookup_r_16 (FF)
>   Destination:          hq/stg3/res_r_25 (FF)
>   Requirement:          10.000ns
>   Data Path Delay:      22.191ns (Levels of Logic = 15)
>   Clock Path Skew:      0.000ns
>   Source Clock:         clock_2x_bufg rising at 10.000ns
>   Destination Clock:    clock_bufg rising at 20.000ns
>   Clock Uncertainty:    0.000ns
>   Data Path: hq/stg2/y_lookup_r_16 to hq/stg3/res_r_25
>     Delay type         Delay(ns)  Logical Resource(s)
>     ----------------------------  -------------------
>     Tcko                  0.626   hq/stg2/y_lookup_r_16
>     net (fanout=1)        0.475   hq/stg2/y_lookup_r<16>
>     Tilo                  0.529   hq/stg2/v_o<16>_SW0
>     net (fanout=2)        0.016   N4985
>     Tilo                  0.529   hq/stg2/y_o<16>1
>     net (fanout=4)        3.689   hq/s2_y<16>
>     Tmult                 3.851   hq/stg3/multiplier/Mmult_z_submult_2
>     net (fanout=1)        4.221   hq/stg3/multiplier/Mmult_z_submult_2_25
>     Topcyg                0.904   hq/stg3/multiplier/Mmult_z1_Madd_lut<25>
>                                   hq/stg3/multiplier/Mmult_z1_Madd_cy<25>
>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z1_Madd_cy<25>
>     Tbyp                  0.111   hq/stg3/multiplier/Mmult_z1_Madd_cy<26>
>                                   hq/stg3/multiplier/Mmult_z1_Madd_cy<27>
>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z1_Madd_cy<29>
>     Tciny                 0.803   hq/stg3/multiplier/Mmult_z1_Madd_cy<30>
>                                   hq/stg3/multiplier/Mmult_z1_Madd_xor<31>
>     net (fanout=1)        1.150   hq/stg3/multiplier/Mmult_z1_Madd_31
>     Topcyg                0.954   hq/stg3/multiplier/Mmult_z2_Madd_lut<48>
>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<48>
>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<48>
>     Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<49>
>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<50>
>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<50>
>     Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<51>
>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<52>
>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<52>
>     Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<53>
>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<54>
>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<54>
>     Tbyp                  0.104   hq/stg3/multiplier/Mmult_z2_Madd_cy<55>
>                                   hq/stg3/multiplier/Mmult_z2_Madd_cy<56>
>     net (fanout=1)        0.000   hq/stg3/multiplier/Mmult_z2_Madd_cy<56>
>     Tcinx                 0.786   hq/stg3/multiplier/Mmult_z2_Madd_xor<57>
>     net (fanout=2)        1.379   hq/stg3/Mshift_mul_shift0001_Sh<121>
>     Tilo                  0.529   hq/stg3/res_r_mux0000<25>128
>     net (fanout=1)        0.512   hq/stg3/res_r_mux0000<25>128/O
>     Tfck                  0.600   hq/stg3/res_r_mux0000<25>2
>                                   hq/stg3/res_r_25
>     ----------------------------  ---------------------------
>     Total                22.191ns (10.749ns logic, 11.442ns route)
>                                   (48.4% logic, 51.6% route)
>
>
> Too much multiply and add logic.  We want 10ns, but we're getting
> 22ns.  We need to think about ways to either stretch the pipeline, run
> the multiply as a parallel pipeline, or use fewer bits in the
> multiplier and/or multiplicand.
>
> -- 
> Timothy Normand Miller
> http://www.cse.ohio-state.edu/~millerti
> Open Graphics Project
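
For reference, the headline numbers in the quoted timing report can be
re-derived from its own totals. The Python sketch below only uses
figures that appear in the report; the variable and function names are
made up for illustration, and the stage estimate at the end is a
back-of-the-envelope assumption (a perfectly balanced split with no
register overhead), not a synthesis result.

```python
import math

# Figures copied from the quoted timing report (all in ns).
requirement = 10.000
data_path = 22.191
clock_skew = 0.000
uncertainty = 0.000
logic_delay = 10.749
route_delay = 11.442


def slack_ns(requirement, data_path, clock_skew, uncertainty):
    """Slack as defined in the report header:
    requirement - (data path - clock path skew + uncertainty)."""
    return requirement - (data_path - clock_skew + uncertainty)


slack = slack_ns(requirement, data_path, clock_skew, uncertainty)
logic_share = logic_delay / data_path
route_share = route_delay / data_path

# Hypothetical lower bound: clock periods needed if the path could be
# cut into equal pieces with no extra register delay.
min_cycles = math.ceil(data_path / requirement)

if __name__ == "__main__":
    print(f"slack: {slack:.3f} ns")
    print(f"logic: {logic_share:.1%}, route: {route_share:.1%}")
    print(f"at least {min_cycles} cycles at {requirement:.0f} ns each")
```

Routing accounts for slightly more than half of the path, so trimming
multiplier logic alone would not be enough to reach 10ns; the two long
nets around the multiplier (3.689ns and 4.221ns) matter as well.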


