[Open-graphics] Pipelining - 3D Engine
Timothy Normand Miller
theosib at gmail.com
Wed Mar 4 12:50:43 EST 2009
Some number of ages ago, I posted some things to the list where I
described how to handle all of this. We'll have to dig it out of the
archive. And some of this may have ended up on the wiki, but I'm not
In short, I described a method for merging the pipeline segment (a
segment being a set of stages) with the fifo such that (a) the
pipeline can't 'overfill' and lose anything, (b) it 'packs' bubbles,
meaning that earlier stages can progress even when later stages are
busy as long as there are bubbles in the pipeline, and (c) it prevents
the busy signal handling from becoming a critical path (with some
I described generic modules called "HEADER", "FOOTER", and two others
that can perform processing, one that is a simple pipeline stage, and
the other that can communicate with external logic (such as
reading/writing a FIFO).
Each pipeline stage has a 'busy' or 'valid' bit that indicates that
something is IN that pipeline stage. It means that the next stage
should accept output (if it can), or if it can't, it means that
earlier stages must wait. The last stage of a segment pays attention
only to the busy input from the next segment (unless it performs some
multi-step process); the second-to-last stage pays attention to that
busy input and the valid status of the last stage. As you move
backward, the number of inputs a stage needs to determine whether or
not to do something grows, and eventually reaches a point where it can
become a critical path. At that point, you break the pipeline into
segments.
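
To make the valid/busy chain concrete, here is a minimal behavioral
sketch in Python (not the Verilog source mentioned in this mail). The
function name step_segment and the use of None to represent a bubble
are assumptions made for illustration. It models one clock of a
segment: the last stage looks only at the downstream busy input, and
each earlier stage may advance if the stage after it is a bubble or is
itself advancing.

```python
def step_segment(stages, downstream_busy, new_item=None):
    """Advance a pipeline segment by one clock (behavioral model only).

    stages: list of stage contents, index 0 first; None is a bubble.
    downstream_busy: True if the next segment cannot accept output.
    Returns (next_stages, output_item, accepted_new_item).
    """
    n = len(stages)
    # can_move[i]: stage i may hand its contents forward this cycle.
    can_move = [False] * n
    # The last stage pays attention only to the downstream busy input.
    can_move[n - 1] = not downstream_busy
    # Each earlier stage depends on everything after it; this is the
    # chain that grows longer as you move backward through the segment.
    for i in range(n - 2, -1, -1):
        can_move[i] = stages[i + 1] is None or can_move[i + 1]

    output = stages[n - 1] if can_move[n - 1] else None
    nxt = list(stages)
    for i in range(n - 1, 0, -1):
        # A slot that is empty or draining takes the previous stage's data.
        if stages[i] is None or can_move[i]:
            nxt[i] = stages[i - 1]
    accepted = False
    if stages[0] is None or can_move[0]:
        nxt[0] = new_item
        accepted = new_item is not None
    return nxt, output, accepted


# Downstream is busy, but the bubble in the middle gets packed:
print(step_segment(['a', None, 'b'], True, 'c'))
# (['c', 'a', 'b'], None, True)
```

Once the segment is full and downstream is still busy, the same call
leaves the stages unchanged and reports that the new item was not
accepted, so nothing is lost.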
The FOOTER's job is basically just to interpret the signals from a
subsequent segment. The HEADER's job is to break the combinatorial
path between all of the busy signals in one segment and those in
another by making the segment's busy signal registered. To do this,
it acts as a queue that can hold 0 or 1 entries, multiplexing between
either the input from the previous segment (if this segment is not
busy) or a registered copy of an earlier input (if the segment is busy
and therefore holding something). Due to the multiplexing, you
typically want to put a DUMMY stage after a HEADER. All it does is
remove the multiplexing from the critical paths.
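
As a rough illustration of the HEADER idea, the following Python
sketch models a 0-or-1 entry queue with a registered busy output. It
is a behavioral model under assumptions of my own (the class name
Header, the clock method, and None meaning "no data" are all
hypothetical), not the original module. The upstream side is assumed
to offer data only when the busy value it saw at the start of the
cycle was False.

```python
class Header:
    """Behavioral sketch of a HEADER: holds 0 or 1 entries."""

    def __init__(self):
        self.held = None    # registered copy of an earlier input
        self.busy = False   # registered busy signal seen by upstream

    def clock(self, upstream_item, stalled):
        """Model one clock edge.

        upstream_item: data offered by the previous segment, or None.
        stalled: True if this segment's stages cannot take data now.
        Returns the item passed into the segment, or None.
        """
        # The multiplexer: held copy if busy, otherwise the live input.
        selected = self.held if self.busy else upstream_item
        if stalled and selected is not None:
            # The segment can't take it, so keep it in the register and
            # raise busy for the NEXT cycle. Upstream never sees a
            # combinatorial path from the segment's internal stalls.
            self.held = selected
            self.busy = True
            return None
        self.held = None
        self.busy = False
        return selected
```

Because busy is registered, upstream may send one more item in the
cycle the segment first stalls; the single holding register is what
catches that item.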
The nice thing about HEADER and FOOTER stages is that they "speak fifo
language", which means that it is trivial to bolt a fifo onto the end
of a pipeline segment, because a pipeline segment looks just like a
fifo. It has exactly the same sorts of external control signals.
I provided source code to all of this. We'll have to dig it out. So
far, I haven't been able to get Google to find it.
A stage that performs a multi-step process (like what we'll have to do
with textures in certain modes) will basically function as a pipeline
segment all its own, managing a registered busy signal for prior
stages, and that busy signal will be a combination of state machine
states and busy signals from subsequent stages.
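
A sketch of such a multi-step stage, again as a Python behavioral
model with made-up names (MultiStepStage, and a work item given as a
hypothetical (base, width) pair loosely modeled on the "for each step
in width" loop quoted below). The state machine here is just a
countdown; the registered busy bit is derived from it, and the
downstream busy input influences it by holding the countdown in place.

```python
class MultiStepStage:
    """Behavioral sketch: one input item expands into `width` outputs."""

    def __init__(self):
        self.base = None
        self.width = 0
        self.remaining = 0   # state machine: steps left for this item
        self.busy = False    # registered busy presented to prior stages

    def clock(self, item, downstream_busy):
        """Model one clock edge.

        item: (base, width) or None. The caller is assumed to offer an
        item only when self.busy was False before this call.
        Returns one (base, step) output, or None.
        """
        out = None
        if self.remaining > 0:
            # Mid-loop: emit one step unless the next stage is busy.
            if not downstream_busy:
                out = (self.base, self.width - self.remaining)
                self.remaining -= 1
        elif item is not None:
            # Idle: latch a new work item and start counting.
            self.base, self.width = item
            self.remaining = self.width
        # Busy is registered from the state, so it changes one cycle
        # after the state does. That costs a cycle between items in
        # this simple version.
        self.busy = self.remaining > 0
        return out
```

A real implementation would likely overlap the last step of one item
with accepting the next; this version trades that cycle for
simplicity.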
On 3/4/09, Kenneth Ostby <kenneo at langly.org> wrote:
> In the series on architectural / performance rants.
> Last week I committed some basic verilog skeleton files which mark
> the beginning of the rasterization module of the 3D engine. However,
> there was one problem which struck me with the implementation of the
> logic itself, namely how to solve pipeline stalls. A good example would be in
> the horizontal rasterizer, where you can see that I have divided it
> up into 3 basic stages, corresponding to the calculations found in the
> new_model code. Basically:
> 1) adjustment = some math
> 2) initial point = more math * adjustment
> 3) for each step in width:
> calculate values for step.
> Now, the first two stages are easy, the cycle count from the entry of
> data into stage 1 until it's ready for output in stage 2 is fixed, and
> mainly depends on the latency introduced by the floating point
> operations involved. However, the problem comes with stage 3. Since
> stage 3 logically involves a loop over the width, it might have to stall
> the pipeline while waiting for the loop to finish processing. Thus
> the question arises, what are we going to do with the data already
> introduced into the pipeline?
> The naive solution to the problem/challenge is to introduce a queue at
> the end of each stage. This means that the depth of the queue has to be
> at least the same number as the cycles used by the stage itself. This is
> explained by the case where we have filled the entire pipeline of the
> floating point module and we encounter a pipeline stall. In that case we
> have to be able to store all of the output generated by the floating
> point modules, since the FP modules have no mechanisms for stalling
> themselves. Then the stage has to stall, not being able to accept any
> data, nor processing anything until the stall ends.
> My major problem with this solution is that it, as far as I can see, in
> the case of a stall will introduce a latency in terms of startup costs.
> If the queues are full, and if we encounter a stall from the next step
> in the pipeline we cannot accept new incoming data either, since we
> cannot guarantee that we will have space in the outgoing queue for the
> results. Hence we will have a startup cost in terms of cycles equal to
> the cycle count data uses through the pipeline step.
> Anybody have some good solutions? The easy answer is to say that it
> doesn't cost that much compared to feature Y, but it feels a bit like
> cheating. Another solution is to, in every pipeline step, incorporate a
> "store/do not continue" flag and store the output in a register local to
> the step.
> If you read through this entire mail,
> Thank you for your attention :)
>  http://langly.org/og/rastHori.png
> Life on the earth might be expensive, but it
> includes an annual free trip around the sun.
> Kenneth Østby
Timothy Normand Miller
Open Graphics Project