[Open-graphics] 3D Engine - Some thoughts
Timothy Normand Miller
theosib at gmail.com
Wed Mar 4 13:08:34 EST 2009
On 2/22/09, Kenneth Ostby <kenneo at langly.org> wrote:
> Recently, I've been looking up and reading through the documentation
> available for the 3D Engine part of the project ( It's in need of
> some love as well ), and the software model of our rasterizer found in
> the SVN repository. This has been in order to try to plan ahead, and
> identify the work that has to be done. In relation to that, I have
> some questions.
> First, in each of the different stages in the pipeline, Rasterize,
> Ownership, etc, we have some that just have to be forwarded for further
> use into the pipeline. Texture coordinates and Texture ID is a good
> example. Hence, what I'm thinking, is that if we in front of the
> Rasterizer / Scissor step in the pipeline, include an issue unit, it
> should be possible for the issue unit to bite off the unneeded parts and
> just forward it to the correct stage where it's first needed. Then, in
> the different stages in the pipeline, we could include a FIFO buffer
> where we store the data for future usage. Also, since we're operating on
> a strict in-order processor, it shouldn't need to be anything more
> complex than a simple FIFO buffer.
You seem to be suggesting that we provide an ability to bypass
pipeline segments that are not going to be used. This would certainly
reduce latency. There are some tradeoffs, however. One is that what
is enabled and disabled changes fairly frequently. To enable and
disable pipeline stages would require a pipeline flush, introducing a
significant delay. As long as we're keeping the pipeline busy, it
doesn't really matter what the latency is. Secondly, this introduces
the need to add additional multiplexing and routing around pipeline
segments. All reasonable configurations need to be accounted for,
which implies crossbars. Those introduce latency of their own, as
well as creating routing congestion in the FPGA. We'll get a higher
clock frequency if we can make the pipeline streamlined and easy to
place and route.
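
To put rough numbers on the latency-versus-flush tradeoff, here is a
back-of-the-envelope sketch in Python. The pipeline depths, the batch
size and the function name `cycles` are made up for illustration; they
are not measurements of the actual design. It assumes one fragment per
clock and that a flush costs a full pipeline refill.

```python
def cycles(num_fragments, depth, batch=None):
    """Clocks to push num_fragments through a pipeline of `depth` stages.

    batch is the (hypothetical) number of fragments between flushes;
    None means the pipeline is never flushed.
    """
    if batch is None:
        return depth + num_fragments - 1
    full, rest = divmod(num_fragments, batch)
    total = full * (depth + batch - 1)   # every batch pays the fill latency again
    if rest:
        total += depth + rest - 1
    return total

print(cycles(10000, depth=40))             # long pipeline, never flushed: 10039
print(cycles(10000, depth=20, batch=100))  # short pipeline, flushed often: 11900
```

With these assumed numbers, the longer pipeline that stays busy
finishes before the shorter one that flushes every 100 fragments.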
> Also, by doing this, we can try to hide some of the latencies found in
> the pipeline in memory reads / writes. As an example, imagine that we
> have the case of texture element. In the current model, it would require
> that the pipeline stalls while waiting for a texture fetch. If we send
> the coordinates ahead, it should be possible to prefetch some of the
> needed texels before the fragments get to the texture stage. I am sure
> that this technique can be utilized more places in the pipeline as well.
Let's consider something simpler, like the Z buffer. This is
straightforward. For each fragment, we need to read a word from memory
and possibly write one. The write is trivial, since that can be just
dumped into a fifo and processed out of order (unless the fifo fills,
in which case, we're memory bound, and we don't care about the delay).
The read, however, complicates things. The solution involves three
fifos. The memory system is already built around fifos. For reads,
requests are issued down one pipe, and then the data comes back in
another. If you could continue to issue requests asynchronously from
processing the data, you could keep the pipeline moving. What we do
is insert a third fifo between the pipeline segment that requests the
reads and the segment that receives and processes the data. As
requests are pushed into the memory fifo, fragments are pushed into
another one that only fills up and causes a stall if memory can't keep
up with requests. Let's call this third fifo a "latency absorber".
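
A minimal cycle-level sketch of the latency absorber for Z reads, in
Python rather than HDL. It is only a model under stated assumptions:
memory has a fixed read latency, accepts one request per clock, and
returns data in order. The write fifo is left out. The names
`simulate`, `returns` and `absorber` are hypothetical, not taken from
the project's software model.

```python
from collections import deque

def simulate(num_fragments, mem_latency, absorber_depth):
    """Return the number of clocks needed to Z-read every fragment."""
    if absorber_depth < 1:
        raise ValueError("absorber needs at least one entry")
    returns = deque()    # clock at which each outstanding read's data arrives
    absorber = deque()   # fragments parked until their Z value comes back
    issued = done = cycle = 0
    while done < num_fragments:
        # Receiving segment: retire one fragment per clock once its data is in.
        if absorber and returns[0] <= cycle:
            returns.popleft()
            absorber.popleft()
            done += 1
        # Requesting segment: issue one read per clock unless the absorber is full.
        if issued < num_fragments and len(absorber) < absorber_depth:
            returns.append(cycle + mem_latency)
            absorber.append(issued)
            issued += 1
        cycle += 1
    return cycle

print(simulate(100, mem_latency=20, absorber_depth=32))  # 120
print(simulate(100, mem_latency=20, absorber_depth=1))   # 2001
```

In this model, an absorber at least as deep as the memory latency
hides that latency almost entirely, while a one-entry absorber pays
the full latency for every fragment.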
For Z, this is quite straightforward. Textures are harder, since they
may involve multiple requests per fragment, which implies a state
machine that will hold up the pipeline. The receiver also must loop
over multiple
received pixels to fully compute the fragment. I have two possible
solutions to this. One is to have two full state machines at each end
of the latency absorber. Each has its own set of configuration
variables, and we just design them so that they do complementary work.
The other alternative is to have one state machine at the head of the
queue that also passes commands down the latency absorber which are
processed by the receiver. Those commands would be things like
"here's a fragment to be processed", "expect a pixel from memory, do
this with it to modify the fragment", and "complete processing of the
fragment and forward it to the next segment". (Some of those commands
may be issued simultaneously. For example, when the last texel is
received, the finish command may just be a flag bit.)
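
A rough software sketch of the second alternative, where only the
head has a full state machine and the receiver just follows commands.
The command names, the tuple layout and the weighted-sum filter are
assumptions for illustration, not a proposed encoding. It also assumes
every fragment needs at least one texel and that memory returns data
in request order.

```python
from collections import deque

def head(fragments, absorber, mem_requests):
    """Head state machine: one read request and one command per texel."""
    for frag_id, taps in fragments:           # taps: list of (address, weight)
        absorber.append(("FRAGMENT", frag_id))
        for i, (addr, weight) in enumerate(taps):
            mem_requests.append(addr)
            finish = (i == len(taps) - 1)     # "finish" rides along as a flag bit
            absorber.append(("TEXEL", weight, finish))

def receiver(absorber, mem_data):
    """Receiver: holds no texture configuration, it only obeys commands."""
    out = []
    frag_id, acc = None, 0.0
    while absorber:
        cmd = absorber.popleft()
        if cmd[0] == "FRAGMENT":
            frag_id, acc = cmd[1], 0.0
        else:
            _, weight, finish = cmd
            acc += weight * mem_data.popleft()   # fold the texel into the fragment
            if finish:
                out.append((frag_id, acc))       # forward to the next segment
    return out

absorber, requests = deque(), deque()
head([(7, [(0, 0.5), (1, 0.5)]), (8, [(3, 1.0)])], absorber, requests)
texture = {0: 10.0, 1: 20.0, 3: 40.0}            # stand-in for texture memory
data = deque(texture[a] for a in requests)
print(receiver(absorber, data))                  # [(7, 15.0), (8, 40.0)]
```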
Speaking of configuration variables, we would pass those down the
pipeline, as if they were fragment data, reusing some of the same
signals. When the parameter reaches the segment or stage that holds
that variable, it gets stored right there. Thus, the pipeline is not
stalled by variable changes.
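
As a functional sketch of that idea (Python, not HDL): register
writes travel in the same token stream as fragments, and each stage
latches only the variables it owns. The token format, the stage and
register names, and the choice to drop a write once it has been stored
are all assumptions made for the example. Because the pipeline is
strictly in order, walking one token at a time gives the same results
as a real pipelined implementation would.

```python
class Stage:
    def __init__(self, name, registers):
        self.name = name
        self.regs = dict(registers)      # configuration held locally in the stage

    def process(self, token, log):
        kind, key, value = token
        if kind == "SET":
            if key in self.regs:
                self.regs[key] = value   # store it right here
                return None              # assumed: the write stops at its owner
            return token                 # not ours, pass it down the pipe
        # Fragment: record which configuration it was processed with.
        log.append((self.name, key, dict(self.regs)))
        return token

def run(pipeline, tokens):
    log = []
    for token in tokens:                 # tokens enter in program order
        for stage in pipeline:
            token = stage.process(token, log)
            if token is None:
                break
    return log

pipeline = [Stage("z", {"z_enable": 1}), Stage("tex", {"tex_id": 0})]
tokens = [("FRAG", "f0", None), ("SET", "tex_id", 5), ("FRAG", "f1", None)]
for entry in run(pipeline, tokens):
    print(entry)
```

Fragment f0 is textured with tex_id 0 and f1 with tex_id 5, without
any flush between them.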
> Secondly, we have a lot of configuration parameters which need to be
> set for the engine, and which need to be handled in an efficient manner.
> Hence, I'm suggesting that we add some sort of registry file to the
> architecture as well. Also by employing this technique, we could later
> incorporate some sort of performance counter system, or as a way of
> giving feedback from the system.
Counters would definitely be useful for debugging. I've already
described how to handle register writes. They pass down the pipeline
and are stored locally in the stage. For read-backs, which are rare
and only used for debugging, we can make them happen out of order.
Just a big MUX in the engine routes them all back to PCI.
> I tried to modify Tim's original block diagram with Gimp to kinda show
> what I was talking about and the result can be found here. Also, I
> would like to apologize for breaking the pretty diagram. It seems
> like a simple action such as drawing a box or a straight line in gimp is
> meant to be hard.
>  http://wiki.opengraphics.org/tiki-index.php?page=OGA%20Engine
>  http://langly.org/og/block_diagram.gif
>  http://langly.org/og/block-mod.gif
> Life on the earth might be expensive, but it
> includes an annual free trip around the sun.
> Kenneth Østby
Timothy Normand Miller
Open Graphics Project