[Open-graphics] Semi-official rendering pipeline description version 0.1

Timothy Miller theosib at gmail.com
Tue Dec 21 22:26:15 EST 2004


Thank you for your response!  


On Wed, 22 Dec 2004 02:50:37 +0100, Nicolas Capens <nicolas at capens.net> wrote:
> 
[snip]
> > 1.0: Rasterize
> > Iterates over screen coordinates, texture coordinates (pre-divide),
> > shade colors (pre-divide), and W.  Will iterate more than once per
> > fragment in case of filtering (bilinear, trilinear).
> > Products are two sets of texture coordinates (which can be extensive
> > in the case of filtering) and two sets of shade numbers (primary and
> > secondary).
> 
> I think you're mixing some terms here. Filtering is a technique to
> reconstruct sampled data, in this case a texture. That is, it computes
> the values 'between' texels. Bilinear and trilinear are filtering
> techniques. Blending is the term that is usually used to denote the
> operations (multiplication, addition, etc.) between texture stages in
> the fixed-function pixel pipeline. Multi-texturing is when the hardware
> is capable of blending more than one texture per pass. Multi-pass
> rendering allows extra blending operations to be performed on a triangle by
> rendering it again and blending with the framebuffer color. The logical
> model is that all hardware has eight stages (as if it had
> multi-texturing capabilities for eight textures), but in practice the
> driver is often responsible for mixing multi-texturing and multi-pass
> rendering to obtain the desired result.

I really did mean "filtering" here.  For trilinear filtering, you need
to fetch 8 texels and then interpolate in order to produce one texture
value.  The iteration has to occur somewhere.  For the moment, I'm
sticking it into the thing that already iterates, which is the
rasterizer.  I may change my mind later.
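
To make the "8 texels" count concrete, here is a minimal software
sketch of trilinear filtering.  The function names, the 2D-list texture
layout, and the texel-center convention are my assumptions for
illustration only, not a description of the hardware.

```python
import math

def bilinear(level, u, v):
    """Blend the 4 texels around (u, v) in one mip level (2D list of floats)."""
    h, w = len(level), len(level[0])
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)  # clamp at the edge
    fx, fy = u - x0, v - y0
    top = level[y0][x0] * (1 - fx) + level[y0][x1] * fx
    bot = level[y1][x0] * (1 - fx) + level[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def trilinear(fine, coarse, u, v, lod_frac):
    """4 texels from each of two mip levels: 8 fetches for one texture value."""
    a = bilinear(fine, u, v)
    b = bilinear(coarse, u / 2, v / 2)  # coarse level is half the size
    return a * (1 - lod_frac) + b * lod_frac
```

The eight fetches share one (u, v) and one fractional LOD, which is why
I think of them as a single iterated sequence.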

> So, the rasterizer has to iterate more than once per fragment in case of
> multi-texturing.

For filtering, the numbers being generated are related, which calls
for iteration.  By contrast, for multitexturing, the texture
coordinates for different textures are independent, so there's
little reason not to compute them in parallel, even if they're used
sequentially.

Of course, I may have to compute them serially, depending on how
complex a floating-point adder turns out to be.  I'm already going to
have to play some nasty hardware tricks to deal with the fact that the
adder will require multiple pipeline stages, but in order to produce
one set of numbers per clock, I have to hide that fact.

For instance, I can step by four units every four clock cycles, and
then I have to have additional adders to compute the intermediate
values.  But this is the hidden pain that chip designers always have
to go through.  :)
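
As a rough software model of that trick (the names and the fixed
stride of four are assumptions for illustration, and a real pipeline
would overlap these steps in time):

```python
def iterate_span(start, dx, count):
    """One slow adder steps by 4*dx; helper adders fill in the gaps."""
    offsets = [0.0, dx, 2 * dx, 3 * dx]  # computed once per span
    out = []
    base = start
    while len(out) < count:
        for off in offsets:              # one result per clock
            if len(out) < count:
                out.append(base + off)   # cheap helper add
        base = base + 4 * dx             # the multi-cycle add
    return out
```

The slow adder only has to finish once every four results, so its
latency stays hidden as long as the helper adds fit in a single clock.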

> > 2.0: Texture
> > Fetch texels from framebuffer and merge via filtering
> 
> Typo: fetch from texture.

Textures are stored in the framebuffer.  

OpenGL has separate concepts for "textures", "accumulation buffer",
and "display".  The hardware does not.  The driver has to program the
hardware to point to the right places in the framebuffer for each
thing.

> 
> > 3.0: Color sum and Fog
> > Combine texels with shades and fog.  Note that I don't understand how
> > fog works.  It's some Z-dependent blend, but I don't know the math.
> > I'm also kinda weak on how texture/shade combinations work in
> > color-sum. Also, I think there's a specular offset which needs to be
> > added here, unless that's what the secondary shade is for.
> 
> A few different architectures are possible here. The most common for a
> modern fixed-function pipeline is that blending with the diffuse color
> is usually done in a texture blend stage. The specular color is added
> after the blending stages.

Ok, this is how Balaji described it.  Two Gouraud shadings are done in
parallel.  One, called the "primary color", is passed through the
texture pipeline and blended with the texels.  The other, called the
"secondary color", is blended in after the textures are dealt with.

> Fog is done like this: "endColor = (1 - fogDensity) * pixelColor +
> fogDensity * fogColor". So when fogDensity is 0 we get the pixel color
> from the previous stage, if it's 1 we only see the fog color. The fog
> density can be computed linearly, exponentially, etc, but it's usually
> done in the vertex lighting pipeline. The rasterizer just has to
> interpolate it.
> 
> http://www.ati.com/developer/sdk/RADEONSDK/Html/Info/RADEONPixelPipeline.html

Hmmm... well, we're going to have to discuss some more detailed
formulas.  I need to know whether or not to divide by W, etc.  I'll
look at that web page tomorrow.
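
In the meantime, the blend quoted above is simple enough to write
down.  This is only a sketch of that one formula, assuming colors are
tuples of floats in [0..1] and that the fog density arrives already
interpolated; it says nothing about the divide by W.

```python
def apply_fog(pixel_color, fog_color, fog_density):
    """endColor = (1 - fogDensity) * pixelColor + fogDensity * fogColor."""
    return tuple(
        (1.0 - fog_density) * p + fog_density * f
        for p, f in zip(pixel_color, fog_color)
    )
```

With a density of 0 the pixel color passes through unchanged, and with
a density of 1 only the fog color is left.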

> > 4.0: Ownership test
> > Fetch ownership values from a framebuffer store and compare to owner
> > of fragment.  Pass only those that match.
> > This stage may be dropped.  It's better to render off-screen and then
> > bitblt to the display, which is to say that double-buffering is always
> > implicitly ON.  Doing it that way doesn't require ownership test.
> 
> No idea really... I've never used it. I guess it's mostly a job for the
> driver.

It's used for the case when you are rendering directly to the screen
in a windowing system.  The simplest way for the windowing system to
indicate arbitrary partial occlusion of the OpenGL window is to
produce a "map" of pixels that indicates which pixels OpenGL is
allowed to touch.
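
In software terms the test amounts to something like the sketch below.
The per-pixel owner map layout and the window ID are assumptions for
illustration.

```python
def owns(owner_map, x, y, window_id):
    """True if the windowing system lets this window touch pixel (x, y)."""
    return owner_map[y][x] == window_id

def draw_fragments(frame, owner_map, fragments, window_id):
    """Write only the fragments whose destination pixel we own."""
    for x, y, color in fragments:
        if owns(owner_map, x, y, window_id):
            frame[y][x] = color
    return frame
```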

However, Balaji, in his 3D driver, never used the ownership test. 
Instead, he enabled double-buffering implicitly.  When the OpenGL app
is done rendering a scene, it "flips", which results in a bitblt from
the off-screen rendering area to the window, where clipping can be
done in a more efficient manner.  This has the added advantage of not
requiring GLX to request repaints from the application when the window
becomes unobscured.

> > 5.0: Scissor and Alpha
> > Coordinates are compared against scissor box and alpha value is
> > compared against constant.  Fragments which fail either test are dropped.
> 
> Even better for scissoring is to just use frustum clipping. You can just
> render to any size of rectangle on the screen, directly. Again work for
> the driver software.

I don't know what frustum clipping is.  If you're thinking about
combining a window offset with a clipping rectangle, then that's all
implicitly built-in without requiring anything special.  If you're
talking about 3D clipping, well, that needs to be done in the host.
Furthermore, if there is any slight round-off error in the vertex
processing which might cause pixels to bleed slightly out of the
window, you still want hard hardware 2D clipping to step in and
confine it solidly.
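
The hard 2D clip I have in mind is nothing more than the comparison
sketched here.  The box layout (left, top, right, bottom, with right
and bottom exclusive) is an assumption for illustration.

```python
def in_scissor(x, y, box):
    """box is (left, top, right, bottom); right and bottom are exclusive."""
    left, top, right, bottom = box
    return left <= x < right and top <= y < bottom

def clip_fragments(fragments, box):
    """Drop every fragment that falls outside the scissor box."""
    return [(x, y) for x, y in fragments if in_scissor(x, y, box)]
```

A fragment that bleeds one pixel outside the window because of
round-off is simply dropped.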

> > 6.0: Stencil and Depth read
> > Read framebuffer store of stencil and depth values.  Stencil is an
> > 8-bit number.  I don't know enough about it to tell if it needs to be
> > converted to [0..1] first.  Depth is stored as a 24-bit float.  Depth
> > values aren't really Z but rather Z/d.
> 
> Stencil is just an 8-bit unsigned number.

Cool.

> > 6.1: Stencil and depth test
> > Stencil/Depth buffer are read.  Fragments which fail either test are
> > dropped.
> 
> Stencil tests that fail still perform an update operation. Depth tests
> that fail can also still update the stencil buffer. The secret formula
> is (PASS * Z + ZFAIL * !Z) * S + FAIL * !S. :)

Could you elaborate on that a bit more, please?
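
My tentative reading, to be corrected if I have it wrong, is that S
and Z stand for "stencil test passed" and "depth test passed", and
that FAIL, ZFAIL and PASS are the three stencil update operations, so
the formula just selects one of them.  A sketch under that assumption
(the function name is mine):

```python
def select_stencil_op(stencil_pass, depth_pass, fail_op, zfail_op, pass_op):
    """(PASS * Z + ZFAIL * !Z) * S + FAIL * !S, written as a selection."""
    if not stencil_pass:
        return fail_op                      # FAIL * !S
    return pass_op if depth_pass else zfail_op  # (PASS*Z + ZFAIL*!Z) * S
```

If that is right, then the stencil buffer can be written even for
fragments that get dropped.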

> Doing an early depth test is also highly recommended. Beware that it's
> only possible when alpha testing is disabled though.

How early can you make it? 

The only things between here and the rasterizer are alpha test and
scissor which don't generate any memory traffic, so doing the Z test
earlier doesn't make any difference.  It's all being done in a hardware
pipeline.

> > 6.2: Depth write
> > Write updates to depth buffer.
> >
> > 7.0: Antialias accumulate
> > When supersampling is enabled, rasterization is done at a higher
> > resolution than the display.  This unit modifies coordinates, colors,
> > and alpha values so that they can be averaged together appropriately.
> > Fragments which correspond to the same pixel are summed until the
> > fragment advances to the next pixel.  Then the combined fragment is
> > forwarded with the cumulative value.  As such, this only allows
> > columns to be automatically combined.  Rows are combined via merge.  A
> > row-buffer would be better, but it would require excessive logic.
> 
> I think it's better to leave this to software. Render to an offscreen
> buffer four times from a slightly shifted viewpoint and average together.

I hadn't thought of doing it that way.  That actually sounds like a
nice way of doing it, and it doesn't require me to add any hardware!
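
As I understand the suggestion, the driver would do roughly the
following.  The render callback and the particular sub-pixel offsets
are hypothetical; a real driver would render into off-screen buffers
and average them with the blend hardware.

```python
def average_jittered(render, offsets=((0.25, 0.25), (0.75, 0.25),
                                      (0.25, 0.75), (0.75, 0.75))):
    """Render once per sub-pixel offset and average the resulting frames."""
    frames = [render(dx, dy) for dx, dy in offsets]
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]
```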

There is one thing about antialiasing that I don't fully understand
the extent of yet.  One way to render triangles involves rendering
only those pixels whose centers are within the triangle.  The other way
involves rendering every pixel which has ANY amount of coverage by the
triangle and computing coverage values which become alpha values.

I can see why this would give nice, smooth triangle edges and would be
necessary for antialiased lines without supersampling, but I'm not
sure how to do it in a simple way.
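
To illustrate the difference between the two rules, here is a sketch
that estimates coverage by point-sampling inside the pixel.  That is
really just supersampling again, so it is not the simple method I'm
looking for; the names and the 4x4 sample grid are assumptions.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def inside(tri, px, py):
    (ax, ay), (bx, by), (cx, cy) = tri
    e0 = edge(ax, ay, bx, by, px, py)
    e1 = edge(bx, by, cx, cy, px, py)
    e2 = edge(cx, cy, ax, ay, px, py)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or \
           (e0 <= 0 and e1 <= 0 and e2 <= 0)

def center_rule(tri, x, y):
    """First method: draw the pixel only if its center is covered."""
    return inside(tri, x + 0.5, y + 0.5)

def coverage(tri, x, y, n=4):
    """Second method: fraction of an n*n sample grid that is covered."""
    hits = sum(inside(tri, x + (i + 0.5) / n, y + (j + 0.5) / n)
               for j in range(n) for i in range(n))
    return hits / (n * n)
```

A pixel on the edge can fail the center rule and still have nonzero
coverage, and that coverage is what would become its alpha.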

> > 8.0: Destination read
> > Fetch target pixels from framebuffer
> >
> > 8.1: Merge
> > There are three sets of operations that can be performed, each of
> > which is mutually exclusive:
> > 8.1.1: Alpha blend
> > 8.1.2: Logic raster op
> > 8.1.3: Arithmetic raster op (same set of operations which can be done
> > in color sum, I think)
> > Additionally, a planemask can be applied to prevent certain dest bits
> > from being over-written.
> > In this stage, we need everything necessary to be able to implement an
> > accumulation buffer properly, which is to say that we render to the
> > accumulation buffer like an off-screen buffer, then turn around and
> > use the accumulation buffer as a texture.
> >
> > 8.2: Dest write
> > Framebuffer formats supported, all of which are ARGB:
> > - 8:8:8:8
> > - 5:9:9:9
> > - 2:10:10:10
> > - 4:8:12:8
> 
> Except for 8:8:8:8, these pixel formats are not commonly supported. You
> can actually do everything with 8:8:8:8 just fine.

The reason to provide the others is so that driver writers can get
greater amounts of precision for RGB for intermediate results.  If no
one thinks there will be any use for them, I don't have to put them
in, but I think the extra logic will actually be very small.
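
All four layouts fill a 32-bit word, so the difference is only where
the field boundaries fall.  A sketch of packing and unpacking, assuming
alpha sits in the top bits (the bit ordering and the names here are my
assumptions):

```python
# ARGB field widths for the four proposed formats.
FORMATS = {
    "8:8:8:8": (8, 8, 8, 8),
    "5:9:9:9": (5, 9, 9, 9),
    "2:10:10:10": (2, 10, 10, 10),
    "4:8:12:8": (4, 8, 12, 8),
}

def pack_argb(a, r, g, b, bits=(2, 10, 10, 10)):
    """Pack four channel values into one word, alpha in the top bits."""
    word = 0
    for value, width in zip((a, r, g, b), bits):
        if not 0 <= value < (1 << width):
            raise ValueError("channel value does not fit its field")
        word = (word << width) | value
    return word

def unpack_argb(word, bits=(2, 10, 10, 10)):
    """Inverse of pack_argb: returns (a, r, g, b)."""
    out = []
    for width in reversed(bits):
        out.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(out))
```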

> > In each case with 10 bits or less, a LUT lookup can occur for each
> > channel, selectively, before conversion to float.  This allows for
> > indexed textures.
> > (Note:  I know there are other pixel formats which OpenGL requires.
> > Some will be unsupported, and some will be supported via transparent
> > conversion in the host interface.)
> 
> All of them can be converted.
> 
> > What the heck are "multi-sampling texture ops"?
> 
> I believe you refer to anisotropic filtering? It's a more advanced form
> of filtering than bilinear and trilinear.

I didn't get that impression.  In fact, I didn't see "anisotropic
filtering" anywhere at all in the OpenGL spec.

> > There are some aspects of the way antialiasing is described which are
> > hard to decipher.  I'll get more into that later, unless it becomes
> > apparent from responses to this post.
> 
> Multisample anti-aliasing is extremely advanced and costs a lot of
> hardware. I wouldn't even consider it... I'll explain in detail if you want.

Is there a difference between multisampling and supersampling?

The way I was considering doing it requires little more than some
shift logic and an extra register to hold sums.  You render at a
higher resolution and then shrink down just before Merge.  Are we
talking about different things?
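
Here is a sketch of what I mean for one scan line, assuming a
power-of-two supersampling factor and integer channel values (the
function name is made up):

```python
def downsample_row(samples, factor_log2):
    """Sum each run of 2**factor_log2 samples, then shift to average."""
    factor = 1 << factor_log2
    out = []
    acc = 0                                   # the extra sum register
    for i, s in enumerate(samples):
        acc += s
        if (i + 1) % factor == 0:             # advanced to the next pixel
            out.append(acc >> factor_log2)    # shift instead of divide
            acc = 0
    return out
```

This only combines samples along a row.  Combining across rows is what
I was leaving to Merge.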

Perhaps we should develop a glossary.  I use one set of terms, you use
another, I read documents that use another, and the OpenGL spec uses
yet another.  As a result, we're all very confused.

Thank you again!


