[Open-graphics] Sample VGA translation code, for nanocontroller
Timothy Normand Miller
theosib at gmail.com
Wed Sep 5 14:43:31 EDT 2007
On 9/5/07, Mark <mark at jarvin.net> wrote:
> Could you be more specific about how VGA graphics mode memory accesses
> aren't standard reads and writes? I'm reading up on VGA, but I haven't
> got there yet.
Patrick did the research on that, so he'll have to fill in the
details. But basically there are some raster operators that can be
applied, and there's a bitblt buffer of some sort; the card can be put
into a mode where plain reads and writes trigger macro operations.
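To make "macro operations" concrete, here's a hedged sketch of one such VGA behavior: in the standard write modes, a CPU write can be combined with data previously latched by a read, using the ALU function select in the Graphics Controller (none/AND/OR/XOR). The function names are illustrative, not anyone's actual code:

```c
#include <stdint.h>

/* Sketch of the VGA Graphics Controller ALU: a CPU write byte is
 * combined with the internal latch (loaded by a prior read) according
 * to the selected logical function. */
enum vga_alu { ALU_NONE, ALU_AND, ALU_OR, ALU_XOR };

uint8_t vga_write_combine(uint8_t cpu_data, uint8_t latch, enum vga_alu op)
{
    switch (op) {
    case ALU_AND: return cpu_data & latch;
    case ALU_OR:  return cpu_data | latch;
    case ALU_XOR: return cpu_data ^ latch;
    default:      return cpu_data;     /* ALU_NONE: write through */
    }
}
```

So a single "write" on the bus may actually be a read-modify-write against latched data, which is why these accesses aren't plain memory stores.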
> Up to now, my impression was that, e.g., 320x200x256 VGA was just a
> linear array of 64000 bytes that was accessed using standard memory
> reads & writes. Then, (traditionally), the VGA hardware driving the
> display would read from that linear array and do some translation on the
> value read (basically a table lookup to the palette). Are the
> 320x200x16 and 640x480x16 modes also laid out as linear arrays of bytes?
> Or are they linear arrays of nibbles? Or something else totally?
320x200x256 may be linearly laid out. But 640x480x16 is laid out as
four separate 1-bit framebuffers, one for each bitplane. There are
actually a variety of modes that mask or select the different planes.
I think a fair amount of this stuff was covered in
some discussions on the list in the past.
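As a concrete illustration of the planar layout: in 640x480x16, each plane is a packed 1-bpp bitmap, 80 bytes per scanline, and a pixel's 4-bit color is spread one bit per plane at the same byte offset. A minimal sketch of the address math (function name is made up for illustration):

```c
#include <stdint.h>

/* Locate one pixel in VGA mode 12h (640x480x16). The same
 * byte_offset/bit_mask applies in all four planes; the Map Mask
 * register selects which planes a write actually hits. */
static void locate_pixel(int x, int y, uint32_t *byte_offset, uint8_t *bit_mask)
{
    *byte_offset = (uint32_t)y * (640 / 8) + (uint32_t)(x / 8);
    *bit_mask    = (uint8_t)(0x80 >> (x % 8));  /* MSB is leftmost pixel */
}
```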
> What VGA graphics modes will this card support? Will the "planar"
> mode(s?) be supported (whatever that means)? What about "programming
> tricks" to get "704×528, 736×552, 768×576, and even 800×600"?
If we support some of it, we'll probably be able to support all of it.
With the hardware in place, we can improve the microcode over time.
> What about VGA text modes? Will anything besides 80x25 be supported?
I don't see why not.
> Is the 80x25 screen laid out as a linear array of 2000 16-bit words?
> When we talk about VGA compatibility, do we just mean wrt the PC-adapter
> interface or do we also mean wrt the adapter-monitor interface?
We're not talking about signalling HERE. In fact, we're breaking the
link between framebuffer and video. We can run the monitor at any res
we like while the host thinks we're being completely VGA compliant
with the video.
> It seems to me, perhaps due to my own ignorance, that talking about
> "legacy VGA compatibility" is pretty vague. Could we hammer out a list
> of precisely what modes we want to support, maybe with a
> required/desired kind of ranking for each bullet?
For Linux, we need 80x25. For Windows, we need that and 640x480x16,
and maybe 320x400x256.
> > Normally the nanocontroller wouldn't be involved. Usually, PCI is
> > connected (more) directly to the memory system. But we need to make
> > it switchable so that PCI accesses are queued and wait ready for the
> > nanocontroller to process them. We could queue up a good number of
> > writes, while reads would have to be queued singly.
> Could you be more specific about how the PCI is normally connected to
> the memory system? Is there a diagram somewhere showing that
> architecture? Is there any information on the interface between the
> XP10 and the XC3S4000 (e.g., how wide/fast is it, is it buffered, does
> it adhere to some protocol or standard or is it just straight parallel
> I/O, etc.)?
Howard designed that, so I won't be able to give the full details.
But basically, the PCI controller is a state machine that talks PCI
protocol. From that, we identify read and write requests. Write
requests are queued up and sent across the bridge to another queue in
the Xilinx chip. For reads, first, we have a small cache that holds a
line of memory data. If there's a hit, we return that to the bus
straight away. If there's a miss, we send a request for the whole
line over to the Xilinx and wait for the data to come back. While
waiting for the read data, the bus could time out multiple times.
When the data arrives, it becomes a cache hit.
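The read path above can be sketched as a tiny single-line cache in front of the bridge. This is just an illustration of the behavior described, with made-up names and an assumed 32-byte line, not the actual RTL:

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 32
#define LINE_MASK  (~(uint32_t)(LINE_BYTES - 1))

struct read_cache {
    uint32_t tag;            /* line-aligned address of cached data */
    bool     valid;          /* line holds fetched data */
    bool     fetch_pending;  /* a line fetch is in flight */
    uint8_t  data[LINE_BYTES];
};

/* Returns true if the read completes now; false means the target
 * signals Retry and the host re-issues the read later. */
bool pci_read(struct read_cache *c, uint32_t addr, uint8_t *out)
{
    uint32_t line = addr & LINE_MASK;
    if (c->valid && c->tag == line) {
        *out = c->data[addr & (LINE_BYTES - 1)];
        return true;                 /* cache hit: answer straight away */
    }
    if (!c->fetch_pending) {         /* miss: request the whole line */
        c->tag = line;
        c->valid = false;
        c->fetch_pending = true;
        /* request sent across the bridge; fill arrives asynchronously */
    }
    return false;                    /* host retries until the fill lands */
}

/* Called when the line comes back from the Xilinx side. */
void fill_complete(struct read_cache *c, const uint8_t *line_data)
{
    for (int i = 0; i < LINE_BYTES; i++)
        c->data[i] = line_data[i];
    c->fetch_pending = false;
    c->valid = true;
}
```

The "bus could time out multiple times" part is the retry loop: each timed-out attempt returns false until `fill_complete` turns the pending miss into a hit.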
> Any idea what the latency is to access that memory via PCI (i.e., # bus
> clock cycles from the PCI request arriving to it being satisfied)? I'm
> interested in knowing the max/typ/min latency, if possible ("way more
> than 20 nanocontroller cycles" is pretty vague).
I'm sure I'll leave something out, but...
Leaving out the nanocontroller, we have to cross clock domains from
the 33MHz of PCI to the 100MHz (200MHz DDR) of the bridge to the
Xilinx, where it gets queued. Crossing clock domains incurs an
undetermined delay, but it's usually less than 5 cycles in the slowest
clock domain. There are a few cycles to get it across the bridge.
Then we cross again into the 200MHz domain of the memory controller.
There's an arbiter that deals with competing requests from video, PCI,
engine, etc. That imposes a few cycles. The memory controller itself
has a minimum latency of like 10 cycles (at 200MHz), but it could be
much worse if video is keeping it busy or there's a row miss (an
additional 20 cycles at 200MHz). Then the read data has to transition
back to 100MHz through a queue, cross the bridge, go back into the
33MHz domain, and get stored in the cache.
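Tallying those hops up with the rough cycle counts from above gives a ballpark figure. The exact counts per hop are my guesses within the ranges stated (e.g. "a few cycles" taken as 3, domain crossings as 5 cycles of the slower clock), so this is an estimate, not a measurement:

```c
/* Back-of-the-envelope round-trip latency for a PCI read that misses
 * the cache and takes a DRAM row miss. All cycle counts are rough. */
double estimate_read_latency_ns(void)
{
    double ns = 0.0;
    ns += 5 * (1000.0 / 33.0);    /* cross out of the 33MHz PCI domain */
    ns += 3 * (1000.0 / 100.0);   /* a few cycles across the bridge */
    ns += 5 * (1000.0 / 100.0);   /* cross into the 200MHz memory domain */
    ns += 3 * (1000.0 / 200.0);   /* arbitration among video, PCI, engine */
    ns += 10 * (1000.0 / 200.0);  /* memory controller minimum latency */
    ns += 20 * (1000.0 / 200.0);  /* worst case: add a row miss */
    ns += 5 * (1000.0 / 100.0);   /* return through the 100MHz queue */
    ns += 5 * (1000.0 / 33.0);    /* cross back into the 33MHz domain */
    return ns;                    /* roughly 600 ns, ~20 PCI clocks */
}
```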
If that sounds awful, keep in mind that on a PC, PIO reads won't be
turned into bursts, so the fastest we could ever get is about
40 megs/sec. In practice, we get in the ballpark of 5 megs/sec.
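The ~40 megs/sec ceiling checks out from the bus parameters. A 32-bit bus at 33MHz moves 4 bytes per data phase, or 132 MB/s when bursting; a single non-burst read spends at least about 3 clocks per data phase (address, turnaround, data), which is my assumption here:

```c
/* Sanity check on the non-burst PIO read ceiling: peak burst rate
 * divided by the ~3 clocks a lone read costs per data phase. */
double pio_read_ceiling_mb_s(void)
{
    double peak = 33.0 * 4.0;   /* 132 MB/s: 4 bytes per clock, bursting */
    return peak / 3.0;          /* ~44 MB/s: one data phase per ~3 clocks */
}
```

The measured 5 megs/sec is so much lower because each read also eats the full cross-chip latency (and retries) described above.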
I should also mention that PIO reads of the framebuffer are something
that should generally be avoided like the plague anyhow. With DMA,
we'll be able to achieve about the maximum speed of the bus.
As for writes, we do get about 100 megs/sec.
> But won't this be
> affected by the memory controller design? Will a DDR controller be
> written for this project or is there one out there that'll be used
> (e.g., the one on OpenCores or whatever Xilinx provides with CoreGen or
I already developed one; it runs at 200MHz and is checked into SVN.
> There will obviously be multiple memory access sources; how will
> these simultaneous requests be scheduled? How many sources will be
> initiating requests (anything besides PCI and the display driver)?
At the moment, the arbiter is somewhat hacked together, but
ultimately, we'll have an intelligent scheduler that prioritizes
different agents, tries to avoid row misses, and other such things.
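Even the simple version boils down to a fixed-priority pick among requesters. A hypothetical sketch of that idea, with video first since it has hard real-time deadlines (the eventual scheduler would also reorder requests to avoid row misses):

```c
#include <stdbool.h>

/* Fixed-priority arbiter sketch: video > PCI > drawing engine.
 * Purely illustrative; the real arbiter's agents and ordering may
 * differ. */
enum agent { AGENT_NONE = -1, AGENT_VIDEO, AGENT_PCI, AGENT_ENGINE };

enum agent arbitrate(bool video_req, bool pci_req, bool engine_req)
{
    if (video_req)  return AGENT_VIDEO;   /* can't let scanout starve */
    if (pci_req)    return AGENT_PCI;
    if (engine_req) return AGENT_ENGINE;
    return AGENT_NONE;
}
```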
> Am I right in assuming that developers hoping to work with OGD1 will
> need a free 64-bit PCI-X slot?
Well, it would work in that slot, but I did all my stuff in a regular
PC at 33MHz. Note that while the 64-bit extension is there and hooked
up to the XP10, our PCI controller doesn't support it. The extension
is there for OTHER experimenters to use. We don't really need it.
> What PCI-X bus speeds are expected to be
> supported (and am I right in understanding it could be 33MHz, 66MHz, or
75MHz was the fastest I could get the PCI target to go in a static
> I suppose this seems a little off-topic, but it is pressing
> in my mind insofar as the bus speed and width will affect how frequently
> memory requests arrive over PCI and how much data each request entails.
> Plus, I'm shopping around for a new MB. ;)
In theory, we can move data faster than the bus will allow. In
practice, PIO reads have a lot of latency.
Timothy Normand Miller
Open Graphics Project