[Open-graphics] Mad dash to finish VGA by Jan 7 -- who's with me?
Farhan Mohamed Ali
farhan at cmu.edu
Fri Nov 30 21:03:25 EST 2007
Thanks, that gives a better idea of what has to be done. I am most
interested in the arbiter/scheduler. Sounds like fun.
Timothy Normand Miller wrote:
> On 11/27/07, Farhan Mohamed Ali <farhan at cmu.edu> wrote:
>> I won't have much to do during my break either (December 19 to January 13).
>> Actually, I don't think I'll have much to do after December 11.
>> Most things verilog are fine with me. Maybe you can give a rough idea
>> about the complexity of each task (which are the trivial ones and which
>> are more substantial)?
>
> Ok, let's see what I can come up with here...
>
>>> - Modify the PCI controller so that it synthesizes for the Lattice chip
>
> The two main issues are IO drivers/receivers and a meta-coding issue.
> In order to make the thing meet timing, I have to trick the
> synthesizer into not optimizing the logic the way it normally wants
> to. You can tell it that the inputs already have 4ns of delay on them
> from the bus all you want, and the P&R will try to take it into
> account, but the synthesizer is stupid and completely ignores all of
> that. So what I had to do was define a whole bunch of multiplexer
> modules that I route things through, and then I tell the synthesizer
> not to optimize through those modules. This forces the logic to take
> the shape I want. The way I did this was put meta-comments into the
> Verilog code that ISE understands.
>
> There are two porting options for this. One is to figure out what the
> meta comments are for Synplicity (or whatever synthesizer we use for
> the Lattice part). That would be the easiest. The other would be to
> manually instantiate multiplexers in the same way that you can
> manually instantiate a block RAM or a multiplier block. Supposedly
> the synthesizer will see this as a boundary to optimization and not
> optimize across it.
>
> So this involves some research.
>
>>> - Build the two halves of the bridge logic to carry memory/reg
>>> accesses to the Xilinx
>
> For various reasons, we may have to write a new bridge from scratch.
> We'll see. We just need to define a protocol that describes how
> requests are encoded to travel across the bridge. They're all memory
> accesses. With writes, you get an address and a sequence of data.
> With reads, you specify an address and a count, and then data comes
> back later. If multiple things can go on at the same time, then you
> have to arbitrate them (although I think we'll just prevent that).
>
>>> - Glue PCI to the bridge and the SPI PROM controller
>
> This is just address decode logic. A certain address relative to a
> certain BAR is hit, and we need to decide whom to talk to.
>
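To make the address decode concrete, here is a small Python model of the idea. The BAR numbers, offsets, sizes, and target names are all made up for illustration; the real map still has to be defined.

```python
# Hypothetical address map: (BAR index, start offset, size in bytes, target).
# None of these values come from the actual design.
ADDRESS_MAP = [
    (0, 0x0000, 0x1000, "spi_prom"),
    (0, 0x1000, 0x1000, "bridge_regs"),
    (1, 0x0000, 0x1000000, "bridge_mem"),
]


def decode_target(bar, offset):
    """Return the target selected by a hit at `offset` within `bar`,
    or None if no region claims the access."""
    for map_bar, start, size, target in ADDRESS_MAP:
        if bar == map_bar and start <= offset < start + size:
            return target
    return None
```

In hardware this would presumably collapse to a few comparisons on the upper address bits, since the regions would be power-of-two sized and aligned.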
>>> - Design logic for the Xilinx to process "engine" register accesses
>>> (so we can configure the memory controller, video controller, etc.)
>
> This is just another block of address decode.
>
>>> - Design an arbiter that manages competing memory accesses between PCI
>>> (bridge) and video.
>
> This is a scheduler. The module would have multiple ports, one for each
> agent wanting to talk to memory. A reader or a writer is an agent, so
> if something wants to do both, we're best off treating it as two
> agents. The last interface on the module connects to the memory
> controllers.
>
> Those requests come in according to fifo protocol, and we need to have
> some kind of priority selection to choose whom to pay attention to.
> The way I've done this before is to have a 1-hot "you are allowed"
> encoding. If someone is allowed, their requests are paid attention to
> and forwarded to the memory controller(s). If not, they're blocked
> (fifo full signal asserted). If a higher priority request comes in,
> the scheduler can take a few cycles to make the decision and alter the
> allowed registers.
>
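A minimal sketch of the one-hot "you are allowed" idea, as a Python behavioral model rather than Verilog. It assumes a fixed priority order (index 0 is highest) and makes the decision combinationally; the function names are hypothetical, and a real scheduler would register the result and could take a few cycles to update it.

```python
def select_allowed(requests):
    """Given one request flag per agent (index 0 = highest priority),
    return a one-hot list marking the single agent that is allowed.
    If nobody is requesting, nobody is allowed."""
    allowed = [False] * len(requests)
    for i, requesting in enumerate(requests):
        if requesting:
            allowed[i] = True
            break
    return allowed


def fifo_full_signals(allowed):
    """Agents that are not allowed are blocked by asserting 'fifo full'."""
    return [not a for a in allowed]
```

For example, with agents 1 and 2 both requesting, `select_allowed([False, True, True])` allows only agent 1, and `fifo_full_signals` then shows "full" to agents 0 and 2.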
> A writer agent is one fifo, where each word is an address with data.
> A reader is a pair of fifos, one that takes addresses for requests,
> and the other returns the data. The memory controller allows tags to
> be passed through so you can tell whose data is coming out so you know
> whose return fifo to write to.
>
> One challenge with this is making sure that return fifos don't fill.
> If you request a read, the memory controller just processes it and
> returns the data. If you don't take the data, it's gone. You need to
> make sure that the number of requests in the request fifo plus the
> number that can be outstanding in the memory controller pipeline does
> not exceed the number of free entries in the return fifo.
>
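The return-fifo condition above can be written as a simple credit check. This is a sketch under the assumption that each read request returns exactly one word; the function and parameter names are hypothetical.

```python
def can_accept_read(queued_requests, in_flight, free_return_entries):
    """True if one more read request can be accepted without risking
    return-fifo overflow.

    queued_requests:      requests waiting in the request fifo
    in_flight:            requests outstanding in the memory controller pipeline
    free_return_entries:  free entries currently in the return fifo
    """
    return queued_requests + in_flight + 1 <= free_return_entries
```

With 2 requests queued and 3 in flight, a new request is safe only if the return fifo has at least 6 free entries.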
> Also, a really smart scheduler will take into consideration things
> like memory row misses. If you have more than one agent with the same
> priority, as long as the agent you're paying attention to stays on the
> same memory row, you want to stick to it. If it's going to cause a
> row miss (use some heuristic, like assuming a row hit as long as the lower
> 4 bits of the row portion of the address stay the same), you might
> want to see if some other agent wants to access that same row. This
> way, you magically save 20 cycles of delay. It's debatable, however,
> how much that wins you because access patterns can be anywhere from
> well-behaved and linear (where row misses can be ignored at the high
> level) to completely random, where the chances of two agents wanting
> the same row are so low that you might as well just assume they don't
> and don't bother with the extra logic.
>
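Here is a small Python model of the row-hit heuristic, just to pin down the idea. `ROW_SHIFT` (where the row bits start in the address) is an assumed value, and `pick_agent` is a hypothetical helper that only considers agents of equal priority.

```python
ROW_SHIFT = 10  # assumption: the row portion of the address starts at bit 10


def row_tag(addr):
    """Lower 4 bits of the row portion of the address."""
    return (addr >> ROW_SHIFT) & 0xF


def likely_row_hit(open_addr, next_addr):
    """Heuristic: assume a row hit if the low 4 row bits are unchanged."""
    return row_tag(open_addr) == row_tag(next_addr)


def pick_agent(current, open_addr, pending):
    """Choose which same-priority agent to serve next.

    current:   agent being served now
    open_addr: last address sent to memory (defines the open row)
    pending:   dict mapping agent name -> its next requested address
    """
    # Stick with the current agent while it stays on the open row.
    if current in pending and likely_row_hit(open_addr, pending[current]):
        return current
    # Otherwise prefer any other agent that wants the open row.
    for agent, addr in pending.items():
        if likely_row_hit(open_addr, addr):
            return agent
    # Nobody hits: take the row miss with the current agent if it has work.
    if current in pending:
        return current
    return next(iter(pending), None)
```

Because only 4 bits are compared, two different rows can alias to the same tag, so this predicts row hits rather than guaranteeing them.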
> Finally, consider the rate at which an agent makes requests. The
> video controller, for example, should have the highest priority.
> However, if its request generator is in the video clock domain,
> requests may come in slower than the memory controller would process
> them. If you switch immediately to video, this could cause excessive
> row misses due to excessive switching. Instead, you might want to
> wait until that agent has made some minimum number of requests before
> switching.
>
>>> - Design top-level modules for both FPGAs and wrap with pad rings.
>
> The pad ring is a module that breaks out individual pins. Some I/O
> buffers may be instantiated here (if not done already at an inner
> module and they're not inferred correctly). Also, this is where you
> put clock generators. Pins that constitute busses are grouped
> together and connected to the multi-bit ports on the top-level module.
>
> This one is more tedious than anything else. We can provide at least
> a partial pad ring.
>
>>> Phase 2: Installing HQ
>>>
>>> - Design I/O interfaces for HQ that it would use to get access to the
>>> bridge and intercept PCI transactions.
>
> Some discussions have to happen before this, but basically, some fifos
> tie into the MEM stage of HQ and connect to other things in the chip.
> I described this at length in some earlier emails which we'll have to
> dig up. This would also include other control registers, VGA
> I/O-space registers, etc.
>
>>> - Insert HQ into the XP10
>>> - Develop test code for HQ and run it both in simulation and in a real device
>
> Patch it into the glue logic in some sensible way. We'll know how to
> do this when the glue logic sans HQ is working.
>
>>> Phase 3: BIOS ROM
>>>
>>> - Get basic BIOS code together. This mostly involves finding out the
>>> format and putting together a skeleton. Without HQ, we can have it do
>>> something simple like program memory and video controllers.
>
> Research. Find web pages and books that document this and write some
> assembly code.
>
>>> - Get started on VGA BIOS code
>>>
>>> Phase 4: VGA
>>>
>>> - Complete the nanocode for HQ that emulates at least CGA 80x25 text mode.
>
> Understand what process has to be performed and write the code to do
> it. Note that HQ is single-threaded, so in the middle of doing
> translations, it needs to have explicit subroutine calls sprinkled
> about that will look at request fifos from PCI and process them. I've
> also described this in detail before.
>
>
> Did I miss anything?
>
>