From theosib at gmail.com  Tue Jul 1 14:26:14 2008
From: theosib at gmail.com (Timothy Normand Miller)
Date: Tue Jul 1 14:29:07 2008
Subject: [Open-graphics] HELP: Need DDX and Linux-based diagnostic software for OGD1
Message-ID: <9871ee5f0807011126r5448206qe3ecdc3f710a3c2a@mail.gmail.com>

I could really use some help with some software development.

The first, most critical thing is diagnostic software. When we plug
an OGD1 into a PC for the first time, we need software that lets us
bang directly on the hardware, set up video, etc. I've done this
before, but I can't release the code (not soon enough, anyhow).
Basically, you'll need libpci to start with. Use it to scan for the
device, map it, enable the mapping in the hardware, etc. That mapping
would be one submodule of the diag suite. Others would include video
setup (mostly based on Patrick's code), memory setup (grab the numbers
from the simulation code), and other minor things. Oh, and let's not
forget programming the SPI PROM that loads the S3.

The second thing we need is a DDX for X.org. Paul Brooks wrote the
one we used at OSCON, so I'm hoping he can be persuaded to modify it
for the new memory map. :) Otherwise, I'll need someone else to do
it. I expect it won't be hard.

Thanks!

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

From theosib at gmail.com  Tue Jul 1 21:03:28 2008
From: theosib at gmail.com (Timothy Normand Miller)
Date: Tue Jul 1 21:06:18 2008
Subject: [Open-graphics] Version numbers?
Message-ID: <9871ee5f0807011803o590a7e11j53903d50f6af2cc7@mail.gmail.com>

When phase one of the VGA project is finished, we can make a release
with a version number. At the moment, I think we're actually at
"release candidate" status. Various fixes will bump the rc number
until we decide that it works well in hardware. This version won't do
VGA, but it will work as a dumb framebuffer.

Then phase II will include HQ and actually be able to do VGA.
So, how should we number those two milestones?

From andre.pouliot at gmail.com  Tue Jul 1 21:47:55 2008
From: andre.pouliot at gmail.com (André Pouliot)
Date: Tue Jul 1 21:51:30 2008
Subject: [Open-graphics] Version numbers?
In-Reply-To: <9871ee5f0807011803o590a7e11j53903d50f6af2cc7@mail.gmail.com>
References: <9871ee5f0807011803o590a7e11j53903d50f6af2cc7@mail.gmail.com>
Message-ID:

We can go with something like 0.1-fb-RC1, and the final release code
name could be "Hatching howl". The working release could be 0.1; if we
need to make corrections to that release to fix bugs, just go for a
0.1.x version.

For VGA, something along the same lines for the number, 0.2-vga-RC1,
and the final release code name "Jumping Finch".

Timothy Normand Miller wrote:
> When this phase one of the VGA project is finished, we can make a
> release with a version number. At the moment, I think we're actually
> at "release candidate" status. Various fixes will bump the rc number
> until we decide that it works well in hardware. This version won't do
> VGA, but it will work as a dumb framebuffer.
>
> Then phase II will include HQ and actually be able to do VGA.
>
> So, how should we number those two milestones?

From theosib at gmail.com  Tue Jul 8 16:24:58 2008
From: theosib at gmail.com (Timothy Normand Miller)
Date: Tue Jul 8 16:28:25 2008
Subject: [Open-graphics] Icarus "don't simulate" directive?
Message-ID: <9871ee5f0807081324u2ef0a26r6deb9d9673be165c@mail.gmail.com>

For Verilog synthesizers, I can do something like this to tell them
not to synthesize a bit of code:

    // synthesis translate_off
    assign GSR = reset_pin_;
    // synthesis translate_on

Is there an equivalent thing for Icarus that will tell it "don't
simulate this"?

Thanks.
From ViktorPracht at gmx.de  Wed Jul 9 05:28:27 2008
From: ViktorPracht at gmx.de (Viktor Pracht)
Date: Wed Jul 9 05:31:55 2008
Subject: [Open-graphics] Icarus "don't simulate" directive?
In-Reply-To: <9871ee5f0807081324u2ef0a26r6deb9d9673be165c@mail.gmail.com>
References: <9871ee5f0807081324u2ef0a26r6deb9d9673be165c@mail.gmail.com>
Message-ID: <20080709092827.151890@gmx.net>

Timothy Normand Miller wrote:
> For Verilog synthesizers, I can do something like this to tell it to
> not synthesize a bit of code:
>
> // synthesis translate_off
> assign GSR = reset_pin_;
> // synthesis translate_on
>
> Is there an equivalent thing for Icarus that will tell it "don't simulate
> this"?

I didn't try it out, but a simple

    `ifndef SOMETHING
    ...
    `endif

in the source and -DSOMETHING on the command line should work.

- Viktor Pracht

From attila at kinali.ch  Sun Jul 13 12:34:15 2008
From: attila at kinali.ch (Attila Kinali)
Date: Sun Jul 13 12:38:13 2008
Subject: [Open-graphics] HELP: Need DDX and Linux-based diagnostic software for OGD1
In-Reply-To: <9871ee5f0807011126r5448206qe3ecdc3f710a3c2a@mail.gmail.com>
References: <9871ee5f0807011126r5448206qe3ecdc3f710a3c2a@mail.gmail.com>
Message-ID: <20080713183415.a23cf558.attila@kinali.ch>

On Tue, 1 Jul 2008 14:26:14 -0400
"Timothy Normand Miller" wrote:

> The first, most critical thing is diagnostic software. When we plug
> an OGD1 for the first time into a PC, we need software that lets us
> bang directly on the hardware, sets up video, etc. I've done this
> before, but I can't release the code (not soon enough anyhow).
> Basically, you'll need libpci to start with. Use it to scan for the
> device, map it, enable the mapping in the hardware, etc. That mapping
> would be one submodule of the diag suite.
> Others would include video
> setup (mostly based on Patrick's code), memory setup (grab the numbers
> from the simulation code), and other minor things. Oh, and let's not
> forget programming the SPI prom that loads the S3.

Can you be a little bit more specific about what you need?
I.e., what device are we scanning for? What values should we
bang to which registers? Do you want to have an application
that just writes some fixed values, or do you want to have
it defined on the command line?

			Attila Kinali
--
The true CS students do not need to know how to program. They learn
how to abstract the process of programming to the point of making
programmers obsolete.
		-- Jabber in #holo

From pkk at spth.de  Sun Jul 13 12:43:00 2008
From: pkk at spth.de (Philipp Klaus Krause)
Date: Sun Jul 13 12:47:30 2008
Subject: [Open-graphics] Re: S3 meets timing
In-Reply-To: <9871ee5f0806292121k7d2d59bdmfa6bfcdd9f27fc11@mail.gmail.com>
References: <9871ee5f0806292113y1dedab38t962a1c50dd6f05e@mail.gmail.com> <9871ee5f0806292121k7d2d59bdmfa6bfcdd9f27fc11@mail.gmail.com>
Message-ID: <487A3094.4080109@spth.de>

Timothy Normand Miller wrote:
> Even better. It meets timing with the memory running at 200MHz.
>
> On Mon, Jun 30, 2008 at 12:13 AM, Timothy Normand Miller
> wrote:
>> [...] Using the global reset, I went
>> from banging my head against timing constraints I couldn't meet to
>> beating them by a sizable margin. Next up... checking it all in, then
>> doing some regression testing in simulation.
>>
>> Now, if we can just get the XP10 clock skew problem worked out, I'll
>> be able to test this in real hardware.

Cool. It's nice to see this project reach milestones on the way to a
free graphics card.
Philipp

From theosib at gmail.com  Sun Jul 13 13:12:55 2008
From: theosib at gmail.com (Timothy Normand Miller)
Date: Sun Jul 13 13:16:40 2008
Subject: [Open-graphics] HELP: Need DDX and Linux-based diagnostic software for OGD1
In-Reply-To: <20080713183415.a23cf558.attila@kinali.ch>
References: <9871ee5f0807011126r5448206qe3ecdc3f710a3c2a@mail.gmail.com> <20080713183415.a23cf558.attila@kinali.ch>
Message-ID: <9871ee5f0807131012j458f92f2p71dc143907807462@mail.gmail.com>

On Sun, Jul 13, 2008 at 12:34 PM, Attila Kinali wrote:
> On Tue, 1 Jul 2008 14:26:14 -0400
> "Timothy Normand Miller" wrote:
>
>> The first, most critical thing is diagnostic software. When we plug
>> an OGD1 for the first time into a PC, we need software that lets us
>> bang directly on the hardware, sets up video, etc. I've done this
>> before, but I can't release the code (not soon enough anyhow).
>> Basically, you'll need libpci to start with. Use it to scan for the
>> device, map it, enable the mapping in the hardware, etc. That mapping
>> would be one submodule of the diag suite. Others would include video
>> setup (mostly based on Patrick's code), memory setup (grab the numbers
>> from the simulation code), and other minor things. Oh, and let's not
>> forget programming the SPI prom that loads the S3.
>
> Can you be a little bit more specific here what you need?
> Ie what device are we scanning for? What values should we
> bang to which registers? Do you want to have an application
> that just writes some fixed values or do you want to have
> it defined on the command line?

Using libpci, first, you need to get the device list so that you can
iterate through it.
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <pci/pci.h>

    struct pci_access *pacc;
    struct pci_dev *dev;
    int i, mem_fd;
    u8 c;                      /* libpci's 8-bit type */
    unsigned char *base0_ptr;
    pciaddr_t base0_size;

    pacc = pci_alloc();        /* Get the pci_access structure */
    /* Set all options you want -- here we stick with the defaults */
    pci_init(pacc);            /* Initialize the PCI library */
    pci_scan_bus(pacc);        /* We want to get the list of devices */

We'll probably want to have some sort of config file where we can list
vendor and subvendor IDs that will be searched for. Then we iterate
through the list of devices until we find what we're after.

    for (dev = pacc->devices; dev; dev = dev->next) {  /* Iterate over all devices */
        pci_fill_info(dev, PCI_FILL_IDENT | PCI_FILL_BASES |
                           PCI_FILL_SIZES | PCI_FILL_ROM_BASE);
        /* The matching test was lost from the archive; ids[] and num_ids
           are hypothetical names for the table read from the config file. */
        for (i = 0; i < num_ids; i++) {
            if (dev->vendor_id == ids[i].vendor &&
                dev->device_id == ids[i].device) {
                printf("Found device at %02x:%02x.%d\n",
                       dev->bus, dev->dev, dev->func);
            }
        }
    }

Now, once you have identified the device, we need to map it into
memory. Among other things, we need to enable the memory space in the
hardware (without a kernel driver, Linux leaves it off):

    /* Write to PCI config space to turn on memory decoding.
       7 = PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER */
    c = pci_read_byte(dev, PCI_COMMAND);
    if (!(c & PCI_COMMAND_MEMORY)) {
        pci_write_byte(dev, PCI_COMMAND, c | 7);
    }

And we need to open the memory device:

    if ((mem_fd = open("/dev/mem", O_RDWR)) < 0) {
        perror("can't open /dev/mem");
        exit(-1);
    }

Finally, we can map the engine and main memory, like this. libpci's
base_addr[] includes the BAR flag bits in its low bits, so they must
be masked off before the address is used as the mmap() offset:

    base0_ptr = (unsigned char *)mmap(NULL, dev->size[0],
                                      PROT_READ | PROT_WRITE, MAP_SHARED, mem_fd,
                                      dev->base_addr[0] & PCI_BASE_ADDRESS_MEM_MASK);
    if (base0_ptr == MAP_FAILED) {
        perror("mmap error");
        exit(-1);
    }
    base0_size = dev->size[0];
    printf("Mapped phy=0x%08llx, size=0x%llx, virt=%p\n",
           (unsigned long long)(dev->base_addr[0] & PCI_BASE_ADDRESS_MEM_MASK),
           (unsigned long long)base0_size, (void *)base0_ptr);

We need to map the engine (BAR0), memory (BAR1), and PROM. This all
should go into some library/module that we can link to.

For accessing registers, we want some macros. Here's a write macro
(no trailing semicolon, so it behaves like a statement in if/else):

    #define write32(base, offset, data) \
        (*((volatile unsigned int *)((char *)(base) + (offset))) = (data))

The read macro is basically the same expression, used on the
right-hand side of an assignment. Then we can do:

    #define DDC0_SCK 0x???
    ...
    write32(engine_base, DDC0_SCK, 1);

We'll need functions that perform various tasks on the hardware.
Memory setup is basically just a bunch of write32's with some delays
(10ms here, 100ms there). Video setup involves calling some of
Patrick's code that we have in SVN that generates video programs.

Then we need a memory test module: something that will take any
pointer and buffer size and walk through it checking integrity.
Walking 1's, random address/random data. If you use the right kind of
random number generator, you can get a sequence that doesn't repeat,
so you can write every address in random order with random data, then
reset the random seed and walk back through, reading and checking that
each word got stored correctly.

Then we need a test pattern module. Assuming something else has set
up video, another module could fill the screen with a regular,
recognizable pattern that would make sense to us on the monitor: for
instance, a big box with an X through it, some color bars, alternating
black/white vertical or horizontal lines, etc.

If done right, it would be just as easy to build a monolithic test
suite as it would be to build separate programs that can be run in
sequence in a script.

From theosib at gmail.com  Wed Jul 16 12:32:26 2008
From: theosib at gmail.com (Timothy Normand Miller)
Date: Wed Jul 16 12:36:26 2008
Subject: [Open-graphics] OGP Status: OGA1 in OGD1, hardware programmed and recognized under Linux
Message-ID: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com>

Yesterday, Howard programmed the XP10 on an OGD1 prototype board, put
it into a Linux PC, powered it on, and didn't see smoke. Even better,
lspci reports that the card is recognized as a VGA-capable device.
Although I had done lots of RTL-level simulation, that PCI core had
never before been tested in hardware, so it's pretty exciting to see
it handle config space correctly.
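The test-pattern module from the 13 July message could look roughly like the following minimal C sketch. It assumes a linear framebuffer of 32-bit 0x00RRGGBB pixels with the row pitch given in pixels; that pixel format, the name `fill_test_pattern`, and the `enum pattern` values are hypothetical, not OGD1's documented layout. On the card, `fb` would be the pointer obtained by mmap()ing the memory BAR after video has been set up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pixel format: 32-bit 0x00RRGGBB. */
#define RGB(r, g, b) (((uint32_t)(r) << 16) | ((uint32_t)(g) << 8) | (uint32_t)(b))
#define WHITE RGB(255, 255, 255)
#define BLACK RGB(0, 0, 0)

enum pattern { PAT_BOX_X, PAT_COLOR_BARS, PAT_VLINES, PAT_HLINES };

/* Nonzero if (x, y) lies within half a pixel, measured along the minor
   axis, of the line from (0, 0) to (w-1, h-1). Integer math only. */
static int on_diagonal(long x, long y, long w, long h)
{
    long e = x * (h - 1) - y * (w - 1);   /* scaled distance from the line */
    long major = (w > h ? w : h) - 1;
    return 2 * (e < 0 ? -e : e) <= major;
}

/* Fill a width x height surface whose rows are `pitch` pixels apart. */
void fill_test_pattern(volatile uint32_t *fb, int width, int height,
                       int pitch, enum pattern p)
{
    /* Classic bar order: white, yellow, cyan, green, magenta, red, blue, black */
    static const uint32_t bars[8] = {
        WHITE, RGB(255, 255, 0), RGB(0, 255, 255), RGB(0, 255, 0),
        RGB(255, 0, 255), RGB(255, 0, 0), RGB(0, 0, 255), BLACK
    };
    assert(fb && width > 0 && height > 0 && pitch >= width);

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            uint32_t c = BLACK;
            switch (p) {
            case PAT_BOX_X:   /* border plus both diagonals */
                if (x == 0 || y == 0 || x == width - 1 || y == height - 1 ||
                    on_diagonal(x, y, width, height) ||
                    on_diagonal(width - 1 - x, y, width, height))
                    c = WHITE;
                break;
            case PAT_COLOR_BARS:  /* eight equal-width vertical bars */
                c = bars[(long)x * 8 / width];
                break;
            case PAT_VLINES:      /* alternating white/black columns */
                c = (x & 1) ? BLACK : WHITE;
                break;
            case PAT_HLINES:      /* alternating white/black rows */
                c = (y & 1) ? BLACK : WHITE;
                break;
            }
            fb[(long)y * pitch + x] = c;
        }
    }
}
```

Because it writes through an ordinary pointer, the same function can be checked against a host-side array before being pointed at graphics memory.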
Howard is working on diagnostic code right now, what I was talking
about earlier. This code will enable memory decode, map the memory and
register space, and perform initialization. Once we have verified that
the XP10's internal memory-mapped registers are accessible, and we can
talk to its peripherals, we can program the Xilinx chip and get it up
and running, initialize memory, perform memory tests, start up video,
display a picture, etc. Unfortunately, he's copying some code that we
can't share, so we'll still need someone to eventually write a Free
version of this. That pushes the VGA BIOS and the HQ microcode up in
priority.

The next tasks, things we really need help with, include:

- HQ microcode. In the right mode, all PCI traffic to graphics memory
  is routed through this tiny microprocessor. Its primary purpose will
  be to work in the background, converting the VGA text display into
  raw pixels that can be scanned out by our video controller. A lot of
  information about VGA is on our wiki, but it may not be complete, so
  we could use some help with that. Personally, unless I'm tied up with
  some other hardware issue, I'd be happy to write the microcontroller
  code myself. The hard part, really, is just knowing what to DO, so if
  a few people could become experts on the details of things like the
  VGA IO space registers and other stuff like that, it would help me
  tremendously.

- VGA BIOS. This should be something totally minimal: just enough to
  set up video, program the HQ microcontroller, and get VGA text going.
  There's a well-documented table that goes at the head of the PROM. I
  need some help with laying out some assembly code for that table and
  the minimal VGA int10 stuff. That should go on in parallel with the
  HQ work.

- The DDX. At first, we don't need a kernel driver. Paul Brooks wrote a
  DDX for OSCON, so it just needs to be hacked to work with the
  register layout of this card.
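The memory-test module described in the 13 July message (walking 1's, then random addresses with random data) can be sketched as below. The address sequence comes from a full-period linear congruential generator: modulo a power of two N, the map x -> (a*x + c) mod N visits every value exactly once per period when c is odd and a mod 4 == 1 (the Hull-Dobell conditions), so every word is written once, and resetting both generators replays the same sequence for the read-back pass. The power-of-two buffer size, the Numerical Recipes LCG constants, the xorshift32 data generator, and the function names are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walking 1's on one word: write each single-bit pattern and read it back,
   exercising every data line. Returns 0 on success, else the failing pattern. */
uint32_t walking_ones(volatile uint32_t *word)
{
    for (int bit = 0; bit < 32; bit++) {
        uint32_t pat = (uint32_t)1 << bit;
        *word = pat;
        if (*word != pat)
            return pat;
    }
    return 0;
}

/* xorshift32 (Marsaglia): cheap data generator; the state must never be 0. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Write every word of a power-of-two-sized buffer once, in pseudo-random
   order with pseudo-random data, then replay both sequences and verify.
   Returns the number of words that read back wrong. */
size_t random_order_test(volatile uint32_t *buf, size_t nwords, uint32_t seed)
{
    const size_t mask = nwords - 1;
    const size_t a = 1664525, c = 1013904223;  /* a % 4 == 1, c odd */
    size_t errors = 0;

    assert(nwords != 0 && (nwords & mask) == 0 && seed != 0);
    for (int pass = 0; pass < 2; pass++) {     /* pass 0 writes, pass 1 checks */
        size_t idx = 0;                        /* restart the address sequence */
        uint32_t s = seed;                     /* reset the data seed */
        for (size_t n = 0; n < nwords; n++) {
            uint32_t v = xorshift32(&s);
            if (pass == 0)
                buf[idx] = v;
            else if (buf[idx] != v)
                errors++;
            idx = (a * idx + c) & mask;        /* next address, full period */
        }
    }
    return errors;
}
```

On the card, `buf` would be the mmap()ed memory BAR and `nwords` its size divided by four; the `volatile` qualifier keeps the compiler from optimizing the read-back away. A non-zero return means at least one word did not hold the value written to it.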
From urkedal at nbi.dk  Fri Jul 18 13:31:32 2008
From: urkedal at nbi.dk (Petter Urkedal)
Date: Fri Jul 18 13:35:48 2008
Subject: [Open-graphics] OGP Status: OGA1 in OGD1, hardware programmed and recognized under Linux
In-Reply-To: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com>
References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com>
Message-ID: <20080718173132.GA6004@eideticdew.org>

On 2008-07-16, Timothy Normand Miller wrote:
> [...] That
> pushes the VGA BIOS and the HQ microcode up in priority.

I believe HQ has not yet been connected to the bridge?

> The next tasks, things we really need help with, include:
>
> - HQ microcode.

When we start on the microcode, it may be an idea to come up with a
(manually enforced) ABI for utilising the registers as best we can,
without ending up with a web of dependencies to rework if we need to
change something. Are there register-based ABIs or practices we can
adapt? Otherwise, I can write up some ideas.

From theosib at gmail.com  Fri Jul 18 14:23:36 2008
From: theosib at gmail.com (Timothy Normand Miller)
Date: Fri Jul 18 14:27:47 2008
Subject: [Open-graphics] OGP Status: OGA1 in OGD1, hardware programmed and recognized under Linux
In-Reply-To: <20080718173132.GA6004@eideticdew.org>
References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org>
Message-ID: <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com>

On Fri, Jul 18, 2008 at 1:31 PM, Petter Urkedal wrote:
> On 2008-07-16, Timothy Normand Miller wrote:
>> [...] That
>> pushes the VGA BIOS and the HQ microcode up in priority.
>
> I believe HQ has not yet been connected to the bridge?

No. Not yet. We're going to get to that once we have debugged it to
this point. However, that doesn't mean we can't start on the coding
and then merge things later. Would you like to work with me to
architect this?
>
>> The next tasks, things we really need help with, include:
>>
>> - HQ microcode.
>
> When we start on the microcode it may be an idea to come up with an
> (manually enforced) ABI for utilise the registers as best as we can
> without ending up with a web of dependencies to rework if we need to
> change something. Is there a register-based ABI/practices we can adapt?
> Otherwise, I can write up some ideas.

Are you referring to the assignment of "names" to scratch space
addresses? We definitely need something like that, but it may be very
program-dependent. Unless things turn out to be surprisingly small,
we'll have one program for VGA text, one for VGA graphics, and
eventually, one for DMA. BIOS or kernel can reload the program as
necessary.

I'm always in favor of creating good design structures. 512 program
words doesn't seem like a lot, but that's part of the challenge:
fitting a program into that space. To keep our sanity, we really need
to be organized about it. This is especially important for us and our
progeny to be able to maintain it later. Unfortunately, I'm not sure
what pre-existing paradigms might apply here, so let's develop
something new.

From urkedal at nbi.dk  Sat Jul 19 08:20:21 2008
From: urkedal at nbi.dk (Petter Urkedal)
Date: Sat Jul 19 08:24:38 2008
Subject: [Open-graphics] OGP Status: OGA1 in OGD1, hardware programmed and recognized under Linux
In-Reply-To: <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com>
References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org> <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com>
Message-ID: <20080719122021.GA1459@eideticdew.org>

On 2008-07-18, Timothy Normand Miller wrote:
> On Fri, Jul 18, 2008 at 1:31 PM, Petter Urkedal wrote:
> > I believe HQ has not yet been connected to the bridge?
>
> No. Not yet.
> We're going to get to that once we have debugged it to
> this point. However, that doesn't mean we can't start on the coding
> and them merge things later. Would you like to work with me to
> architect this?

I'm not sure how this will be done, but I can probably help anyway.

  * As I recall from previous discussion, we want to decode the PCI
    address and dispatch to HQ in hardware, rather than equipping HQ
    with a control bit to intercept all incoming PCI commands. Can we
    assume the BAR for HQ is fixed, or shall HQ be able to configure the
    pipe from PCI to intercept different address ranges on demand?

  * As far as I can see, there are four clocks involved: the PCI clock,
    two separate clocks for bridge transmission and reception, and the
    HQ clock. Are all these different?

I assume memory access goes through the bridge. So, we must extend
xp10_bridge_wrapper.v with an additional internal interface for HQ
memory operations. If we need high throughput, is there any
alternative to two extra FIFOs using two BRAMs? And yet another two
for PCI?

Since HQ's BRAM has an unused port with its own clock domain, it may
be possible to let the bridge read and write data directly to HQ
memory. That is, for memory-write, HQ prepares the data in a subrange
of its BRAM and tells the bridge to transmit the range to a given
memory address. For memory-read, HQ tells the bridge to transfer a
memory range to a BRAM range. That could also work for PCI, though
we'd need to extend HQ internal memory with another BRAM due to the
separate clock domains.

> >> The next tasks, things we really need help with, include:
> >>
> >> - HQ microcode.
> >
> > When we start on the microcode it may be an idea to come up with an
> > (manually enforced) ABI for utilise the registers as best as we can
> > without ending up with a web of dependencies to rework if we need to
> > change something. Is there a register-based ABI/practices we can adapt?
> > Otherwise, I can write up some ideas.
>
> Are you referring to the assignment of "names" to scratch space
> addresses?

That's an issue, too. For the moment, I was just considering
parameters, results, and scratch registers for subroutine calls.
Since our programs are small, it's probably not a big issue. A single
level of calls to rather simple subroutines using only a few
registers may suffice, and once written, the register usage of a
subroutine is unlikely to change.

> We definitely need something like that, but it may be very
> program-dependent. Unless things turn out to be surprisingly small,
> we'll have one program for VGA text, one for VGA graphics, and
> eventually, one for DMA. BIOS or kernel can reload the program as
> necessary.
>
> I'm always in favor of creating good design structures. 512 program
> words doesn't seem like a lot, but that's part of the challenge --
> fitting a program into that space. To keep our sanity, we really need
> to be organized about it. This is especially important for us and our
> progeny to be able to maintain it later. Unfortunately, I'm not sure
> what pre-existing paradigms might apply here, so lets develop
> something new.

This is what I had in mind. We allocate from r0 upwards in the
following order, with possible overlaps (s = scratch, r = read,
w = write):

                                        caller  callee
    1. Scratch registers.               s       s
    2. Result registers.                s/r     s/w
    3. Scratched parameter registers.   s/w     s/r
    4. Preserved parameter registers.   s/w     r
    5. Continuation address register.   s/w     r
    6. Callee preserved registers.      s       -
    7. Caller preserved registers.      -       -

Relating to a stack-based ABI, groups 2 to 5 make up the current
frame, and higher registers are higher stack frames. Relating to a
CPS-based ABI, groups 4 to 7 are the continuation, and the group 2
registers are the parameters passed to the continuation. E.g.
"z += x*y" would be (cf. hqlib/mulu.asm, though it doesn't use this
convention):

    r0      - parameter and result z
    r1      - parameter x
    r3      - parameter y
    r4      - continuation address
    r5..r31 - preserved

If we had a subroutine using the above, its usage may be:

    r0..r4     - scratch
    r5..r[N-1] - output, input, cont
    r[N]..r31  - preserved

This facilitates bottom-up coding. As long as we are dealing with
simple and well-defined subroutines, we'll be able to allocate
registers precisely. For higher-level subroutines, we can think ahead
and set aside some extra scratch registers.

From theosib at gmail.com  Sat Jul 19 16:55:34 2008
From: theosib at gmail.com (Timothy Normand Miller)
Date: Sat Jul 19 16:59:50 2008
Subject: [Open-graphics] OGP Status: OGA1 in OGD1, hardware programmed and recognized under Linux
In-Reply-To: <20080719122021.GA1459@eideticdew.org>
References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org> <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com> <20080719122021.GA1459@eideticdew.org>
Message-ID: <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com>

On Sat, Jul 19, 2008 at 8:20 AM, Petter Urkedal wrote:
> On 2008-07-18, Timothy Normand Miller wrote:
>> On Fri, Jul 18, 2008 at 1:31 PM, Petter Urkedal wrote:
>> > I believe HQ has not yet been connected to the bridge?
>>
>> No. Not yet. We're going to get to that once we have debugged it to
>> this point. However, that doesn't mean we can't start on the coding
>> and them merge things later. Would you like to work with me to
>> architect this?
>
> I'm not sure how this will be done, but I can probably help anyway.
>
> * As I recall from previous discussion, we want to decode the PCI
>   address and dispatch to HQ in hardware, rather than equipping HQ
>   with a control bit to intercept all incoming PCI commands. Can we
>   assume the BAR for HQ is fixed, or shall HQ be able to configure the
>   pipe from PCI to intercept different address ranges on demand?
The way the bridge works, there can only be one outstanding read
transaction (for however many words are requested). Once the request
has been made, the bridge switches the IO buffers so that the data
lines can only be used for read data until the transaction is
finished. We can later consider bypass and through accesses to memory,
but for the moment, it's simpler to just make it all or nothing. This
applies to both memory and register access, since they both go
through the same bridge. We need to consider the consequences of
having HQ intercept EVERY access to the bridge.

> * As far as I can see, there are four clocks involved, the PCI clock,
>   two separate clocks for bridge transmission and reception, and the
>   HQ clock. Are all these different?

There are 3 or 4. Here's how it's arranged right now, without HQ,
where the FIFOs exist to cross from the PCI clock domain to the bridge
clock domain:

    bridge write data and commands (PCI clock) ---> bridge write clock domain
    bridge read clock domain ---> bridge read return data (PCI)

The bridge is constrained to run at 100MHz, although it's running at
90 right now (I didn't bother to change the clock generator numbers).
I wouldn't expect HQ to run a whole lot faster, so it should be okay
to run HQ and the bridge at the same speed. We can revisit that
decision later.

Adding in HQ, some things change. The command queue (write data and
read requests) can simply be reconnected to HQ (but with a bypass too,
of course). A read return queue from HQ to PCI is only actually
necessary because we're crossing clock domains; otherwise, it's waste.
HQ writes (and read commands) to the bridge could be direct, but the
bridge can go busy, for instance if video is making a long read and
queues in the S3 fill, so we should have a queue there, where we can
read its fullness in software and push or not push writes (and read
commands).
Read data from the bridge to HQ has to be queued so that HQ can
request reads, go off and do something else, then come back and read
the data (or wait for it, anyhow). So that leaves us with four queues:

    PCI write/cmd    ---> HQ
    HQ write/cmd     ---> bridge write
    bridge read data ---> HQ
    HQ read data     ---> PCI

with hop-overs when HQ is disabled.

> I assume memory access goes though the bridge. So, we must extend
> xp10_bridge_wrapper.v with an additional internal interface for HQ
> memory operations. If we need high thoughput, is there any alternative
> to two extra FIFOs using two BRAMs? Any yet another two for PCI?

Because we need both asynchrony and clock-domain crossing, I can't
see an alternative. These can just be 16-entry distributed-RAM FIFOs.

> Since HQs BRAM has an unused port with it's own clock domain, it may be
> possible to let the bridge read and write data directly to HQ memory.
> That is, for memory-write, HQ prepares the data in a subrange of it's
> BRAM, and tells the bridge to transmit the range to a given memory
> address. For memory-read, HQ tells the bridge to transfer a memory
> range to a BRAM range. That could also work for PCI, though we'd need
> to extend HQ internal memory with another BRAM due to the separate clock
> domains.

That could be very useful, for performance and more asynchrony.
However, the bypass won't work that way, so we'd have to implement
both mechanisms. We should start with the dumber one that works with
bypass and see if we can really benefit from the optimization
afterwards.

BTW, there are some facts about the bus protocol that we might want to
change. When accessing the bridge, the first cycle is the address, and
the flag bits indicate the target (memory or config registers). For
reads, the subsequent cycle is the word count, after which the bus
switches direction and waits. For writes, subsequent cycles are data,
flags indicate which bytes are valid, and the address auto-increments.
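To make the bridge protocol concrete, here is a small C model of how a transaction might be packed into bus cycles: an address cycle carrying the target flags, then either a word-count cycle (reads) or data cycles carrying byte enables (writes). Because the S3's address counter only increments the low 7 bits of the word address, the write packer re-sends the address whenever a burst reaches a 128-word boundary, which is the check HQ code would have to make. The thread gives the cycle order but not the bit encoding, so `struct cycle`, the flag values, and word-granular addresses are assumptions; whether reads need the same split is not stated, so `pack_read` does not split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum target { TGT_MEM = 1, TGT_CFG = 2 };  /* hypothetical address-cycle flags */
#define BE_ALL 0xFu                        /* all four byte enables valid */

struct cycle {
    uint32_t data;   /* word address, word count, or data word */
    uint8_t flags;   /* target on address cycles, byte enables on data cycles */
};

/* Pack a write of n words starting at word address addr into out[]
   (capacity cap). Returns the number of cycles produced. */
size_t pack_write(struct cycle *out, size_t cap, enum target tgt,
                  uint32_t addr, const uint32_t *words, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++, addr++) {
        /* First word, or the 7-bit counter would wrap: send a fresh address */
        if (i == 0 || (addr & 0x7F) == 0) {
            assert(k < cap);
            out[k++] = (struct cycle){ addr, (uint8_t)tgt };
        }
        assert(k < cap);
        out[k++] = (struct cycle){ words[i], BE_ALL };
    }
    return k;
}

/* A read is an address cycle followed by a word-count cycle; the bus then
   turns around and the bridge returns `count` words. */
size_t pack_read(struct cycle *out, size_t cap, enum target tgt,
                 uint32_t addr, uint32_t count)
{
    assert(cap >= 2);
    out[0] = (struct cycle){ addr, (uint8_t)tgt };
    out[1] = (struct cycle){ count, 0 };
    return 2;
}
```

For example, a 4-word write starting at word 0x7E comes out as address, data, data, address (0x80), data, data; a 128-word write starting on an aligned boundary needs only the one leading address cycle.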
The address counter in the S3 auto-increments, but it only increments
the lower 7 bits of the word address. So every 128 32-bit words (at
each 128-word-aligned boundary), a new address must be sent. That
happens automatically with PCI due to the way this target is designed,
but HQ will have to enforce it in the program.

One thing we may want to change is how the target flags are presented.
Right now, they're separate from the address, but the address isn't
32 bits, so they could be prepended. However, it may actually be
faster to keep them separate, potentially saving some HQ code to
extract them.

>> >> The next tasks, things we really need help with, include:
>> >>
>> >> - HQ microcode.
>> >
>> > When we start on the microcode it may be an idea to come up with an
>> > (manually enforced) ABI for utilise the registers as best as we can
>> > without ending up with a web of dependencies to rework if we need to
>> > change something. Is there a register-based ABI/practices we can adapt?
>> > Otherwise, I can write up some ideas.
>>
>> Are you referring to the assignment of "names" to scratch space
>> addresses?
>
> That's an issue, too. For the moment, I was just considering
> parameters, results, and scratch registers for subroutine calls. Since
> our programs are small, it's probably not a big issue. It may suffice
> with a single level of calls to rather simple subroutines using only a
> few registers, and once written the register usage of the subroutine is
> unlikely to change.

We're extremely constrained, so we need to pick something that's very
efficient, even if it's more challenging to program. We could think
like Fortran 77, where parameters are passed by reference and live at
fixed addresses. So if you call function X, then you know exactly
where in scratch space to dump its parameters.
Exactly what to do with the return/continuation address is a question, but if we do it this way, there's a place in scratch memory to put it (if the subroutine needs the register for something else). Not having a stack does pose some challenges. We can simulate a stack, but the instruction overhead could hurt performance. We could also consider implementing a 16-entry stack, although I don't want to if we can avoid it. If we were ever to take this basic architecture and scale it to a more powerful processor design in the future, it would not be binary compatible. Moreover, different revisions of OGA do not have to have binary-compatible HQs. As long as the instruction sets are well-documented, the system and BIOS can handle figuring out which code file to load for any given purpose. >> We definitely need something like that, but it may be very >> program-dependent. Unless things turn out to be surprisingly small, >> we'll have one program for VGA text, one for VGA graphics, and >> eventually, one for DMA. BIOS or kernel can reload the program as >> necessary. >> >> I'm always in favor of creating good design structures. 512 program >> words doesn't seem like a lot, but that's part of the challenge -- >> fitting a program into that space. To keep our sanity, we really need >> to be organized about it. This is especially important for us and our >> progeny to be able to maintain it later. Unfortunately, I'm not sure >> what pre-existing paradigms might apply here, so lets develop >> something new. > > This is what I had in mind. We allocate from r0 upwards in the > following order with possible overlaps (s = scratch, r = read, w = > write) > > caller callee > 1. Scratch registers. s s > 2. Result registers. s/r s/w > 3. Scratched parameter registers. s/w s/r > 4. Preserved parameter registers. s/w r > 5. Continuation address register. s/w r > 6. Callee preserved registers. s - > 7. Caller preserved registers. 
- - > > Relating to stack-based ABI, regs 2 to 5 make up the current frame, and > higher registers are higher stack frames. Relating to CPS-based ABI, > regs 4 to 7 are the continuation and regs 2 are the parameters passed to > the continuation. Continuations are interesting for high-level programming, but I'm not even thinking of doing a proper stack-based ABI. A simple approach: Every function has access to every register for any purpose, but for any register it's going to use, it is responsible for storing it to a fixed (computed by the assembler) location in scratch space, and restoring it before returning. This includes the call return address. We can devise assembler directives that indicate which registers hold which parameters (no scratch space needed), which registers are for local variables (requiring a backing store in scratch space), and how many additional scratch locations are necessary for the task to compute. Parameters can be passed in registers, and we are free to restrict it so that no more than some number of parameters can be passed, owing to certain registers being reserved (particularly the return address). Recursion is not allowed. Indeed, disallowing recursion may help us simplify things further. It's slightly faster to use registers than scratch space, so we should see how we can optimize to use registers more. > E.g. "z += x*y" would be (cf hqlib/mulu.asm though it doesn't use this > convention), > > r0 - parameter and result z > r1 - parameter x > r3 - parameter y > r4 - continuation address > r5..r31 - preserved > > If we had a subroutine using the above, its usage may be > > r0..r4 - scratch > r5..r[N-1] - output, input, cont > r[N]..r31 - preserved > > This facilitates bottom-up coding. As long as we are dealing with > simple and well-defined subroutines, we'll be able to allocate registers > precisely. For higher level subroutines, we can think forward and set > aside some extra scratch registers. I like this.
We can put some intelligence into the assembler (or compiler?) that allows a function to state which registers it's going to use. Then the CALLER is responsible for saving and restoring, and this makes room for the caller to simply avoid certain registers, rather than the callee having to save and restore registers even if nothing useful is in them. So for leaf nodes and their parents, this can result in some significant optimizations, although that will diminish dramatically for the next level up. -- Timothy Normand Miller http://www.cse.ohio-state.edu/~millerti Open Graphics Project From urkedal at nbi.dk Sun Jul 20 06:59:38 2008 From: urkedal at nbi.dk (Petter Urkedal) Date: Sun Jul 20 07:04:03 2008 Subject: [Open-graphics] HQ assembler and coding conventions In-Reply-To: <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com> References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org> <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com> <20080719122021.GA1459@eideticdew.org> <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com> Message-ID: <20080720105938.GA5510@eideticdew.org> On 2008-07-19, Timothy Normand Miller wrote: > >> > When we start on the microcode it may be an idea to come up with an > >> > (manually enforced) ABI for utilise the registers as best as we can > >> > without ending up with a web of dependencies to rework if we need to > >> > change something. Is there a register-based ABI/practices we can adapt? > >> > Otherwise, I can write up some ideas. > >> > >> Are you referring to the assignment of "names" to scratch space > >> addresses? > > > > That's an issue, too. For the moment, I was just considering > > parameters, results, and scratch registers for subroutine calls. Since > > our programs are small, it's probably not a big issue. 
It may suffice > > with a single level of calls to rather simple subroutines using only a > > few registers, and once written the register usage of the subroutine is > > unlikely to change. > > We're extremely constrained, so we need to pick something that's very > efficient, even if it's more challenging to program. We could think like > Fortran 77, where parameters are passed by reference and live at fixed > addresses. So if you call function X, then you know exactly where in > scratch space to dump its parameters. Exactly what to do with the > return/continuation address is a question, but if we do it this way, > there's a place in scratch memory to put it (if the subroutine needs > the register for something else). Not having a stack does pose some > challenges. We can simulate a stack, but the instruction overhead > could hurt performance. We could also consider implementing a > 16-entry stack, although I don't want to if we can avoid it. With so many registers and so limited program space, I think we can manage the manual register allocation without saving, at least for the inner calls. As you indicate, parameters and results may overlap. Likewise, the return/continuation address is just a preserved input parameter. I listed these explicitly below only to suggest the ordering. > If we were ever to take this basic architecture and scale it to a more > powerful processor design in the future, it would not be binary > compatible. Moreover, different revisions of OGA do not have to have > binary-compatible HQs. As long as the instruction sets are > well-documented, the system and BIOS can handle figuring out which > code file to load for any given purpose. > > >> We definitely need something like that, but it may be very > >> program-dependent. Unless things turn out to be surprisingly small, > >> we'll have one program for VGA text, one for VGA graphics, and > >> eventually, one for DMA. BIOS or kernel can reload the program as > >> necessary.
> >> > >> I'm always in favor of creating good design structures. 512 program > >> words doesn't seem like a lot, but that's part of the challenge -- > >> fitting a program into that space. To keep our sanity, we really need > >> to be organized about it. This is especially important for us and our > >> progeny to be able to maintain it later. Unfortunately, I'm not sure > >> what pre-existing paradigms might apply here, so let's develop > >> something new. > > > > This is what I had in mind. We allocate from r0 upwards in the > > following order with possible overlaps (s = scratch, r = read, w = > > write) > > > > caller callee > > 1. Scratch registers. s s > > 2. Result registers. s/r s/w > > 3. Scratched parameter registers. s/w s/r > > 4. Preserved parameter registers. s/w r > > 5. Continuation address register. s/w r > > 6. Callee preserved registers. s - > > 7. Caller preserved registers. - - > > > > Relating to stack-based ABI, regs 2 to 5 make up the current frame, and > > higher registers are higher stack frames. Relating to CPS-based ABI, > > regs 4 to 7 are the continuation and regs 2 are the parameters passed to > > the continuation. > > Continuations are interesting for high-level programming, but I'm not > even thinking of doing a proper stack-based ABI. Yes, I'm just drawing the analogy to see whether the proposal makes sense. Since we have neither a stack nor higher-order functions, the analogy fits in some respects and not in others. > A simple approach: > > Every function has access to every register for any purpose, but for > any register it's going to use, it is responsible for storing it to a > fixed (computed by the assembler) location in scratch space, and > restoring it before returning. This includes the call return address.
> We can devise assembler directives that indicate which registers hold > which parameters (no scratch space needed), which registers are for > local variables (requiring a backing store in scratch space), and how > many additional scratch locations are necessary for the task to > compute. > Parameters can be passed in registers, and we are free to restrict it > so that no more than some number of parameters can be passed, owing to > certain registers being reserved (particularly the return address). > Recursion is not allowed. Indeed, disallowing recursion may help us > simplify things further. I was hoping we could avoid saving registers in most cases, since we have quite a lot of them. This also means we don't fix a particular register for the return address. I think the assembler directive to indicate register usage is mostly just documentation, but see below. > It's slightly faster to use registers than scratch space, so we should > see how we can optimize to use registers more. Yes. > > E.g. "z += x*y" would be (cf hqlib/mulu.asm though it doesn't use this > > convention), > > > > r0 - parameter and result z > > r1 - parameter x > > r3 - parameter y > > r4 - continuation address > > r5..r31 - preserved > > > > If we had a subroutine using the above, its usage may be > > > > r0..r4 - scratch > > r5..r[N-1] - output, input, cont > > r[N]..r31 - preserved > > > > This facilitates bottom-up coding. As long as we are dealing with > > simple and well-defined subroutines, we'll be able to allocate registers > > precisely. For higher level subroutines, we can think forward and set > > aside some extra scratch registers. > > I like this. We can put some intelligence into the assembler (or > compiler?) that allows a function to state which registers it's going > to use.
Then the CALLER is responsible for saving and restoring, and > this makes room for the caller to simply avoid certain registers, > rather than the callee having to save and restore registers even if > nothing useful is in them. So for leaf nodes and their parents, this > can result in some significant optimizations, although that will > diminish dramatically for the next level up. Yes, I've considered some simple extension to the assembler. The assembler has no concept of subroutine boundaries. A simple extension would be to introduce a register aliasing directive like reg p0..p2 = r5.. so that p-registers are our parameters. There is no point in aliasing the scratch registers, since they always start at r0, and registers above our parameters are preserved by us, so we don't refer to them. It could make sense to alias registers between those scratched by all inner calls and our own parameters: reg q0..q3 = r6.. When calling a subroutine, the subroutine's register definitions will not be in scope, so we still need to refer to physical registers for parameters. We could allow references like label.p0, but that's probably not a big advantage since we need to keep track of which r-registers are scratched by the subroutine, anyway. If we see move ..., r3 move ..., r4 jump func, r5 then the reader of the code knows that r0..r5 will be scratched by the call, whereas "move ..., func.p0" will be less informative. To introduce subroutine-awareness we could add a directive right after the label: some_function: sub(r3..r4; r5) r0..r2 ; (params, cont) scratch ... endsub and in the caller sub(r10..r11; r12) r0..r10 local q0..q3 = r6.. ; assert that no subroutine scratches these ... ;; Here the assembler checks the "sub" directive of ;; "some_function" and fails if it overlaps with r6..r9. call some_function(l3, r1) ... 
endsub where the call, which must not be in a delay-slot, translates to move l3, r3 move r1, r4 jump some_function, r5 This complicates the assembler, so maybe we should start coding and see if and what we need before adding such an extension. A technicality: The assembler uses a dedicated lexical class for registers, so if we introduce any of the above we must make them distinct from other identifiers. We could * Reserve certain literals for registers, say /[a-z][0-9]+/ or more limited /[p-s][0-9]+/. * Use a prefix as in %r8, %base_addr. From urkedal at nbi.dk Sun Jul 20 08:38:39 2008 From: urkedal at nbi.dk (Petter Urkedal) Date: Sun Jul 20 08:42:58 2008 Subject: [Open-graphics] Connecting the HQ In-Reply-To: <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com> References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org> <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com> <20080719122021.GA1459@eideticdew.org> <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com> Message-ID: <20080720123839.GB5510@eideticdew.org> On 2008-07-19, Timothy Normand Miller wrote: > [...] We need to consider the consequences > of having HQ intercept EVERY access to the bridge. So if I understand this correctly, the current plan is full-intercept or no intercept, but it's something we may need to reconsider. I guess for VGA, full-intercept is okay since most data is translated, but if we use HQ in GPU mode, then full-intercept would be a major bottleneck. > Adding in HQ, some things change. The command queue (write data and > read requests) can simply be reconnected to HQ (but with a bypass too, > of course). A read return queue from HQ to PCI is only actually > necessary because we're crossing clock domains; otherwise, it's a waste.
> HQ writes (and read commands) to the bridge could be direct, but the > bridge can go busy, for instance if video is making a long read, and > queues in the S3 fill, so we should have a queue there, where we can > read its fullness in software and push or not push writes (and read > commands). Read data from the bridge to HQ has to be queued so that > HQ can request reads, go off and do something else, then come back and > read the data (or wait for it anyhow). >
> So that leaves us with four queues:
>
>     PCI write/cmd    ---> HQ
>     HQ write/cmd     ---> bridge write
>     bridge read data ---> HQ
>     HQ read data     ---> PCI
>
> With hop-overs when HQ is disabled.

That looks quite straightforward, so we could go with this for now. If HQ does not run on the bridge clock, would it still be feasible to re-use the PCI FIFOs for HQ? That is, can one end be switched between the clock domains of HQ and the bridge?

    PCI <===>|-----------|--> bridge     ("<===>" means FIFO,
             |-> HQ <===>|                "|" indicates bypass/intercept switch)

But there is no way to do the clock-switch properly, is there? > > I assume memory access goes through the bridge. So, we must extend > > xp10_bridge_wrapper.v with an additional internal interface for HQ > > memory operations. If we need high throughput, is there any alternative > > to two extra FIFOs using two BRAMs? And yet another two for PCI? > > Because of the combination of need for being asynchronous and to cross > clock domains, I can't see an alternative. These can just be 16-entry > distributed-RAM fifos. Good, no BRAM needed to pass clock domains. > > Since HQ's BRAM has an unused port with its own clock domain, it may be > > possible to let the bridge read and write data directly to HQ memory. > > That is, for memory-write, HQ prepares the data in a subrange of its > > BRAM, and tells the bridge to transmit the range to a given memory > > address. For memory-read, HQ tells the bridge to transfer a memory > > range to a BRAM range.
That could also work for PCI, though we'd need > > to extend HQ internal memory with another BRAM due to the separate clock > > domains. > > That could be very useful, for performance and more asynchrony. > However, the bypass won't work that way, so we'd have to implement > both mechanisms. We should start with the dumber one that works with > bypass and see if we can really benefit from the optimization > afterwards. So, data always passes through HQ's pipes and clock domain even in bypass mode? That solves the clock-switching issue. It adds latency for bypass mode, but it's probably negligible overall. > BTW, there are some facts about the bus protocol that we might want to > change. When accessing the bridge, the first cycle is the address, > and the flag bits indicate the target (memory or config registers). These flag bits sound like a natural extension as the highest bits of the address. > For reads, the subsequent cycle is the word count, after which the bus > switches direction and waits. > > For writes, subsequent cycles are data, flags indicate which bytes are > valid, and the address auto-increments. So, these flags can't be combined with the other data. I guess the common case is that all are 1, so shall we
* write an optional byte-enable before write with default 1111, and then it applies to all data, or
* add a write-mode where byte-enables and data are interlaced?
> The address counter in the S3 auto-increments, but it only increments > the lower 7 bits of the word address. So every 128 32-bit words, it's > required that a new address be sent. That happens automatically with > PCI due to the way this target is designed, but HQ will have to > enforce it in the program. I think we can manage that. > One thing we may want to change is how the target flags are presented. > Right now, they're separate from the address, but the address isn't > 32 bits, so they could be prepended.
However, it may actually be > faster to make them separate, potentially saving some HQ code to > extract them. I'm not sure either. If the flags are encoded in the address in such a way that it does not affect the use of the address, and if the common usage for flags is to test them individually, then combining flags and addresses can save register usage and fetch commands. From theosib at gmail.com Sun Jul 20 11:31:44 2008 From: theosib at gmail.com (Timothy Normand Miller) Date: Sun Jul 20 11:36:02 2008 Subject: [Open-graphics] Connecting the HQ In-Reply-To: <20080720123839.GB5510@eideticdew.org> References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org> <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com> <20080719122021.GA1459@eideticdew.org> <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com> <20080720123839.GB5510@eideticdew.org> Message-ID: <9871ee5f0807200831h61225655v17cb8d0cb18d895b@mail.gmail.com> On Sun, Jul 20, 2008 at 8:38 AM, Petter Urkedal wrote: > On 2008-07-19, Timothy Normand Miller wrote: >> [...] We need to consider the consequences >> of having HQ intercept EVERY access to the bridge. > > So if I understand this correctly, the current plan is full-intercept or > no intercept, but it's something we may need to reconsider. I guess for > VGA, full-intercept is okay since most data is translated, but if we use > HQ in GPU mode, then full-intercept would be a major bottleneck. Making this selectable by HQ itself could be good, although we'll have to be very careful about race conditions where there are PCI accesses coming through at the same time that HQ makes the switch. Alternatively, we could require the driver to do it. If we want to switch between PIO and DMA, we have to require the driver to switch the bypass on and off. Ideally, in GPU mode, DMA will be used for almost everything. 
Any PIOs that do happen will have latency, since HQ will have to poll for them and pass them along, but that will have minimal impact. Of course, DMA is for later. > But there is no way to do the clock-switch properly, is there? Not really. Too complicated, requiring so much extra logic that you might as well just add another queue. >> That could be very useful, for performance and more asynchrony. >> However, the bypass won't work that way, so we'd have to implement >> both mechanisms. We should start with the dumber one that works with >> bypass and see if we can really benefit from the optimization >> afterwards. > > So, data always passes through HQ's pipes and clock domain even in bypass > mode? That solves the clock-switching issue. It adds latency for > bypass mode, but it's probably negligible overall. It'll be minor compared to the other delays. >> BTW, there are some facts about the bus protocol that we might want to >> change. When accessing the bridge, the first cycle is the address, >> and the flag bits indicate the target (memory or config registers). > > These flag bits sound like a natural extension as the highest bits of > the address. Yeah, so an early change we can make is to move those bits into the address, even before HQ is in. Various things in the XP10 and S3 will have to change for that. >> For reads, the subsequent cycle is the word count, after which the bus >> switches direction and waits. >> >> For writes, subsequent cycles are data, flags indicate which bytes are >> valid, and the address auto-increments. > > So, these flags can't be combined with the other data. I guess the > common case is that all are 1, so shall we > * write an optional byte-enable before write with default 1111, and > then it applies to all data, or > * add a write-mode where byte-enables and data are interlaced? Another option would be to have 15 I/O ports for writes, one for each combination of flags.
If you already know the flags (usually 1111), you can hard-code it. Otherwise, you can add the flags to some address. >> The address counter in the S3 auto-increments, but it only increments >> the lower 7 bits of the word address. So every 128 32-bit words, it's >> required that a new address be sent. That happens automatically with >> PCI due to the way this target is designed, but HQ will have to >> enforce it in the program. > > I think we can manage that. It could actually be challenging. A row of characters is 160 bytes, or 40 words. Since that's not an even multiple of 128, the code that requests reads will have to be designed to figure out where to split the request, and in as few instructions as possible. Enough of the way the bridge bus protocol works is mingled into the address decoder that we may have to make some changes to be able to sensibly queue up multiple separate read requests back to back so that HQ can always be able to do something else while waiting on read data. I'll have to go back and look to see what would happen if a command were queued up while in read mode. Right now, that will never happen, since the address decoder is the only thing ever talking to the bridge. We can also consider changes to the bridge protocol. >> One thing we may want to change is how the target flags are presented. >> Right now, they're separate from the address, but the address isn't >> 32 bits, so they could be prepended. However, it may actually be >> faster to make them separate, potentially saving some HQ code to >> extract them. > > I'm not sure either. If the flags are encoded in the address in such a > way that it does not affect the use of the address, and if the common > usage for flags is to test them individually, then combining flags and > addresses can save register usage and fetch commands. This would be easy enough to change even now. 
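The two encodings discussed here, a write port per byte-enable pattern and target flags carried in otherwise unused high address bits, can be sketched in C as below. All constants are assumptions made for illustration: the port base 0x40, the use of bits [31:30] for the target, and the target codes are hypothetical; only the name `BRIDGE_WRITE_DATA_1111` is borrowed from the VGA text code posted to this list. With the memory target encoded as 0, a memory address is just the plain word address, so the flags stay out of the way of ordinary address arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port numbering: 15 write-data ports, one per non-zero
   byte-enable pattern, laid out so the pattern can simply be added. */
#define BRIDGE_WRITE_DATA_BASE 0x40u
#define BRIDGE_WRITE_DATA_1111 (BRIDGE_WRITE_DATA_BASE + 0xFu)

static uint32_t write_port_for(uint32_t byte_enables)
{
    assert(byte_enables >= 1u && byte_enables <= 0xFu);
    return BRIDGE_WRITE_DATA_BASE + byte_enables;
}

/* Hypothetical address format: word address in bits [29:0], target
   flags in bits [31:30]. */
#define TARGET_SHIFT   30
#define WORD_ADDR_MASK ((1u << TARGET_SHIFT) - 1u)

enum { TARGET_MEM = 0, TARGET_CONFIG = 1 };

static uint32_t make_bridge_addr(uint32_t target, uint32_t word_addr)
{
    assert(target < 4u && (word_addr & ~WORD_ADDR_MASK) == 0u);
    return (target << TARGET_SHIFT) | word_addr;
}

static uint32_t bridge_addr_target(uint32_t a) { return a >> TARGET_SHIFT; }
static uint32_t bridge_addr_word(uint32_t a)   { return a & WORD_ADDR_MASK; }
```

Testing a flag is then a single shift or mask in HQ, which is the register and fetch saving mentioned above.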
-- Timothy Normand Miller http://www.cse.ohio-state.edu/~millerti Open Graphics Project From theosib at gmail.com Sun Jul 20 17:52:39 2008 From: theosib at gmail.com (Timothy Normand Miller) Date: Sun Jul 20 17:56:59 2008 Subject: [Open-graphics] HQ assembler and coding conventions In-Reply-To: <20080720105938.GA5510@eideticdew.org> References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org> <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com> <20080719122021.GA1459@eideticdew.org> <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com> <20080720105938.GA5510@eideticdew.org> Message-ID: <9871ee5f0807201452v6c3cf386p7e36b2515fff2283@mail.gmail.com> On Sun, Jul 20, 2008 at 6:59 AM, Petter Urkedal wrote: > This complicates the assembler, so maybe we should start coding and see > if and what we need before adding such an extension. Definitely. We should work it all out in C code first. We can structure a C program with some functions that stand in for the hardware capabilities, and code it so that it actually works, converting text and font data to pixels. From this, we can decide exactly how complex the assembler needs to be. We can also add more complexity as our needs grow. All of your ideas are good, but we should weigh that effort against the effort of maintaining less organized code. -- Timothy Normand Miller http://www.cse.ohio-state.edu/~millerti Open Graphics Project From theosib at gmail.com Sun Jul 20 19:32:32 2008 From: theosib at gmail.com (Timothy Normand Miller) Date: Sun Jul 20 19:36:53 2008 Subject: [Open-graphics] VGA text mode C version Message-ID: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> This is a modification of something I posted ages ago to the list. It is a C version of the code that will run in HQ to do VGA text mode. We should argue about it a bit, then check it into SVN, and start hacking it until it works and is also structured how it needs to be to run in HQ.
For instance, all global variables should be offsets into an explicit scratch array with 512 words in it.

/*
   CGA text is 2 bytes per character. The first byte is the glyph
   number. The second byte is color. I'm doing this from memory, and
   this is probably wrong, but for this code, I'm going to assume this
   for the color byte:

   [3:0] foreground color
   [6:4] background color
   [7]   blink

   The colors are as follows (also probably wrong):

    0 - black       #000000
    1 - dk red      #AA0000
    2 - dk green    #00AA00
    3 - brown       #AAAA00 (definitely wrong, but whatever)
    4 - dk blue     #0000AA
    5 - dk magenta  #AA00AA
    6 - dk cyan     #00AAAA
    7 - lt gray     #AAAAAA
    8 - dk gray     #555555
    9 - lt red      #FF5555
   10 - lt green    #55FF55
   11 - yellow      #FFFF55
   12 - lt blue     #5555FF
   13 - lt magenta  #FF55FF
   14 - lt cyan     #55FFFF
   15 - white       #FFFFFF
*/

/* Base address of where the text is stored in graphics memory */
int text_base;
/* Width and height in characters of the character display */
int text_width, text_height;
/* Base address of where the font is stored */
int font_base;
/* Font is assumed to be 8x16 (not always correct), where a character
   is found as: font_base + glyph_num*16 */
int glyph_height;
/* Base address of where pixels are stored for the video framebuffer */
int pixel_base;
/* Actual size of glyph in graphics buffer. Different from 8x16 if scaled */
int cell_width, cell_height;
/* Size of whole graphics screen */
int screen_width, screen_height;
/* This is a pointer into the scratch space where we temporarily store
   a row of text */
int *text_line;

/* Poll for PCI accesses to service */
void poll_pci()
{
    while (memory or engine reg writes) {
        /* get the write and forward it to the bridge */
    }
    if (memory or engine reg read) {
        /* forward the read addr and count to the bridge */
        /* wait for read data to arrive and forward it all back to
           address_decode */
    }
    while (vga io writes) {
        /* process them, storing appropriate data in scratch space */
    }
    if (vga io read) {
        /* fetch data from scratch space and return it */
    }
}

/* Draw a glyph whose bitmap has been queued */
void draw_glyph(int pixel_addr, int fg, int bg)
{
    int px, py, i, bit, g;

    /* Outer loop for the rows of the glyph */
    for (py=0; py>= 1;
                pixel = bit ? fg : bg;
                write_io(BRIDGE_WRITE_DATA_1111, pixel);
                pixel_addr++;
            }
            /* Move left and down to the next row of the glyph */
            pixel_addr += screen_width - cell_width;
        }
    }
}

/* Convert the whole 80x25 screen from text to pixels */
void convert_text_to_pixels()
{
    int cx, cy, *buf, i, j;
    int glyph0, color0, bg0, fg0;
    int pixel_addr;

    pixel_addr = pixel_base;
    for (cy=0; cy>= 16;
                buf++;
            }

            /* Request font data for each character */
            glyph0 = w & 255;
            while (read_io(BRIDGE_BUSY));
            write_io(BRIDGE_READ_MEM_ADDR, font_base + glyph0*16);
            write_io(BRIDGE_READ_COUNT, 4);

            /* Compute colors */
            /* If these are configurable, read them from scratch space */
            color0 = (w >> 8) & 255;
            fg0  = (color0 &   1) ? 0xAA0000 : 0;
            fg0 |= (color0 &   2) ? 0x00AA00 : 0;
            fg0 |= (color0 &   4) ? 0x0000AA : 0;
            fg0 += (color0 &   8) ? 0x555555 : 0;
            bg0  = (color0 &  16) ? 0xAA0000 : 0;
            bg0 |= (color0 &  32) ? 0x00AA00 : 0;
            bg0 |= (color0 &  64) ? 0x0000AA : 0;
            if (blink_cycle && (color0 & 128)) fg0 = bg0;

            /* Wait for all four reads to appear in queue */
            while (read_io(BRIDGE_READ_QUEUE_COUNT) < 4);

            /* Process glyph */
            draw_glyph(pixel_addr, fg0, bg0);
            pixel_addr += cell_width;
        }
        pixel_addr += screen_width * (cell_height - 1);
    }
    poll_pci();
}

void main()
{
    /* All I do is translate over and over and over again */
    while (1) {
        convert_text_to_pixels();
    }
}

-- Timothy Normand Miller http://www.cse.ohio-state.edu/~millerti Open Graphics Project From mark.marshall60 at ntlworld.com Mon Jul 21 06:23:31 2008 From: mark.marshall60 at ntlworld.com (Mark Marshall) Date: Mon Jul 21 06:22:15 2008 Subject: [Open-graphics] Offer to help with writing a BIOS In-Reply-To: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> Message-ID: <488463A3.9010209@ntlworld.com> Hi. I've been lurking here for some time, and would like to offer to (help) write the VGA BIOS. As far as I can tell no actual work has started on this? I'm not fully sure of the level of compatibility that it is hoped to obtain for this first version. It looks like we are going for compatibility at the BIOS and memory access level but are not going to use any of the standard CGA/EGA/VGA registers? Is this correct? This would seem to limit things to text mode only, as there is no way you can use a graphics mode without also using the registers (apart from mode 0x13). My plan would probably be to try to set up Bochs (or another x86 emulator) with a simple model of the chip and to program against that. It always seems to be a good idea to stay away from the real hardware for as long as possible when developing new code like this; otherwise you just get bogged down in details that will sort themselves out later.
I realize that I've come to this project late, so I don't want to rock the boat too much, but I think that we could get a lot further with VGA compatibility if we were to implement the VGA registers and the memory access logic in hardware. For the registers we would have the logic to implement the indexing schemes in hardware and their values would be readable by HQ (with possibly some sort of notification that they've changed, but this might not be needed). For host memory accesses I think that we should implement the VGA read and write modes. This actually turns out to be not that much Verilog, but it does make the card appear to be much more like a real VGA card. We would need a way to switch this off for when we were using VESA modes or our own device drivers. (In my test implementation of the above they come out at about 250 lines of Verilog each). As I said, I know that I am coming to this project late, and would still like to help even if the first targets are for a more reduced functionality than this. It would certainly be good to get a simple text console working, and this should obviously be a first target. The things that the BIOS needs to do are all relatively simple, but the documentation is spread thinly all over the place. So, a quick list:
- Contain a magic header at the start (the bytes 0x55, 0xAA, i.e. the word 0xAA55)
- Install a few interrupt handlers (0x10 + 0x1D,0x1F + 0x42 + 0x43)
- Manage some data in the BIOS data area (addresses 0x40:00xx)
- Load the correct HQ program (contained in the ROM)
- Control the hardware
Anyway, any comments or direction before I begin coding would be great.
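For the first item on that list, the layout a PC system BIOS looks for in a legacy option ROM is: byte 0x55 at offset 0, byte 0xAA at offset 1, the image length in 512-byte blocks at offset 2, and the initialization entry point at offset 3; all bytes of the image must sum to 0 modulo 256. The C helpers below, which might be handy in a ROM build or diagnostic tool, check that header and patch the checksum. The function names are hypothetical, and putting the checksum fix-up in the last byte is a common convention rather than a requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes modulo 256. A valid option ROM image sums to 0. */
static uint8_t rom_sum(const uint8_t *rom, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + rom[i]);
    return sum;
}

/* Patch the last byte of the declared image so the whole image sums to 0. */
static void rom_fix_checksum(uint8_t *rom, size_t len)
{
    size_t size = (size_t)rom[2] * 512u;
    assert(size >= 4 && size <= len);
    rom[size - 1] = 0;
    rom[size - 1] = (uint8_t)(0u - rom_sum(rom, size));
}

/* Returns 1 if the signature, length byte and checksum are all valid. */
static int rom_header_ok(const uint8_t *rom, size_t len)
{
    if (len < 3 || rom[0] != 0x55 || rom[1] != 0xAA)
        return 0;
    size_t size = (size_t)rom[2] * 512u;
    if (size == 0 || size > len)
        return 0;
    return rom_sum(rom, size) == 0;
}
```

Running the fix-up as the last build step keeps the checksum right no matter where the HQ program images end up inside the ROM.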
MM From urkedal at nbi.dk Mon Jul 21 13:41:26 2008 From: urkedal at nbi.dk (Petter Urkedal) Date: Mon Jul 21 13:45:54 2008 Subject: [Open-graphics] Connecting the HQ In-Reply-To: <9871ee5f0807200831h61225655v17cb8d0cb18d895b@mail.gmail.com> References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org> <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com> <20080719122021.GA1459@eideticdew.org> <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com> <20080720123839.GB5510@eideticdew.org> <9871ee5f0807200831h61225655v17cb8d0cb18d895b@mail.gmail.com> Message-ID: <20080721174126.GA11984@eideticdew.org> On 2008-07-20, Timothy Normand Miller wrote: > On Sun, Jul 20, 2008 at 8:38 AM, Petter Urkedal wrote: > > So if I understand this correctly, the current plan is full-intercept or > > no intercept, but it's something we may need to reconsider. I guess for > > VGA, full-intercept is okay since most data is translated, but if we use > > HQ in GPU mode, then full-intercept would be a major bottleneck. > > Making this selectable by HQ itself could be good, although we'll have > to be very careful about race conditions where there are PCI accesses > coming through at the same time that HQ makes the switch. > Alternatively, we could require the driver to do it. If we want to > switch between PIO and DMA, we have to require the driver to switch > the bypass on and off. Ideally, in GPU mode, DMA will be used for > almost everything. Any PIOs that do happen will have latency, since > HQ will have to poll for them and pass them along, but that will have > minimal impact. Of course, DMA is for later. Is there a need for HQ to switch the bypass? Else, I'd say leave it to the driver, as it probably needs to know about the bypass state anyway. > >> BTW, there are some facts about the bus protocol that we might want to > >> change. 
When accessing the bridge, the first cycle is the address, > >> and the flag bits indicate the target (memory or config registers). > > > > These flag bits sound like a natural extension as the highest bits of > > the address. > > Yeah, so an early change we can make is to move those bits into the > address, even before HQ is in. Various things in the XP10 and S3 will > have to change for that. Sounds good. I'll have a look at your VGA code in the meantime. > >> For reads, the subsequent cycle is the word count, after which the bus > >> switches direction and waits. > >> > >> For writes, subsequent cycles are data, flags indicate which bytes are > >> valid, and the address auto-increments. > > > > So, these flags can't be combined with the other data. I guess the > > common case is that all are 1, so shall we > > * write an optional byte-enable before write with default 1111, and > > then it applies to all data, or > > * add a write-mode where byte-enables and data are interlaced? > > Another option would be to have 15 I/O ports for writes, one for each > combination of flags. If you already know the flags (usually 1111), > you can hard-code it. Otherwise, you can add the flags to some > address. That's the solution, of course :-) > >> The address counter in the S3 auto-increments, but it only increments > >> the lower 7 bits of the word address. So every 128 32-bit words, it's > >> required that a new address be sent. That happens automatically with > >> PCI due to the way this target is designed, but HQ will have to > >> enforce it in the program. > > > > I think we can manage that. > > It could actually be challenging. A row of characters is 160 bytes, > or 40 words. Since that's not an even multiple of 128, the code that > requests reads will have to be designed to figure out where to split > the request, and in as few instructions as possible. 
Do you mean that these 40 words will be stored in chip memory, rather than being transferred from PCI directly to HQ's BRAM? Of course, that could be necessary to support some wide and long text modes. > Enough of the > way the bridge bus protocol works is mingled into the address decoder > that we may have to make some changes to be able to sensibly queue up > multiple separate read requests back to back so that HQ can always be > able to do something else while waiting on read data. I'll have to go > back and look to see what would happen if a command were queued up > while in read mode. Right now, that will never happen, since the > address decoder is the only thing ever talking to the bridge. > > We can also consider changes to the bridge protocol. To optimise the fetches, I'd consider something like

* issue first read
* fetch glyph of first character
* issue second read
* render first glyph (while second read is in progress)
* fetch glyph of second character
* issue third read
* ...

and after a certain number of glyphs, write the pixel data, then continue. An algorithm like that could benefit from having the bridge access HQ BRAM directly. Even though we can't avoid the small pipe to the bridge due to bypass mode, the transfer unit can be connected either to the pipe or in parallel with it. That saves the program from fetching glyph data. The advantage of direct BRAM transfers is less obvious if we can come up with a modification to the bridge protocol, as you indicate, to allow outstanding reads, so that the algorithm can fetch and render simultaneously without unnecessary stalling.
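The 128-word restriction discussed above (the S3 only increments the low 7 bits of the word address) means every read request has to be cut at 128-word boundaries. Below is a minimal C sketch of that splitting, as a model of what the HQ program would have to do rather than HQ code; `split_read` and the `issue` callback are hypothetical names, with the callback standing in for queueing an address command followed by an rcount command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BURST_WORDS 128u  /* S3 auto-increment only covers the low 7 bits */

/* Hypothetical hook: queue an address command, then an rcount command. */
typedef void (*read_cmd_fn)(uint32_t word_addr, uint32_t count, void *ctx);

/* Issue a read of `count` 32-bit words starting at word address
 * `word_addr`, cut so that no piece crosses a 128-word boundary.
 * Returns the number of pieces (address+rcount pairs) issued. */
static unsigned split_read(uint32_t word_addr, uint32_t count,
                           read_cmd_fn issue, void *ctx)
{
    unsigned pieces = 0;
    while (count > 0) {
        /* words remaining before the low 7 address bits wrap */
        uint32_t room = BURST_WORDS - (word_addr & (BURST_WORDS - 1u));
        uint32_t n = count < room ? count : room;
        if (issue != NULL)
            issue(word_addr, n, ctx);
        word_addr += n;
        count -= n;
        pieces++;
    }
    return pieces;
}

/* Example hook that just totals the words requested. */
static void count_words(uint32_t word_addr, uint32_t count, void *ctx)
{
    (void)word_addr;
    assert(ctx != NULL);
    *(uint32_t *)ctx += count;
}
```

For an 80x25 text buffer read row by row (25 rows of 40 words starting at word 0), six rows straddle a 128-word boundary; row 16 starts exactly at word 640, so it does not. The whole screen therefore takes 31 requests instead of 25.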
From theosib at gmail.com Mon Jul 21 14:03:06 2008 From: theosib at gmail.com (Timothy Normand Miller) Date: Mon Jul 21 14:07:30 2008 Subject: [Open-graphics] Offer to help with writing a BIOS In-Reply-To: <488463A3.9010209@ntlworld.com> References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> <488463A3.9010209@ntlworld.com> Message-ID: <9871ee5f0807211103s1c49faa0t803e81b330d963f1@mail.gmail.com> On Mon, Jul 21, 2008 at 6:23 AM, Mark Marshall wrote: > Hi. > > I've been lurking here for some time, and would like to offer to (help) > write the VGA BIOS. > > As far as I can tell no actual work has started on this? That is correct, and so your offer to help with it is most greatly appreciated. > > I'm not fully sure of the level of compatibility that it is hoped to obtain > for this first version, it looks like we are going for compatibility at the > BIOS and memory access level but are not going to use any of the standard > CGA/EGA/VGA registers? Is this correct? This would seem to limit things to > text mode only, as there is no way you can use a graphics mode without also > using the registers (apart from mode 0x13). We're limiting to text only as a first step. Our PCI controller can decode the VGA I/O register space. > > My plan would probably be to try to setup bochs (or another x86 emulator) > with a simple model of the chip and to program against that. It always > seems to be a good idea to stay away from the real hardware for as long as > possible when developing new code like this else you just get bogged down in > details that will sort themselves out later. Plus, it's easier to debug when something goes wrong. You won't have our hardware to distrust when you're developing, and once your BIOS is working, we won't have to worry about it doing the wrong thing. 
> I realize that I've come to this project late so I don't want to rock the > boat too much, but I think that we could get a lot further with VGA > compatibility if we were to implement the VGA registers and the memory > access logic in hardware. We would really like to be able to do a variety of VGA and VESA modes, so we have implemented this. Any register that we don't properly support already is a "bug" that needs to be fixed. > For the registers we would have the logic to > implement the indexing schemes in hardware and their values would be > readable by HQ (with possibly some sort of notification that they've > changed, but this might not be needed). The PCI controller and address decode logic drop PCI-related events into queues for HQ. Its job is to check those queues and service them in code. For instance, if someone writes to a VGA color palette register, HQ will be notified of the raw access, and it will have to figure out where in its scratch memory (and possibly our video controller) to store the color value. > For host memory accesses I think > that we should implement the VGA read and write modes. This actually turns > out to be not that much verilog, but it does make the card appear to be much > more like a real VGA card. We would need a way to switch this off for when > we were using VESA modes or our own device drivers. (In my test > implementation of the above they come out at about 250 lines of verilog > each). Our plan is to implement this in microcode. But we want to implement exactly this functionality. We're not going for fast here. Just compatible. > As I said, I know that I am coming to this project late, and would still > like to help even if the first targets are for a more reduced functionality > than this. It would certainly be good to get a simple text console working, > and this should obviously be a first target. I've been getting desperate. Late is better than never. Plus, we're coming up on a time when it'll become a bottleneck. 
Ideally, someone would be working on this, at least the preliminary stuff, well before the hardware is ready. We need to fill the wiki with all of the information that pertains to what you're doing, so we all have it as a reference.

> The things that the BIOS needs to do are all relatively simple, but the
> documentation is spread thinly all over the place. So, a quick list:
> - Contain a magic header at the start (0x55, 0xAA)
> - Install a few interrupt handlers (0x10 + 0x1D,0x1F + 0x42 + 0x43)

Do default routines for this exist in main system BIOSes? That is, if the graphics card doesn't implement the interrupt, will the system BIOS assume standard hardware and use it?

> - Manage some data in the BIOS data area (addresses 0x40:00xx)
> - Load the correct HQ program (contained in the ROM)
> - Control the hardware

Yes. I think the only other thing that needs to be done is:
- Set up video

Our video controller is nothing like what's in a VGA card. For one thing, it only knows how to scan out pixels, which is why we need HQ to convert text to pixels continuously in the background.

> Anyway, any comments or direction before I begin coding would be great.

Mostly, keep discussion on the list so we have documentation, try our best to document things on the wiki, and coordinate with me, Petter and Howard as we go along. I'm excited! Thanks for joining us on this!
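Implementing the VGA read and write modes in microcode, as planned above, means HQ has to emulate the graphics controller's latches and ALU on every host access. Below is a minimal C sketch of read modes 0 and 1 and write mode 0 (write modes 1 to 3 are omitted), following the standard VGA graphics-controller and sequencer register definitions. The `struct vga_gc` layout and the function names are hypothetical, not existing OGD1 or HQ code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the VGA registers HQ would keep in scratch memory. */
struct vga_gc {
    uint8_t set_reset;        /* GC index 0 */
    uint8_t enable_set_reset; /* GC index 1 */
    uint8_t color_compare;    /* GC index 2 */
    uint8_t data_rotate;      /* GC index 3: bits 0-2 count, 3-4 ALU function */
    uint8_t read_map;         /* GC index 4 */
    uint8_t mode;             /* GC index 5: bits 0-1 write mode, bit 3 read mode */
    uint8_t color_dont_care;  /* GC index 7 */
    uint8_t bit_mask;         /* GC index 8 */
    uint8_t map_mask;         /* Sequencer index 2: planes enabled for writes */
    uint8_t latch[4];         /* loaded on every host read */
};

static uint8_t rotr8(uint8_t v, unsigned n)
{
    n &= 7u;
    return n ? (uint8_t)((v >> n) | (v << (8u - n))) : v;
}

/* Host read: fill the latches, then answer according to the read mode. */
static uint8_t vga_read(struct vga_gc *gc, uint8_t planes[4][0x10000], uint16_t off)
{
    for (int p = 0; p < 4; p++)
        gc->latch[p] = planes[p][off];
    if (!(gc->mode & 0x08))                     /* read mode 0 */
        return gc->latch[gc->read_map & 3u];
    uint8_t match = 0xFF;                       /* read mode 1: colour compare */
    for (int p = 0; p < 4; p++) {
        if (gc->color_dont_care & (1u << p)) {
            uint8_t want = (gc->color_compare & (1u << p)) ? 0xFF : 0x00;
            match &= (uint8_t)~(gc->latch[p] ^ want);
        }
    }
    return match;
}

/* Host write in write mode 0 (modes 1-3 are not modelled here). */
static void vga_write_mode0(struct vga_gc *gc, uint8_t planes[4][0x10000],
                            uint16_t off, uint8_t cpu)
{
    assert((gc->mode & 3u) == 0);
    uint8_t data = rotr8(cpu, gc->data_rotate & 7u);
    unsigned op = (gc->data_rotate >> 3) & 3u;
    for (int p = 0; p < 4; p++) {
        if (!(gc->map_mask & (1u << p)))
            continue;                           /* plane not selected */
        uint8_t v = data;
        if (gc->enable_set_reset & (1u << p))
            v = (gc->set_reset & (1u << p)) ? 0xFF : 0x00;
        switch (op) {                           /* ALU against the latch */
        case 1: v &= gc->latch[p]; break;       /* AND */
        case 2: v |= gc->latch[p]; break;       /* OR */
        case 3: v ^= gc->latch[p]; break;       /* XOR */
        default: break;                         /* 0: replace */
        }
        /* Bit mask: 1 bits come from the ALU, 0 bits from the latch. */
        planes[p][off] = (uint8_t)((v & gc->bit_mask) |
                                   (gc->latch[p] & (uint8_t)~gc->bit_mask));
    }
}
```

Because the latches are only reloaded by reads, software that does read-modify-write tricks (the usual way DOS programs draw with a partial bit mask) works as long as HQ services every read through `vga_read`.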
-- Timothy Normand Miller http://www.cse.ohio-state.edu/~millerti Open Graphics Project From theosib at gmail.com Mon Jul 21 14:19:35 2008 From: theosib at gmail.com (Timothy Normand Miller) Date: Mon Jul 21 14:23:58 2008 Subject: [Open-graphics] Connecting the HQ In-Reply-To: <20080721174126.GA11984@eideticdew.org> References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org> <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com> <20080719122021.GA1459@eideticdew.org> <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com> <20080720123839.GB5510@eideticdew.org> <9871ee5f0807200831h61225655v17cb8d0cb18d895b@mail.gmail.com> <20080721174126.GA11984@eideticdew.org> Message-ID: <9871ee5f0807211119q6d19f6eatfdb829e44067ae17@mail.gmail.com> On Mon, Jul 21, 2008 at 1:41 PM, Petter Urkedal wrote: > > Is there a need for HQ to switch the bypass? Else, I'd say leave it to > the driver, as it probably needs to know about the bypass state anyway. I can't think of a case where it wouldn't be dangerous, so we can leave out that capability for now. >> It could actually be challenging. A row of characters is 160 bytes, >> or 40 words. Since that's not an even multiple of 128, the code that >> requests reads will have to be designed to figure out where to split >> the request, and in as few instructions as possible. > > Do you mean that these 40 words will be stored in chip memory, rather > than being transferred from PCI directly to HQs BRAM? Of course that > could be necessary to support some wide and long text modes. The text mode is 4000 bytes or 1000 words. That won't fit in the scratch memory. >> Enough of the >> way the bridge bus protocol works is mingled into the address decoder >> that we may have to make some changes to be able to sensibly queue up >> multiple separate read requests back to back so that HQ can always be >> able to do something else while waiting on read data. 
I'll have to go >> back and look to see what would happen if a command were queued up >> while in read mode. Right now, that will never happen, since the >> address decoder is the only thing ever talking to the bridge. >> >> We can also consider changes to the bridge protocol.
>
> To optimise the fetches, I'd consider something like
>
> * issue first read
> * fetch glyph of first character
> * issue second read
> * render first glyph (while second read is in progress)
> * fetch glyph of second character
> * issue third read
> * ...

You can't render (which involves writes) while a read is outstanding. Normally, this would not be the case, since inside the S3 we always have separate queues for writes, possibly a separate one for read requests, and one for read data. With the bridge as it is, we're sucking everything through one straw. About all we can do is make a read request, do some other computation, then wait for the read data. If we do it right, we may be able to queue up more than one read request before deciding to wait for data in the return queue, and we can queue writes as well. But that command queue is only 16 entries, which amounts to 8 read requests (each needs an address entry and a count entry), and a write won't complete ahead of a read that came before it, so the only parallelism is that the queue is serviced in parallel with whatever HQ is doing. Right now that may or may not work, since with PCI there's no opportunity for queueing things like this (except for writes and writes). Keep in mind that, at least as a first approximation, we don't have to convert the whole screen in one video frame. 10 FPS will be more than adequate.

> and after a certain number of glyphs, write the pixel data, then
> continue.
>
> An algorithm like that could benefit from having the bridge access HQ
> BRAM directly. Even though we can't avoid the small pipe to the bridge
> due to bypass mode, the transfer unit can be connected either to the
> pipe or parallel to it. That saves the program from fetching glyph
> data.
Let's keep this in mind. But if we manage to meet or exceed 30 FPS, then it becomes an unnecessary optimization. On the other hand, it may become necessary for DMA to be efficient!

> The advantage of direct BRAM transfers is less obvious if we can come up
> with a modification to the bridge protocol, as you indicate, to allow
> outstanding reads, so that the algorithm can fetch and render
> simultaneously without unnecessary stalling.

The first, easiest thing we can do is make sure that the bridge logic doesn't try to dequeue a command unless the bus can take it. This way, we can queue up multiple read requests and writes. If a read request comes along, then the bridge holds up all other requests until the whole read is serviced, but that doesn't prevent us from queueing up some writes. We can also think about enlarging that command queue. This will allow us to dump quite a lot of write data into it and let that go at its own pace while HQ computes on something else.

We need to be strategic about the queue management. Do we want to clear the read return data sooner or later? As it is, if we request more reads than can be held in the queue, and we don't clear the queue fast enough, then read data will be lost, which could be disastrous, since everything expects a certain number of read words to appear. The way the code I've written works now, when it fetches a glyph, that's only four words, and so it can leave them in the queue and pull them out when it's time. One drawback of this is that since there are outstanding words in the queue, we can't poll PCI as often as we might like.
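The data-loss risk described above (requesting more read words than the return queue can hold) can be avoided with simple credit counting on the HQ side. A minimal C sketch, assuming each read request costs two command-queue entries (address plus rcount) and that HQ knows the depth of the read-return FIFO; `struct bridge_credits` and the function names are hypothetical, and on real hardware the free command-entry count would more likely be read from a FREE port than tracked in software.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one bridge command queue shared by
 * addresses, read counts and write data, plus a read-return FIFO. */
struct bridge_credits {
    unsigned cmd_free;      /* free command-queue entries */
    unsigned ret_capacity;  /* read-return FIFO depth, in words */
    unsigned ret_reserved;  /* words requested but not yet popped */
};

/* A read costs two command entries (address, then rcount) and must not
 * request more words than the return FIFO can still absorb. */
static bool issue_read(struct bridge_credits *c, unsigned count)
{
    if (c->cmd_free < 2 || c->ret_reserved + count > c->ret_capacity)
        return false;           /* caller should drain data or wait */
    c->cmd_free -= 2;
    c->ret_reserved += count;
    /* ...here HQ would write the address and rcount ports... */
    return true;
}

/* Call after popping `n` words from the return FIFO. */
static void consume_read_words(struct bridge_credits *c, unsigned n)
{
    assert(n <= c->ret_reserved);
    c->ret_reserved -= n;
}

/* Call when the bridge reports `n` command entries dequeued. */
static void commands_drained(struct bridge_credits *c, unsigned n)
{
    c->cmd_free += n;
}
```

With a 16-entry command queue this allows at most 8 outstanding reads, matching the count given above; the FIFO check is what keeps four-word glyph fetches from silently overrunning the return queue.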
-- Timothy Normand Miller http://www.cse.ohio-state.edu/~millerti Open Graphics Project From urkedal at nbi.dk Mon Jul 21 15:24:36 2008 From: urkedal at nbi.dk (Petter Urkedal) Date: Mon Jul 21 15:28:58 2008 Subject: [Open-graphics] Connecting the HQ In-Reply-To: <9871ee5f0807211119q6d19f6eatfdb829e44067ae17@mail.gmail.com> References: <9871ee5f0807160932l955bdc2h45042c3d01d00f2b@mail.gmail.com> <20080718173132.GA6004@eideticdew.org> <9871ee5f0807181123m6976ea9dr7a374eae57f16ef3@mail.gmail.com> <20080719122021.GA1459@eideticdew.org> <9871ee5f0807191355j4f962093s2d1f1b220376abcb@mail.gmail.com> <20080720123839.GB5510@eideticdew.org> <9871ee5f0807200831h61225655v17cb8d0cb18d895b@mail.gmail.com> <20080721174126.GA11984@eideticdew.org> <9871ee5f0807211119q6d19f6eatfdb829e44067ae17@mail.gmail.com> Message-ID: <20080721192436.GA12420@eideticdew.org> On 2008-07-21, Timothy Normand Miller wrote: > The text mode is 4000 bytes or 1000 words. That won't fit in the > scratch memory. Oh dear, I completely forgot the problem at hand. > On Mon, Jul 21, 2008 at 1:41 PM, Petter Urkedal wrote: > > To optimise the fetches, I'd consider something like > > > > * issue first read > > * fetch glyph of first character > > * issue second read > > * render first glyph (while second read is in progress) > > * fetch glyph of second character > > * issue third read > > * ... > > You can't render (which involves writes) while a read is outstanding. Unless I render to BRAM and then issue a command to transfer it to memory. Okay, so this will in fact only make sense with direct BRAM transfer. From wpm at openhardwarefoundation.org Tue Jul 22 22:31:29 2008 From: wpm at openhardwarefoundation.org (Patrick McNamara) Date: Tue Jul 22 22:36:00 2008 Subject: [Open-graphics] OHF now accepting donations. 
Message-ID: <48869801.3050106@openhardwarefoundation.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I wanted to make this a big fancy announcement, but writing such an announcement is not a particularly enjoyable nor easy task. Nor, in this case, is it as important as the message to deliver. At long last, the OHF is accepting donations. For the new future, PayPal is our method of accepting online donations. There is a bright yellow donate button on the OHF homepage. Use it frequently. :) I won't go into the details about why it took so long, other than to say that we are all volunteering our spare time and real life takes priority. The board has met to discuss this problem and we are taking steps to make sure we are more active in the future. Patrick M -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIhpgBrsWaHsqL/ewRAgVdAKCE37/oqQZR4yIbzhvUOYM1HZ+giwCff5YK cT2slHK41ULdxoAOdXg8N1k= =fv53 -----END PGP SIGNATURE----- From wpm at openhardwarefoundation.org Tue Jul 22 22:52:29 2008 From: wpm at openhardwarefoundation.org (Patrick McNamara) Date: Tue Jul 22 22:57:03 2008 Subject: [Open-graphics] OHF now accepting donations. In-Reply-To: <48869801.3050106@openhardwarefoundation.org> References: <48869801.3050106@openhardwarefoundation.org> Message-ID: <48869CED.9030902@openhardwarefoundation.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Please note the sentence should read "For the near future, Paypal..." *sigh* This is why I don't have job in copy writing... ;) Patrick McNamara wrote: | I wanted to make this a big fancy announcement, but writing such an | announcement is not a particularly enjoyable nor easy task. Nor, in | this case, is it as important as the message to deliver. | | At long last, the OHF is accepting donations. For the new future, | PayPal is our method of accepting online donations. There is a bright | yellow donate button on the OHF homepage. 
Use it frequently. :) | | I won't go into the details about why it took so long, other than to say | that we are all volunteering our spare time and real life takes | priority. The board has met to discuss this problem and we are taking | steps to make sure we are more active in the future. | | Patrick M _______________________________________________ Open-graphics mailing list Open-graphics@duskglow.com http://lists.duskglow.com/mailman/listinfo/open-graphics List service provided by Duskglow Consulting, LLC (www.duskglow.com) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIhpztrsWaHsqL/ewRAppYAKCWU55KFnN3QcuCTkVy0lm36mAhHwCeJyJT 3GHJ6JdN980ug9WGwK7bFfs= =cwb/ -----END PGP SIGNATURE----- From urkedal at nbi.dk Wed Jul 23 02:54:21 2008 From: urkedal at nbi.dk (Petter Urkedal) Date: Wed Jul 23 02:58:54 2008 Subject: [Open-graphics] VGA text mode C version In-Reply-To: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> Message-ID: <20080723065419.GA7424@eideticdew.org> Starting with your C code, it now looks quite manageable. Using bottom up coding, poll_pci() appears to be the first thing to implement. So, we need ports. I dug up a previous discussion http://readlist.com/lists/duskglow.com/open-graphics/1/6252.html. After trimming the long identifiers and incorporating your suggestion of encoding byte enables in the lower part of the port address, I ended up with the attached file. BTW, do we use MIT/X11 license for all HQ code? 
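The byte-enable encoding described here puts the 4-bit enable mask in the low bits of the port number, so the port for a given mask is simply the `_0000` port plus the mask. Below is a small C sketch using the base addresses from the attached port map (renamed `PCI_B`/`MEM_B`, since C reserves identifiers that begin with an underscore and a capital letter). `write_port` and `apply_byte_enables` are hypothetical helpers, and the lane ordering (bit 0 enables the least significant byte) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Base ports from the attached port map (leading underscore dropped for C). */
#define PCI_B                 (-0x20)
#define MEM_B                 (-0x40)
#define PCI_M_WRITE_DATA_0000 (PCI_B + 0x10)
#define MEM_WRITE_DATA_0000   (MEM_B + 0x10)

/* Hypothetical helper: the port that writes one word with enable mask `be`. */
static int write_port(int base_0000, unsigned be)
{
    assert(be <= 0xFu);
    return base_0000 + (int)be;
}

/* What a write with enables does to the target word.
 * Assumption: bit i of `be` enables byte lane i (bit 0 = bits 7..0). */
static uint32_t apply_byte_enables(uint32_t old, uint32_t data, unsigned be)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 4; i++)
        if (be & (1u << i))
            mask |= UINT32_C(0xFF) << (8 * i);
    return (data & mask) | (old & ~mask);
}
```

With mask 0000 the stored word is left unchanged, which is the behaviour a "dead" write would have; with 1111 the whole word is replaced.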
-------------- next part --------------
;; PCI Ports
;; =========

let _PCI_B = -0x20                         ; base address for PCI ports

;; Master Command
let PCI_M_CMD             = _PCI_B + 0x00  ; out - direction and count
let PCI_M_CMD_FREE        = _PCI_B + 0x01  ; in  - number of commands we can write

;; Master Write
let PCI_M_WRITE_FREE      = _PCI_B + 0x02  ; in  - how many data words we can write
let PCI_M_WRITE_DATA_0000 = _PCI_B + 0x10  ; out - 16 ports for data word, where
                                           ;       lower 4 address bits are byte enables
let PCI_M_WRITE_DATA_1111 = PCI_M_WRITE_DATA_0000 + 15 ; for convenience

;; Master Read
let PCI_M_READ_DATA       = _PCI_B + 0x04  ; in  - data word
let PCI_M_READ_COUNT      = _PCI_B + 0x05  ; in  - number of words available to read

;; Target Write
let PCI_T_WRITE_COUNT     = _PCI_B + 0x08  ; in  - number of words in write queue
let PCI_T_WRITE_ADDR_FL   = _PCI_B + 0x09  ; in  - address and flags for a write
let PCI_T_WRITE_DATA      = _PCI_B + 0x0a  ; in  - data of write word

;; Target Read
let PCI_T_READ_PENDING    = _PCI_B + 0x0b  ; in  - nonzero if pending read
let PCI_T_READ_ADDR       = _PCI_B + 0x0c  ; in  - the one address that is pending
let PCI_T_READ_DATA       = _PCI_B + 0x0d  ; out - where we write the one word

;; Memory Ports
;; ============

let _MEM_B = -0x40                         ; base address for memory ports

let MEM_READREQ_FREE      = _MEM_B + 0x00  ; in  - Free slots in command pipe.
let MEM_READREQ_ADDR_TRIG = _MEM_B + 0x01  ; out - First address to read.
let MEM_READREQ_COUNT     = _MEM_B + 0x02  ; out - Number of words to read.
let MEM_READREPLY_DATA    = _MEM_B + 0x03  ; in  - Data stream from memory.
let MEM_READREPLY_AVAIL   = _MEM_B + 0x04  ; in  - Number of words in FIFO.

let MEM_WRITE_ADDR        = _MEM_B + 0x05  ; out - Start address.
let MEM_WRITE_FREE        = _MEM_B + 0x06  ; in  - Free slots in output FIFO.
let MEM_WRITE_DATA_0000   = _MEM_B + 0x10  ; out - 16 ports for data stream, where
                                           ;       lower 4 address bits are enables.
let MEM_WRITE_DATA_1111   = MEM_WRITE_DATA_0000 + 15 ; for convenience

From theosib at gmail.com Wed Jul 23 08:59:44 2008 From: theosib at gmail.com (Timothy Normand Miller) Date: Wed Jul 23 09:04:16 2008 Subject: [Open-graphics] VGA text mode C version In-Reply-To: <20080723065419.GA7424@eideticdew.org> References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> <20080723065419.GA7424@eideticdew.org> Message-ID: <9871ee5f0807230559i56a0dd8bs9533db46c8fe853b@mail.gmail.com> On Wed, Jul 23, 2008 at 2:54 AM, Petter Urkedal wrote: > Starting with your C code, it now looks quite manageable. Using bottom > up coding, poll_pci() appears to be the first thing to implement. So, > we need ports. > > I dug up a previous discussion > http://readlist.com/lists/duskglow.com/open-graphics/1/6252.html. > After trimming the long identifiers and incorporating your suggestion of > encoding byte enables in the lower part of the port address, I ended up > with the attached file. > > BTW, do we use MIT/X11 license for all HQ code? For hardware, we want GPL. For software that might get reused and reincorporated, we prefer MIT. But this one's a little weird. Something GPL like, but with the ability to embed in something non-GPL. LGPL? But we need to be careful that this causes no problems for the BSD folks.
-- Timothy Normand Miller http://www.cse.ohio-state.edu/~millerti Open Graphics Project From theosib at gmail.com Wed Jul 23 10:48:23 2008 From: theosib at gmail.com (Timothy Normand Miller) Date: Wed Jul 23 10:52:55 2008 Subject: [Open-graphics] VGA text mode C version In-Reply-To: <20080723065419.GA7424@eideticdew.org> References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> <20080723065419.GA7424@eideticdew.org> Message-ID: <9871ee5f0807230748s2c779befx39d3d1c358e3b358@mail.gmail.com> Excellent. I just have a couple of comments.

;; PCI Ports
;; =========

let _PCI_B = -0x20                         ; base address for PCI ports

;; Master Command
let PCI_M_CMD             = _PCI_B + 0x00  ; out - direction and count
let PCI_M_CMD_FREE        = _PCI_B + 0x01  ; in  - number of commands we can write

;; Master Write
let PCI_M_WRITE_FREE      = _PCI_B + 0x02  ; in  - how many data words we can write
let PCI_M_WRITE_DATA_0000 = _PCI_B + 0x10  ; out - 16 ports for data word, where
                                           ;       lower 4 address bits are byte enables
let PCI_M_WRITE_DATA_1111 = PCI_M_WRITE_DATA_0000 + 15 ; for convenience

* I was thinking about it, and for PCI master, you're right. We definitely need a zero so we can do a null transaction on the bus.

;; Master Read
let PCI_M_READ_DATA       = _PCI_B + 0x04  ; in  - data word
let PCI_M_READ_COUNT      = _PCI_B + 0x05  ; in  - number of words available to read

;; Target Write
let PCI_T_WRITE_COUNT     = _PCI_B + 0x08  ; in  - number of words in write queue
let PCI_T_WRITE_ADDR_FL   = _PCI_B + 0x09  ; in  - address and flags for a write
let PCI_T_WRITE_DATA      = _PCI_B + 0x0a  ; in  - data of write word

* The way we queue writes for the bridge is that there's an address followed by data. But since HQ is 32-bit, there aren't enough bits to detect that.
But maybe we should think about that. It would be nice if the protocol to fetch PCI writes were more similar to the protocol to push them onto the bridge.

;; Target Read
let PCI_T_READ_PENDING    = _PCI_B + 0x0b  ; in  - nonzero if pending read
let PCI_T_READ_ADDR       = _PCI_B + 0x0c  ; in  - the one address that is pending
let PCI_T_READ_DATA       = _PCI_B + 0x0d  ; out - where we write the one word

;; Memory Ports
;; ============

let _MEM_B = -0x40                         ; base address for memory ports

let MEM_READREQ_FREE      = _MEM_B + 0x00  ; in  - Free slots in command pipe.
let MEM_READREQ_ADDR_TRIG = _MEM_B + 0x01  ; out - First address to read.
let MEM_READREQ_COUNT     = _MEM_B + 0x02  ; out - Number of words to read.
let MEM_READREPLY_DATA    = _MEM_B + 0x03  ; in  - Data stream from memory.
let MEM_READREPLY_AVAIL   = _MEM_B + 0x04  ; in  - Number of words in FIFO.

let MEM_WRITE_ADDR        = _MEM_B + 0x05  ; out - Start address.
let MEM_WRITE_FREE        = _MEM_B + 0x06  ; in  - Free slots in output FIFO.
let MEM_WRITE_DATA_0000   = _MEM_B + 0x10  ; out - 16 ports for data stream, where
                                           ;       lower 4 address bits are enables.
let MEM_WRITE_DATA_1111   = MEM_WRITE_DATA_0000 + 15 ; for convenience

* All write addresses, read addresses, read counts, and write data go into one queue. Obviously, they need different ports in order to set the other flags in the command word, but since there's only one queue, there should be only one "free" port. I've also been thinking about the 0000 write. At first, I thought it should be there but do nothing. Now, I think it's a good way to insert a "dead" write into the stream, like a convenient way to skip a word or two while advancing the auto-increment, although issuing an address command doesn't really waste any time by comparison.

* There are three bus commands. They are:

parameter b_addr = 1;
parameter b_rcount = 2;
parameter b_write = 3;

So, you can set addresses all you want. They take up queue space, but all they do is set the address in the S3.
rcount triggers a read, and write issues a write word (and we can think of it as generally stateless except that we have to make sure the address is right). If you know what rcount and write commands are going to do to the address counter in the S3, there may be circumstances where you can skip the address command, although I wouldn't generally recommend certain combinations simply because it's more of a pain sometimes to keep track than to just issue another address. From urkedal at nbi.dk Wed Jul 23 13:05:59 2008 From: urkedal at nbi.dk (Petter Urkedal) Date: Wed Jul 23 13:10:35 2008 Subject: [Open-graphics] VGA text mode C version In-Reply-To: <9871ee5f0807230559i56a0dd8bs9533db46c8fe853b@mail.gmail.com> References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> <20080723065419.GA7424@eideticdew.org> <9871ee5f0807230559i56a0dd8bs9533db46c8fe853b@mail.gmail.com> Message-ID: <20080723170559.GA8580@eideticdew.org> On 2008-07-23, Timothy Normand Miller wrote: > On Wed, Jul 23, 2008 at 2:54 AM, Petter Urkedal wrote: > > Starting with your C code, it now looks quite manageable. Using bottom > > up coding, poll_pci() appears to be the first thing to implement. So, > > we need ports. > > > > I dug up a previous discussion > > http://readlist.com/lists/duskglow.com/open-graphics/1/6252.html. > > After trimming the long identifiers and incorporating your suggestion of > > encoding byte enables in the lower part of the port address, I ended up > > with the attached file. > > > > BTW, do we use MIT/X11 license for all HQ code? > > For hardware, we want GPL. For software that might get reused and > reincorporated, we prefer MIT. But this one's a little weird. > Something GPL like, but with the ability to embed in something > non-GPL. LGPL? But we need to be careful that this causes no > problems for the BSD folks. Is there a reason we shouldn't use MIT for HQ code? 
My thought is that this code is really only useful for this particular microcontroller, so maybe the GPL on the controller itself is sufficient protection? If someone rewrites the code for another controller, it's no longer covered by our copyright, anyway. From theosib at gmail.com Wed Jul 23 13:40:13 2008 From: theosib at gmail.com (Timothy Normand Miller) Date: Wed Jul 23 13:44:45 2008 Subject: [Open-graphics] VGA text mode C version In-Reply-To: <20080723170559.GA8580@eideticdew.org> References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> <20080723065419.GA7424@eideticdew.org> <9871ee5f0807230559i56a0dd8bs9533db46c8fe853b@mail.gmail.com> <20080723170559.GA8580@eideticdew.org> Message-ID: <9871ee5f0807231040u35274dd8ia7dc8fb471ed33ab@mail.gmail.com> On Wed, Jul 23, 2008 at 1:05 PM, Petter Urkedal wrote: > Is there a reason we shouldn't use MIT for HQ code? My thought is that > this code is really only useful for this particular microcontroller, so > maybe the GPL on the controller itself is sufficient protection? If > someone rewrites the code for another controller, it's no longer covered > by our copyright, anyway. You're right. It's less complicated overall if we just put this code under MIT.
-- Timothy Normand Miller http://www.cse.ohio-state.edu/~millerti Open Graphics Project From urkedal at nbi.dk Wed Jul 23 13:49:33 2008 From: urkedal at nbi.dk (Petter Urkedal) Date: Wed Jul 23 13:54:05 2008 Subject: [Open-graphics] VGA text mode C version In-Reply-To: <9871ee5f0807230748s2c779befx39d3d1c358e3b358@mail.gmail.com> References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> <20080723065419.GA7424@eideticdew.org> <9871ee5f0807230748s2c779befx39d3d1c358e3b358@mail.gmail.com> Message-ID: <20080723174933.GB8580@eideticdew.org>

On 2008-07-23, Timothy Normand Miller wrote:
> ;; Target Write
> let PCI_T_WRITE_COUNT = _PCI_B + 0x08 ; in - number of words in write queue
> let PCI_T_WRITE_ADDR_FL = _PCI_B + 0x09 ; in - address and flags for a write
> let PCI_T_WRITE_DATA = _PCI_B + 0x0a ; in - data of write word
>
> * The way we queue writes for the bridge is that there's an address
> followed by data. But since HQ is 32-bit, there aren't enough bits to
> detect that.

I don't quite follow, but I think this is because I don't know how PCI_T_WRITE_ADDR_FL is encoded. Is it a byte or word address, and where are the byte-enables encoded? Are there other flags that will not be naturally integrated in the address?

> But maybe we should think about that. It would be nice
> if the protocol to fetch PCI writes were more similar to the protocol
> to push them on to the bridge.

I tentatively agree, but maybe they are naturally different, as PCI uses the same byte-enables for a whole stream, whereas our bridge allows per-data enables. Assuming, e.g., that the byte enables are encoded in the top of the address, the code may look something like this (omitting count logic):

    move [PCI_T_WRITE_ADDR_FL], r0
    shiftu r0, -28, r1                 ; r1 = byte-enables
    and 0xfffffff, r0, r0              ; r0 = start address
    move r0, [MEM_WRITE_ADDR]
    ...                                ; begin test and loop
    move [PCI_T_WRITE_DATA], r2
    move r2, [add MEM_WRITE_DATA_0000, r1]
    ...                                ; end test and loop

So it need not be too difficult, if I understand this right.

> ;; Memory Ports
> ;; ============
>
> let _MEM_B = -0x40 ; base address for memory ports
>
> let MEM_READREQ_FREE = _MEM_B + 0x00 ; in - Free slots in command pipe.
> let MEM_READREQ_ADDR_TRIG=_MEM_B + 0x01 ; out - First address to read.
> let MEM_READREQ_COUNT = _MEM_B + 0x02 ; out - Number of words to read.
> let MEM_READREPLY_DATA = _MEM_B + 0x03 ; in - Data stream from memory.
> let MEM_READREPLY_AVAIL = _MEM_B + 0x04 ; in - Number of words in FIFO.
>
> let MEM_WRITE_ADDR = _MEM_B + 0x05 ; out - Start address.
> let MEM_WRITE_FREE = _MEM_B + 0x06 ; in - Free slots in output FIFO.
> let MEM_WRITE_DATA_0000 = _MEM_B + 0x10 ; out - 16 ports for data stream, where
> ; lower 4 address bits are enables.
> let MEM_WRITE_DATA_1111 = MEM_WRITE_DATA_0000 + 15 ; for convenience
>
>
> * All write addresses, read addresses, read counts, and write data go
> into one queue. Obviously, they need different ports in order to set
> the other flags in the command word, but since there's only one queue,
> then there should be only one "free" port. I've also been thinking
> about the 0000 write. At first, I thought it should be there but do
> nothing. Now, I think it's a good way to insert a "dead" write into
> the stream, like a convenient way to skip a word or two while
> advancing the auto-increment, although issuing an address command
> doesn't really waste any time by comparison.

Gate-wise, the 0000-writes are free, aren't they? It usually takes logic to deal with special cases, so can't we just omit that logic? I think 0000-writes could be useful in cases where we don't know the byte-enables while writing the code, if such cases exist.

> * There are three bus commands. They are:
> parameter b_addr = 1;
> parameter b_rcount = 2;
> parameter b_write = 3;
>
> So, you can set addresses all you want. They take up queue space, but
> all they do is set the address in the S3.
rcount triggers a read, and > write issues a write word (and we can think of it as generally > stateless except that we have to make sure the address is right). If > you know what rcount and write commands are going to do to the address > counter in the S3, there may be circumstances where you can skip the > address command, although I wouldn't generally recommend certain > combinations simply because it's more of a pain sometimes to keep > track than to just issue another address. So, to mirror the bridge commands, it would be sufficient with MEM_CMD_FREE MEM_CMD_ADDR MEM_CMD_READ_COUNT MEM_READ_AVAIL MEM_READ_DATA MEM_WRITE_DATA_0000 MEM_WRITE_DATA_1111 where MEM_WRITE_DATA_xxxx are really commands. Yet another naming scheme: MEM_CMDQ_FREE MEM_SEND_ADDR MEM_SEND_READ_COUNT MEM_SEND_DATA_0000 MEM_SEND_DATA_1111 MEM_READQ_AVAIL MEM_READQ_DATA From theosib at gmail.com Wed Jul 23 14:18:02 2008 From: theosib at gmail.com (Timothy Normand Miller) Date: Wed Jul 23 14:22:35 2008 Subject: [Open-graphics] VGA text mode C version In-Reply-To: <20080723174933.GB8580@eideticdew.org> References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com> <20080723065419.GA7424@eideticdew.org> <9871ee5f0807230748s2c779befx39d3d1c358e3b358@mail.gmail.com> <20080723174933.GB8580@eideticdew.org> Message-ID: <9871ee5f0807231118k4f396a69m3cc6ec62830798b4@mail.gmail.com> On Wed, Jul 23, 2008 at 1:49 PM, Petter Urkedal wrote: > On 2008-07-23, Timothy Normand Miller wrote: >> ;; Target Write >> let PCI_T_WRITE_COUNT = _PCI_B + 0x08 ; in - number of words in write queue >> let PCI_T_WRITE_ADDR_FL = _PCI_B + 0x09 ; in - address and flags for a write >> let PCI_T_WRITE_DATA = _PCI_B + 0x0a ; in - data of write word >> >> * The way we queue writes for the bridge is that there's an address >> followed by data. But since HQ is 32-bit, there aren't enough bits to >> detect that. > > I don't quite follow, but I think this is because I don't know > PCI_T_WRITE_ADDR_FL is encoded. 
Is it a byte or word address, and where > are the byte-enables encoded? Are there other flags that will not be > naturally integrated in the address? When the address decode logic gets a write from PCI, first it gets the address and pushes that into the queue, with some flags indicating that this is an address, and along side are flags that indicate the target (engine, memory). The address is a byte address; only for I/O space will the lower two bits be non-zero. The next thing it does is push the data word onto the same queue, with some flags indicating that this is write data, along with the byte enables. Since 38 bits are required to make sense of the queue entry (for data: 32-bit data, 4-bit byte enable, 2-bit type), and HQ is 32-bit, either we need to expand HQ to more than 32-bit, or we throw more ports at it, with one port triggering a dequeue when accesses (probably the data/address portion). > >> But maybe we should think about that. It would be nice >> if the protocol to fetch PCI writes were more similar to the protocol >> to push them on to the bridge. > > I tentatively agree, but maybe they are naturally different as PCI uses > the same byte-enables for a whole stream, whereas our bridge allows > per-data enables. PCI allows per-word byte enables. The c/be signals are the command on an address cycle but the byte enables on a data cycle. > Assuming e.g. byte enable are encoded in the top of the address, the > code may look something like (omitting count logic) For the command queue, an address entry contains the following data: Address -- 30 bits Type -- 2 bits (indicating that it's an address) Target -- which memory space The following memory spaces are valid parameter TARGET_CFG=0; parameter TARGET_ENG=1; parameter TARGET_MEM=2; parameter TARGET_PROM=3; parameter TARGET_IO=4; HQ can't get CFG or PROM, but it can get any of the other three. (Actually, I think IO isn't connected right, but we can fix that easily.) 
>     move [PCI_T_WRITE_ADDR_FL], r0
>     shiftu r0, -28, r1              ; r1 = byte-enables
>     and 0xfffffff, r0, r0           ; r0 = start address
>     move r0, [MEM_WRITE_ADDR]
>     ...                             ; begin test and loop
>     move [PCI_T_WRITE_DATA], r2
>     move r2, [add MEM_WRITE_DATA_0000, r1]
>     ...                             ; end test and loop
>
> So it need not be too difficult, if I understand this right.
>
>> ;; Memory Ports
>> ;; ============
>>
>> let _MEM_B = -0x40 ; base address for memory ports
>>
>> let MEM_READREQ_FREE    = _MEM_B + 0x00 ; in - Free slots in command pipe.
>> let MEM_READREQ_ADDR_TRIG=_MEM_B + 0x01 ; out - First address to read.
>> let MEM_READREQ_COUNT   = _MEM_B + 0x02 ; out - Number of words to read.
>> let MEM_READREPLY_DATA  = _MEM_B + 0x03 ; in - Data stream from memory.
>> let MEM_READREPLY_AVAIL = _MEM_B + 0x04 ; in - Number of words in FIFO.
>>
>> let MEM_WRITE_ADDR      = _MEM_B + 0x05 ; out - Start address.
>> let MEM_WRITE_FREE      = _MEM_B + 0x06 ; in - Free slots in output FIFO.
>> let MEM_WRITE_DATA_0000 = _MEM_B + 0x10 ; out - 16 ports for data stream, where
>>                                         ; lower 4 address bits are enables.
>> let MEM_WRITE_DATA_1111 = MEM_WRITE_DATA_0000 + 15 ; for convenience
>>
>> * All write addresses, read addresses, read counts, and write data go
>> into one queue. Obviously, they need different ports in order to set
>> the other flags in the command word, but since there's only one queue,
>> then there should be only one "free" port. I've also been thinking
>> about the 0000 write. At first, I thought it should be there but do
>> nothing. Now, I think it's a good way to insert a "dead" write into
>> the stream, like a convenient way to skip a word or two while
>> advancing the auto-increment, although issuing an address command
>> doesn't really waste any time by comparison.
>
> Gate-wise, the 0000-writes are for free, aren't they? It usually takes
> logic to deal with special cases, so can't we just omit that logic? I
> think 0000-writes could be useful in cases where we don't know the
> byte-enables while writing the code, if such cases exist.

Well, the port address should exist in any case. But it should be
passed through to the bridge. The S3 may throw it away. In other
words, don't pay it any special attention.

>> * There are three bus commands. They are:
>>     parameter b_addr = 1;
>>     parameter b_rcount = 2;
>>     parameter b_write = 3;
>>
>> So, you can set addresses all you want. They take up queue space, but
>> all they do is set the address in the S3. rcount triggers a read, and
>> write issues a write word (and we can think of it as generally
>> stateless except that we have to make sure the address is right). If
>> you know what rcount and write commands are going to do to the address
>> counter in the S3, there may be circumstances where you can skip the
>> address command, although I wouldn't generally recommend certain
>> combinations simply because it's more of a pain sometimes to keep
>> track than to just issue another address.
>
> So, to mirror the bridge commands, it would be sufficient with
>
>     MEM_CMD_FREE
>     MEM_CMD_ADDR
>     MEM_CMD_READ_COUNT
>     MEM_READ_AVAIL
>     MEM_READ_DATA
>     MEM_WRITE_DATA_0000
>     MEM_WRITE_DATA_1111
>
> where MEM_WRITE_DATA_xxxx are really commands. Yet another naming
> scheme:
>
>     MEM_CMDQ_FREE
>     MEM_SEND_ADDR
>     MEM_SEND_READ_COUNT
>     MEM_SEND_DATA_0000
>     MEM_SEND_DATA_1111
>     MEM_READQ_AVAIL
>     MEM_READQ_DATA

I kinda like the second one better.
--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

From urkedal at nbi.dk  Wed Jul 23 14:42:39 2008
From: urkedal at nbi.dk (Petter Urkedal)
Date: Wed Jul 23 14:47:10 2008
Subject: [Open-graphics] VGA text mode C version
In-Reply-To: <9871ee5f0807231118k4f396a69m3cc6ec62830798b4@mail.gmail.com>
References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com>
        <20080723065419.GA7424@eideticdew.org>
        <9871ee5f0807230748s2c779befx39d3d1c358e3b358@mail.gmail.com>
        <20080723174933.GB8580@eideticdew.org>
        <9871ee5f0807231118k4f396a69m3cc6ec62830798b4@mail.gmail.com>
Message-ID: <20080723184239.GA8842@eideticdew.org>

On 2008-07-23, Timothy Normand Miller wrote:
> When the address decode logic gets a write from PCI, first it gets the
> address and pushes that into the queue, with some flags indicating
> that this is an address, and alongside are flags that indicate the
> target (engine, memory). The address is a byte address; only for I/O
> space will the lower two bits be non-zero. The next thing it does is
> push the data word onto the same queue, with some flags indicating
> that this is write data, along with the byte enables.
>
> Since 38 bits are required to make sense of the queue entry (for data:
> 32-bit data, 4-bit byte enable, 2-bit type), and HQ is 32-bit, either
> we need to expand HQ to more than 32-bit, or we throw more ports at
> it, with one port triggering a dequeue when accessed (probably the
> data/address portion).

Thanks for the explanation. Then, I suggest we use one data port and
one byte-enables port.

> PCI allows per-word byte enables. The c/be signals are the command on
> an address cycle but the byte enables on a data cycle.

So I recalled wrongly. Then we'll have to read two PCI ports per
write, but that's at most 3/2 more HQ cycles.

>> Assuming e.g. that byte enables are encoded in the top of the address,
>> the code may look something like this (omitting count logic):
>
> For the command queue, an address entry contains the following data:
>
> Address -- 30 bits
> Type -- 2 bits (indicating that it's an address)
> Target -- which memory space
>
> The following memory spaces are valid:
>
>     parameter TARGET_CFG=0;
>     parameter TARGET_ENG=1;
>     parameter TARGET_MEM=2;
>     parameter TARGET_PROM=3;
>     parameter TARGET_IO=4;
>
> HQ can't get CFG or PROM, but it can get any of the other three.
> (Actually, I think IO isn't connected right, but we can fix that
> easily.)

When writing these out on the bridge, I assume the target must be
encoded in the address. This is what you mentioned we could do before
HQ, right? I can see there is one target (maybe PROM) which must be
filtered out before we have 2 bits that can fit in the address. Either
way works.

>> where MEM_WRITE_DATA_xxxx are really commands. Yet another naming
>> scheme:
>>
>>     MEM_CMDQ_FREE
>>     MEM_SEND_ADDR
>>     MEM_SEND_READ_COUNT
>>     MEM_SEND_DATA_0000
>>     MEM_SEND_DATA_1111
>>     MEM_READQ_AVAIL
>>     MEM_READQ_DATA
>
> I kinda like the second one better.

Yeah, it was growing on me too after sending the mail, maybe because it
clearly distinguishes the active command ports (by the verb "SEND") from
the passive ones (by the nouns "CMDQ" and "READQ").
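Petter's option of carrying the target inside the bridge address could look like the following C sketch. Everything here beyond the five target numbers is hypothetical: the 2-bit codes (ENG=1, MEM=2, IO=3, with 0 unused), placing the code in bits 31..30, and storing a 30-bit word address in bits 29..0. Since CFG and PROM never reach HQ, the three remaining targets fit in 2 bits. Note that converting to a word address drops the two low address bits, which Tim says can be non-zero for I/O space, so a real design would have to handle that case separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Target codes copied from the Verilog parameters in the thread. */
enum pci_target { TARGET_CFG = 0, TARGET_ENG = 1, TARGET_MEM = 2,
                  TARGET_PROM = 3, TARGET_IO = 4 };

/* Hypothetical 2-bit codes for the targets HQ can see; CFG and PROM
 * are filtered out.  Returns false for a target that cannot be encoded. */
static bool target_code(enum pci_target t, uint32_t *code)
{
    switch (t) {
    case TARGET_ENG: *code = 1; return true;
    case TARGET_MEM: *code = 2; return true;
    case TARGET_IO:  *code = 3; return true;
    default:         return false;
    }
}

/* Pack the target code into bits 31..30 and the 30-bit word address
 * (byte address >> 2) into bits 29..0. */
static bool encode_bridge_addr(enum pci_target t, uint32_t byte_addr,
                               uint32_t *out)
{
    uint32_t code;
    if (!target_code(t, &code))
        return false;
    *out = (code << 30) | ((byte_addr >> 2) & 0x3fffffffu);
    return true;
}

/* Inverse of encode_bridge_addr; the low two byte-address bits are lost. */
static void decode_bridge_addr(uint32_t w, uint32_t *code, uint32_t *byte_addr)
{
    *code = w >> 30;
    *byte_addr = (w & 0x3fffffffu) << 2;
}
```

Under these assumptions, a memory write to byte address `0x1000` encodes as `0x80000400`. Tim's reply instead moves the target into the choice of port, which avoids both the remapping and the lost bits.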
From theosib at gmail.com  Wed Jul 23 15:45:46 2008
From: theosib at gmail.com (Timothy Normand Miller)
Date: Wed Jul 23 15:50:19 2008
Subject: [Open-graphics] VGA text mode C version
In-Reply-To: <20080723184239.GA8842@eideticdew.org>
References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com>
        <20080723065419.GA7424@eideticdew.org>
        <9871ee5f0807230748s2c779befx39d3d1c358e3b358@mail.gmail.com>
        <20080723174933.GB8580@eideticdew.org>
        <9871ee5f0807231118k4f396a69m3cc6ec62830798b4@mail.gmail.com>
        <20080723184239.GA8842@eideticdew.org>
Message-ID: <9871ee5f0807231245h51688845re4e66102d21737b5@mail.gmail.com>

On Wed, Jul 23, 2008 at 2:42 PM, Petter Urkedal wrote:
> On 2008-07-23, Timothy Normand Miller wrote:
>> When the address decode logic gets a write from PCI, first it gets the
>> address and pushes that into the queue, with some flags indicating
>> that this is an address, and alongside are flags that indicate the
>> target (engine, memory). The address is a byte address; only for I/O
>> space will the lower two bits be non-zero. The next thing it does is
>> push the data word onto the same queue, with some flags indicating
>> that this is write data, along with the byte enables.
>>
>> Since 38 bits are required to make sense of the queue entry (for data:
>> 32-bit data, 4-bit byte enable, 2-bit type), and HQ is 32-bit, either
>> we need to expand HQ to more than 32-bit, or we throw more ports at
>> it, with one port triggering a dequeue when accessed (probably the
>> data/address portion).
>
> Thanks for the explanation. Then, I suggest we use one data port and
> one byte-enables port.

I was thinking that we should have one port that indicates type
(address, read count, write data) and byte enables or target. The
other port would be the data/address port and also dequeue.

Something I just realized: We need an enable/disable for the data
cache in the address decoder. For memory accesses, it has a 16-word
cache that we need to bypass when HQ is running. Sometimes we want to
leave it on. For instance, in text mode, it doesn't hurt anything.

>>> Assuming e.g. that byte enables are encoded in the top of the address,
>>> the code may look something like this (omitting count logic):
>>
>> For the command queue, an address entry contains the following data:
>>
>> Address -- 30 bits
>> Type -- 2 bits (indicating that it's an address)
>> Target -- which memory space
>>
>> The following memory spaces are valid:
>>
>>     parameter TARGET_CFG=0;
>>     parameter TARGET_ENG=1;
>>     parameter TARGET_MEM=2;
>>     parameter TARGET_PROM=3;
>>     parameter TARGET_IO=4;
>>
>> HQ can't get CFG or PROM, but it can get any of the other three.
>> (Actually, I think IO isn't connected right, but we can fix that
>> easily.)
>
> When writing these out on the bridge, I assume the target must be
> encoded in the address. This is what you mentioned we could do before
> HQ, right? I can see there is one target (maybe PROM) which must be
> filtered out before we have 2 bits that can fit in the address. Either
> way works.

Another way to do it is to encode the target into the port, like we
did with byte enables. There are only two (maybe three, more on this
later) targets that can be accessed via the bridge, memory and engine.
So we just need two address ports:
MEM_SEND_ADDR_MEM
MEM_SEND_ADDR_ENG

Everything else stays the same.

>>> where MEM_WRITE_DATA_xxxx are really commands. Yet another naming
>>> scheme:
>>>
>>>     MEM_CMDQ_FREE
>>>     MEM_SEND_ADDR
>>>     MEM_SEND_READ_COUNT
>>>     MEM_SEND_DATA_0000
>>>     MEM_SEND_DATA_1111
>>>     MEM_READQ_AVAIL
>>>     MEM_READQ_DATA
>>
>> I kinda like the second one better.
>
> Yeah, it was growing on me too after sending the mail, maybe because it
> clearly distinguishes the active command ports (by the verb "SEND") from
> the passive ones (by the nouns "CMDQ" and "READQ").

Yes.
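Tim's two-port scheme can be sketched in C as a reader that first peeks at an info word (type plus byte enables or target) and then reads the data/address word, which is what dequeues the entry. The field masks are copied from the PCI_TW_INFO definitions in Petter's hqio.asm attachment of the following day; the queue structure, the function names, and the assumption that the byte address advances by 4 after each data word are illustrative only, not the hardware's documented behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout of PCI_TW_INFO as defined in Petter's hqio.asm attachment. */
#define TW_INFO_TARGET_SHIFT 29
#define TW_INFO_TARGET_MASK  0xe0000000u
#define TW_INFO_TYPE_MASK    0x03000000u
#define TW_INFO_TYPE_ADDR    0x00000000u
#define TW_INFO_TYPE_RCOUNT  0x01000000u
#define TW_INFO_TYPE_WDATA   0x02000000u
#define TW_INFO_ENABLE_MASK  0x0000000fu

/* Hypothetical software stand-in for the hardware target-write queue. */
struct tw_entry { uint32_t info, payload; };
struct tw_queue { const struct tw_entry *e; int head, count; };

static uint32_t tw_target(uint32_t info)
{
    return (info & TW_INFO_TARGET_MASK) >> TW_INFO_TARGET_SHIFT;
}

/* Info port: peeks at the head entry without side effects. */
static uint32_t tw_read_info(const struct tw_queue *q)
{
    assert(q->head < q->count);
    return q->e[q->head].info;
}

/* Data/address port: returns the payload and dequeues the entry. */
static uint32_t tw_receive(struct tw_queue *q)
{
    assert(q->head < q->count);
    return q->e[q->head++].payload;
}

/* Drain the queue, tracking the current byte address.  For each
 * write-data entry, store {address, byte enables, data} in rows[]
 * (at most max rows).  Assumes the address advances 4 bytes per data
 * word.  Returns the number of rows stored. */
static int tw_drain(struct tw_queue *q, uint32_t rows[][3], int max)
{
    uint32_t addr = 0;
    int n = 0;
    while (q->head < q->count) {
        uint32_t info = tw_read_info(q);   /* look first...  */
        uint32_t word = tw_receive(q);     /* ...then dequeue */
        uint32_t type = info & TW_INFO_TYPE_MASK;
        if (type == TW_INFO_TYPE_ADDR) {
            addr = word;
        } else if (type == TW_INFO_TYPE_WDATA) {
            if (n < max) {
                rows[n][0] = addr;
                rows[n][1] = info & TW_INFO_ENABLE_MASK;
                rows[n][2] = word;
                n++;
            }
            addr += 4;
        }
        /* TW_INFO_TYPE_RCOUNT entries are ignored in this sketch. */
    }
    return n;
}
```

Keeping the info read free of side effects lets HQ code inspect an entry before committing to consume it; only the second read changes queue state.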
--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

From urkedal at nbi.dk  Thu Jul 24 14:15:46 2008
From: urkedal at nbi.dk (Petter Urkedal)
Date: Thu Jul 24 14:16:06 2008
Subject: [Open-graphics] VGA text mode C version
In-Reply-To: <9871ee5f0807231245h51688845re4e66102d21737b5@mail.gmail.com>
References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com>
        <20080723065419.GA7424@eideticdew.org>
        <9871ee5f0807230748s2c779befx39d3d1c358e3b358@mail.gmail.com>
        <20080723174933.GB8580@eideticdew.org>
        <9871ee5f0807231118k4f396a69m3cc6ec62830798b4@mail.gmail.com>
        <20080723184239.GA8842@eideticdew.org>
        <9871ee5f0807231245h51688845re4e66102d21737b5@mail.gmail.com>
Message-ID: <20080724181545.GA11533@eideticdew.org>

On 2008-07-23, Timothy Normand Miller wrote:
> I was thinking that we should have one port that indicates type
> (address, read count, write data) and byte enables or target. The
> other port would be the data/address port and also dequeue.

Done.

> Something I just realized: We need an enable/disable for the data
> cache in the address decoder. For memory accesses, it has a 16-word
> cache that we need to bypass when HQ is running. Sometimes we want to
> leave it on. For instance, in text mode, it doesn't hurt anything.

As far as I can see from pci_address_decode.v, this is used for target
reads, right? So, as long as reads are pure, it doesn't hurt to leave
it on. Of course, HQ can always yield some side-effect on read, but is
that useful? If not, we could maybe reformulate the target read ports
of HQ to allow transmitting more than a single word, and thus benefit
from the cache.

> Another way to do it is to encode the target into the port, like we
> did with byte enables. There are only two (maybe three, more on this
> later) targets that can be accessed via the bridge, memory and engine.
> So we just need two address ports:
> MEM_SEND_ADDR_MEM
> MEM_SEND_ADDR_ENG

Done, and we just leave space for a third port.
> Everything else stays the same.

Well, I wasn't quite satisfied, but we can go back to the previous
version, if you like. I assumed the master command queue is the same as
the master write queue, so that the _FREE ports should be the same.
Thus, I did something similar to the memory port, including the names.
Now, all enqueuing ports are named _SEND and all dequeuing ports are
named _RECEIVE.
-------------- next part --------------
;; PCI Target Numbers

let PCI_TARGET_CFG  = 0 ; (not seen by HQ) configuration
let PCI_TARGET_ENG  = 1 ; engine
let PCI_TARGET_MEM  = 2 ; memory
let PCI_TARGET_PROM = 3 ; (not seen by HQ)
let PCI_TARGET_IO   = 4 ; IO

;; PCI Ports
;; =========

let _PCI_B = -0x20 ; base address for PCI ports

;; Master Command
let PCI_M_CMDQ_FREE      = _PCI_B + 0x00 ; in - number of commands we can write
let PCI_M_SEND_CMD       = _PCI_B + 0x01 ; enq - direction and count

;; Master Write
let PCI_M_SEND_DATA_0000 = _PCI_B + 0x10 ; enq - 16 ports for data word, where
                                         ; lower 4 address bits are byte enables
let PCI_M_SEND_DATA_1111 = PCI_M_SEND_DATA_0000 + 15 ; for convenience

;; Master Read
let PCI_M_READQ_COUNT    = _PCI_B + 0x04 ; in - number of words available to read
let PCI_M_RECEIVE_DATA   = _PCI_B + 0x05 ; deq - data word

;; Target Write
let PCI_TW_COUNT         = _PCI_B + 0x08 ; in - number of words in write queue
let PCI_TW_INFO          = _PCI_B + 0x0a ; in - type, target, byte-enables
let PCI_TW_INFO_TARGET_SHIFT = 29
let PCI_TW_INFO_TARGET_MASK  = 0xe0000000
let PCI_TW_INFO_TYPE_MASK    = 0x03000000
let PCI_TW_INFO_TYPE_ADDR    = 0x00000000
let PCI_TW_INFO_TYPE_RCOUNT  = 0x01000000
let PCI_TW_INFO_TYPE_WDATA   = 0x02000000
let PCI_TW_INFO_ENABLE_MASK  = 0x0000000f
let PCI_TW_RECEIVE       = _PCI_B + 0x0b ; deq - address or written data

;; Target Read
let PCI_TR_PENDING       = _PCI_B + 0x0c ; in - nonzero if pending read
let PCI_TR_RECEIVE_ADDR  = _PCI_B + 0x0d ; deq - the one address that is pending
let PCI_TR_SEND_DATA     = _PCI_B + 0x0e ; enq - where we write the one word

;; Memory Ports
;; ============

let _MEM_B = -0x40 ; base address for memory ports

let MEM_CMDQ_FREE        = _MEM_B + 0x00 ; in - free slots in command pipe
let MEM_SEND_ADDR_MEM    = _MEM_B + 0x08 ; enq - address for memory read or write
let MEM_SEND_ADDR_ENG    = _MEM_B + 0x09 ; enq - address for engine read or write
let MEM_SEND_READ_COUNT  = _MEM_B + 0x0f ; enq - do a read of given word count
let MEM_SEND_DATA_0000   = _MEM_B + 0x10 ; enq - 16 ports for data stream, where
                                         ; lower 4 address bits are enables
let MEM_SEND_DATA_1111   = MEM_SEND_DATA_0000 + 15
let MEM_READQ_AVAIL      = _MEM_B + 0x02 ; in - number of words in FIFO
let MEM_RECEIVE_DATA     = _MEM_B + 0x03 ; deq - data stream from memory

;; [1] http://readlist.com/lists/duskglow.com/open-graphics/1/6252.html.

From theosib at gmail.com  Thu Jul 24 14:54:40 2008
From: theosib at gmail.com (Timothy Normand Miller)
Date: Thu Jul 24 14:54:55 2008
Subject: [Open-graphics] VGA text mode C version
In-Reply-To: <20080724181545.GA11533@eideticdew.org>
References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com>
        <20080723065419.GA7424@eideticdew.org>
        <9871ee5f0807230748s2c779befx39d3d1c358e3b358@mail.gmail.com>
        <20080723174933.GB8580@eideticdew.org>
        <9871ee5f0807231118k4f396a69m3cc6ec62830798b4@mail.gmail.com>
        <20080723184239.GA8842@eideticdew.org>
        <9871ee5f0807231245h51688845re4e66102d21737b5@mail.gmail.com>
        <20080724181545.GA11533@eideticdew.org>
Message-ID: <9871ee5f0807241154lf31b53dp7f639b55cdda4342@mail.gmail.com>

On Thu, Jul 24, 2008 at 2:15 PM, Petter Urkedal wrote:
>
>> Something I just realized: We need an enable/disable for the data
>> cache in the address decoder. For memory accesses, it has a 16-word
>> cache that we need to bypass when HQ is running. Sometimes we want to
>> leave it on. For instance, in text mode, it doesn't hurt anything.
>
> As far as I can see from pci_address_decode.v, this is used for target
> reads, right? So, as long as reads are pure, it doesn't hurt to leave
> it on.
> Of course, HQ can always yield some side-effect on read, but is
> that useful? If not, we could maybe reformulate the target read ports
> of HQ to allow transmitting more than a single word, and thus benefit
> from the cache.

That's the issue. If there were ever a side-effect on a read, we
would need to put the address decode in single-word mode. I think
there may be side effects for VGA graphics.

>> Another way to do it is to encode the target into the port, like we
>> did with byte enables. There are only two (maybe three, more on this
>> later) targets that can be accessed via the bridge, memory and engine.
>> So we just need two address ports:
>> MEM_SEND_ADDR_MEM
>> MEM_SEND_ADDR_ENG
>
> Done, and we just leave space for a third port.
>
>> Everything else stays the same.
>
> Well, I wasn't quite satisfied, but we can go back to the previous
> version, if you like. I assumed the master command queue is the same as
> the master write queue, so that the _FREE ports should be the same.
> Thus, I did something similar to the memory port, including the names.
> Now, all enqueuing ports are named _SEND and all dequeuing ports are
> named _RECEIVE.

What were you not satisfied with? Previous version of what?

Also, I'm not thinking much about PCI master right now, but I was
thinking we should have three queues: command, read data, and write
data. The main reason to separate write data from commands is that we
can now somewhat independently control write data generation and write
address generation. We'll have a short counter for the address, and
we need to track the address so the master can restart on terminate.
There's also a lot of latency between issuing a write command (with
count known in advance) and the data actually being consumed. The
thing about the PCI master is that the writes aren't "reliable" in
that we can't just issue them and forget. The bus protocol allows the
other target a lot of control. This is in contrast to our bridge
which is entirely a slave device, so there's no need for the extra
flexibility. If this doesn't make sense, ask, and I'll try to explain
better.

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project

From urkedal at nbi.dk  Thu Jul 24 16:31:30 2008
From: urkedal at nbi.dk (Petter Urkedal)
Date: Thu Jul 24 16:31:44 2008
Subject: [Open-graphics] VGA text mode C version
In-Reply-To: <9871ee5f0807241154lf31b53dp7f639b55cdda4342@mail.gmail.com>
References: <9871ee5f0807201632r4cd860a4x48128677d7c2fb4e@mail.gmail.com>
        <20080723065419.GA7424@eideticdew.org>
        <9871ee5f0807230748s2c779befx39d3d1c358e3b358@mail.gmail.com>
        <20080723174933.GB8580@eideticdew.org>
        <9871ee5f0807231118k4f396a69m3cc6ec62830798b4@mail.gmail.com>
        <20080723184239.GA8842@eideticdew.org>
        <9871ee5f0807231245h51688845re4e66102d21737b5@mail.gmail.com>
        <20080724181545.GA11533@eideticdew.org>
        <9871ee5f0807241154lf31b53dp7f639b55cdda4342@mail.gmail.com>
Message-ID: <20080724203130.GA16627@eideticdew.org>

On 2008-07-24, Timothy Normand Miller wrote:
> That's the issue. If there were ever a side-effect on a read, we
> would need to put the address decode in single-word mode. I think
> there may be side effects for VGA graphics.

Okay.

> What were you not satisfied with? Previous version of what?

That would be my own hqio.asm attachment from yesterday.

> Also, I'm not thinking much about PCI master right now, but I was
> thinking we should have three queues: command, read data, and write
> data. The main reason to separate write data from commands is that we
> can now somewhat independently control write data generation and write
> address generation. We'll have a short counter for the address, and
> we need to track the address so the master can restart on terminate.
> There's also a lot of latency between issuing a write command (with
> count known in advance) and the data actually being consumed.
> The thing about the PCI master is that the writes aren't "reliable" in
> that we can't just issue them and forget. The bus protocol allows the
> other target a lot of control. This is in contrast to our bridge
> which is entirely a slave device, so there's no need for the extra
> flexibility. If this doesn't make sense, ask, and I'll try to explain
> better.

I see the point, thanks. I may ask about details later, but this
should suffice for the port definitions. So, I reverted the _FREE port
unifications for the master and some of the naming. Should I commit
hqio.asm, or do you prefer to wait till we nail down the port numbers
(i.e. after the Verilog is written)?
-------------- next part --------------
;; PCI Target Numbers

let PCI_TARGET_CFG  = 0 ; (not seen by HQ) configuration
let PCI_TARGET_ENG  = 1 ; engine
let PCI_TARGET_MEM  = 2 ; memory
let PCI_TARGET_PROM = 3 ; (not seen by HQ)
let PCI_TARGET_IO   = 4 ; IO

;; PCI Ports
;; =========

let _PCI_B = -0x20 ; base address for PCI ports

;; Master Command
let PCI_M_CMD_FREE = _PCI_B + 0x00 ; in - number of commands we can write
let PCI_M_CMD      = _PCI_B + 0x01 ; enq - direction and count

;; Master Write
let PCI_MW_FREE    = _PCI_B + 0x02 ;