[Open-graphics] Multipliers in oga1hq
Farhan Mohamed Ali
farhan at cmu.edu
Sat Sep 1 17:37:23 EDT 2007
On Sat, September 1, 2007 1:23 pm, Timothy Normand Miller said:
> On 9/1/07, Petter Urkedal <urkedal at nbi.dk> wrote:
>
>> So, let's consider integrating Farhan's version in the nanocontroller.
>>
>
> http://wiki.opengraphics.org/tiki-index.php?page=Subversion+Commit+Policy
>
> Farhan would need to officially give us (Traversal specifically) rights
> to use his work.
>
No problem. How do I do this officially? Do I just include the copyright
and license statement in my files? I realize that right now I don't have
a proper header, just comments all over the place, but changes are being
made quite often, so I'm too lazy to write a decent one at the moment. I
will do this once the spec is more or less settled.
>
>> code? (Does DMA require multiply at all, other than powers of 2?)
>
> Doubtful. But if I'm wrong, we maybe should reserve an opcode or two for
> some instruction we don't yet know about.
>
>> I'd go with the non-blocking out-of-band approach. That is, the
>> programmer will count instructions before fetching the result.
>
> I generally prefer this myself.
>
>> As a slight variant, we can hard-code the multiplication result to r31
>> and drop the fetch-product instruction. That's just as easy to
>> implement, and it saves one cycle, since it means the product can be
>> directly used as an operand to the ALU.
>>
>
> I'm not sure we want to add additional MUXing after the REG stage. It
> might be better to move it into the MEM stage. This is especially not a
> problem since we have gobs of time to schedule when the product is
> grabbed.
>
> Having a special instruction to initiate the multiply would save us one
> cycle (worth it?). Otherwise, there would be two moves into the
> scratch/io space. But the product is only a single word fetch. Putting
> it into r31 would save a cycle, because we wouldn't have to move it into
> a register first before using it as an operand to another instruction.
>
> My main concerns are the extra multiplexing logic hurting our max clock
> rate.
>
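To make the non-blocking, count-your-instructions scheme discussed above concrete, here is a minimal Python sketch of a multiplier whose result shows up a fixed number of instruction slots after issue. The class name `Multiplier`, the 3-slot latency, and the 32-bit truncation are assumptions made for illustration; none of them come from the oga1hq spec.

```python
class Multiplier:
    """Toy model of a non-blocking multiplier with a fixed latency."""

    def __init__(self, latency=3):
        self.latency = latency   # assumed slot count, not from the spec
        self.pending = []        # entries of [slots_left, product]
        self.product = 0         # last completed result (the "r31" view)

    def start(self, a, b):
        # Issuing a multiply never stalls the pipeline.
        self.pending.append([self.latency, (a * b) & 0xFFFFFFFF])

    def tick(self):
        # Advance one instruction slot and retire any finished multiply.
        for entry in self.pending:
            entry[0] -= 1
        while self.pending and self.pending[0][0] <= 0:
            self.product = self.pending.pop(0)[1]


def run_demo():
    m = Multiplier(latency=3)
    m.start(6, 7)
    seen = []
    for _ in range(3):
        seen.append(m.product)   # reading too early sees the stale value
        m.tick()
    seen.append(m.product)       # after 3 slots the product is valid
    return seen
```

In this model `run_demo()` returns three stale reads followed by the product, which is the contract the programmer has to respect by counting instructions. Whether the product is then read through a fetch instruction or appears directly as r31 only changes how many slots the consumer spends, not the latency itself.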
>> The introduction of interrupts, if needed, will not cause problems as
>> long as interrupt handlers don't use the multiplier. Moreover, if an
>> interrupt handler needs to use the multiplier, this is also possible:
>> When the interrupt handler is sure any pending multiplication is
>> finished, it can save the result R. Then it can do its own
>> multiplication. Before returning to normal code, it must perform a
>> multiply R*1 and wait long enough for the result to be available.
>
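The save-and-restore trick Petter describes can be sketched as follows. This is only an illustration in Python: the `Mul` class, the `LATENCY` constant of 3 slots, and the `isr` function are hypothetical names, and the real handler would of course be nanocontroller code.

```python
LATENCY = 3  # assumed multiplier latency in instruction slots


class Mul:
    """Toy non-blocking multiplier: results appear LATENCY slots after issue."""

    def __init__(self):
        self.queue = []    # entries of [slots_left, value]
        self.product = 0   # last completed result

    def start(self, a, b):
        self.queue.append([LATENCY, a * b])

    def wait(self, slots):
        # Let `slots` instruction slots pass.
        for _ in range(slots):
            for entry in self.queue:
                entry[0] -= 1
            while self.queue and self.queue[0][0] <= 0:
                self.product = self.queue.pop(0)[1]


def isr(mul, a, b):
    mul.wait(LATENCY)      # be sure any pending main-program multiply is done
    saved = mul.product    # save the interrupted program's result R
    mul.start(a, b)        # the handler's own multiplication
    mul.wait(LATENCY)
    result = mul.product
    mul.start(saved, 1)    # restore R by computing R*1
    mul.wait(LATENCY)      # wait until R is visible again before returning
    return result


def demo():
    mul = Mul()
    mul.start(6, 7)        # main program issues a multiply
    mul.wait(1)            # interrupt arrives one slot later
    isr_result = isr(mul, 5, 5)
    return isr_result, mul.product
```

Here `demo()` returns the handler's own product together with the value the main program finds afterwards, which is its original 6*7 result. The cost is visible in `isr`: three full latency waits per interrupt that uses the multiplier.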
> I think we may in fact need interrupts, and I'm struggling with it. The
> problem is VGA graphics modes. In 640x480x16 and such, framebuffer reads
> and writes are not simple accesses. You can apply raster operators to
> writes, and you can make reads fill a blt buffer larger than your word
> size so that when you write, it causes more than a word size to get
> written out. This way, you can bitblt faster than you can move data over
> the bus.
>
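As a rough illustration of why these accesses are "not simple", here is a deliberately simplified Python model of planar VGA latches. It assumes four 8-bit planes, a read that fills all four latches, a latch-copy write, and a write that combines the CPU byte with the latched data. It ignores map masks, bit masks, set/reset and rotation, and the names (`VgaPlanarModel`, `write_latched`, `write_rop`) are made up for this sketch.

```python
PLANES = 4

# Raster operations applied between the CPU byte and each latch.
OPS = {
    "copy": lambda value, latch: value,
    "and": lambda value, latch: value & latch,
    "or": lambda value, latch: value | latch,
    "xor": lambda value, latch: value ^ latch,
}


class VgaPlanarModel:
    def __init__(self, size):
        self.planes = [bytearray(size) for _ in range(PLANES)]
        self.latches = [0] * PLANES

    def read(self, addr, plane=0):
        # One bus read fills all four latches (32 bits) but returns 8 bits.
        self.latches = [p[addr] for p in self.planes]
        return self.latches[plane]

    def write_latched(self, addr):
        # Latch-copy write: CPU data is ignored, the latches are stored.
        for p, latch in zip(self.planes, self.latches):
            p[addr] = latch

    def write_rop(self, addr, value, op):
        # Write combined with the latched data through a raster operation.
        f = OPS[op]
        for p, latch in zip(self.planes, self.latches):
            p[addr] = f(value, latch) & 0xFF


def demo():
    vga = VgaPlanarModel(size=4)
    for p, byte in zip(vga.planes, (0x11, 0x22, 0x33, 0x44)):
        p[0] = byte
    vga.read(0)                    # one 8-bit read latches 32 bits
    vga.write_latched(1)           # one 8-bit write moves all 32 bits
    copied = [p[1] for p in vga.planes]
    vga.write_rop(2, 0x0F, "xor")  # XOR the CPU byte into latched data
    xored = [p[2] for p in vga.planes]
    return copied, xored
```

In the model, a single read plus a single write moves four bytes, which is the "bitblt faster than the bus" effect. Every such access has side effects on hidden state, which is why the controller cannot treat PCI reads and writes in these modes as plain memory operations.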
> Now, for VGA mode, mostly what the controller does is read VGA text or
> pixels and convert them in the background into pixels suitable for our
> video controller. At the same time, we want the controller to handle the
> extra smarts of VGA. One way to do this is to support interrupts; when a
> PCI access comes in, we can intercept it and do the extra stuff. While
> writes could be queued for us to process periodically, reads have to be
> processed as soon as possible.
>
> Interrupts won't stall lower parts of the pipeline, but they would divert
> the instruction flow. We need to determine how this will affect our
> static instruction scheduling.
>
> Correct me if I'm wrong, but a subroutine call stores the return address
> into r31, right? Of course, since that's under main program control, no
> problem! But with interrupts, I think we should dump the return address
> into a predefined address in the scratch memory.
>
> What about context switches? Should we require the ISR to copy registers
> to the scratch memory? That's a fair amount of overhead, depending on
> how many we need to clobber. How about doubling the size of the register
> file? The lower half for normal execution, the upper half for
> interrupts. (Like how the Z80 did it.) (In this case, the interrupt
> return address appears in what we might internally call r63.) Oh, and
> don't forget the delayed branch issue and how it'll affect interrupts: one
> extra instruction from the main program will get executed, so the return
> PC must account for that, and be sure to consider the situation where the
> interrupt arrives at the same time as a branch instruction is being
> fetched in the main program.
>
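The doubled register file idea can be sketched like this. It is a minimal Python model under stated assumptions: 32 architectural registers, 64 physical ones, a single `in_interrupt` mode bit selecting the upper half, and the interrupt return address landing in physical r63. The class and method names are hypothetical, and the delayed-branch adjustment of the return PC is left to the caller.

```python
class BankedRegisterFile:
    BANK = 32  # architectural registers visible at any one time

    def __init__(self):
        self.regs = [0] * (2 * self.BANK)  # physical r0..r63
        self.in_interrupt = False

    def _index(self, r):
        if not 0 <= r < self.BANK:
            raise ValueError("register number out of range")
        # The mode bit acts as the top bit of the physical register index.
        return r + self.BANK if self.in_interrupt else r

    def read(self, r):
        return self.regs[self._index(r)]

    def write(self, r, value):
        self.regs[self._index(r)] = value & 0xFFFFFFFF

    def enter_interrupt(self, return_pc):
        # Switch banks, then store the return address in physical r63.
        self.in_interrupt = True
        self.write(31, return_pc)

    def leave_interrupt(self):
        pc = self.read(31)
        self.in_interrupt = False
        return pc


def demo():
    rf = BankedRegisterFile()
    rf.write(5, 123)          # main program state
    rf.write(31, 0x40)        # main program's subroutine return address
    rf.enter_interrupt(0x100)
    rf.write(5, 999)          # ISR clobbers "r5" in its own bank only
    resume = rf.leave_interrupt()
    return resume, rf.read(5), rf.read(31), rf.regs[63]
```

With this arrangement the ISR needs no save or restore code at all, and the main program's r31 survives an interrupt that arrives in the middle of a subroutine. The price is a register file twice the size and no cheap way to nest interrupts.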
> -- Timothy Normand Miller http://www.cse.ohio-state.edu/~millerti
> Open Graphics Project