12.08.08

Pulling Double Duty

Posted in Apple ][ at 11:13 am by site admin

I think it’s always important to actually implement one’s ideas, because the implementation is almost always more instructive than the idea itself.

I had an hour or so to do a little implementing and I encountered a rather tricky problem but was able to come up with an equally tricky solution. :) The core of the problem lies in implementing the BG0/BG1 out-of-line code. The code needs some way of knowing what line it is currently on in order to return to the original call site without using the stack. My original approach was to have a fixed memory location hold the current line, but, with the batch-oriented updates, there is no way to update that index without adding more inline code.

It turns out that the x-register can serve double duty in this case. It can track the current blitter line as well as serve its original function of indexing into the animated tile data table. The animated tile data is a 4KB buffer that holds tile data that can be replicated across the playing field. If the tile data is changed, every tile is automatically updated. This data is accessed via an LDA dp,x instruction and the x-register is set to one of the 16 pages of memory at the start of each blitter line.

So, the x-register already distinguishes among the lines modulo 16. How can we extend this range while still limiting access to the tile data? The answer is to interleave the line addresses such that


xreg[n] = 0x100 * (n % 16) + 2 * (n / 16)

This is somewhat counter-intuitive since the “block index” (n / 16, rounded down) is stored in the least significant bits, but it works. We do lose a bit of range, since the first row of tiles must offset their direct page address by 26 in order to compensate for the larger value in the x-register in later lines. All told, we lose 4 of the 32 animated tiles, which is well worth it for the added performance.
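The interleaving can be checked with a few lines of Python. This is only a sketch of the arithmetic in the formula above: the function names and the 200-line count are assumptions for illustration, not part of the actual blitter.

```python
LINES = 200  # assumption: one blitter line per scanline of a 200-line screen


def xreg(n):
    """X-register value for blitter line n, per the interleaving formula."""
    page = n % 16    # selects one of the 16 pages of animated tile data
    block = n // 16  # "block index", stored in the low bits
    return 0x100 * page + 2 * block


def line_from_xreg(x):
    """Recover the line number from an interleaved x-register value."""
    page = x >> 8            # high byte is the page
    block = (x & 0xFF) >> 1  # low byte holds 2 * block
    return 16 * block + page


if __name__ == "__main__":
    for n in (0, 1, 15, 16, 17, 32):
        print(f"line {n:3d} -> x = ${xreg(n):04X}")
```

Every line gets a distinct value, the high byte still picks the tile data page, and the low bits grow by 2 for each group of 16 lines.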

And just what do we gain? Well, the out-of-line code sequence before and after the changes is shown below. There may appear to be optimizations one could make, but the free space in the BG0 code bank is limited.

Before
------

   JMP  ]1        ; jump out of the BG0 code

1: STX  $FFFE     ; save the x-register in BG1 space
   LDX  curr      ; load the current line address

   LDA  (data),y  ; do the data processing
   AND  >mask,x
   ORA  >data,x
   PHA

   TXA
   LDX  $FFFE     ; restore the x-register
   CLC
   ADC  #OFFSET
   STA  >2        ; calculate the address and return
2: JMP  $0000    ; 45 cycles total, 84 * 32 bytes in BG0 code bank

After
-----

   JML  (disp)     ; jump directly to another bank

1: LDA  (data),y  ; do the data processing
   AND  >mask,x   ; interleave the data such that
   ORA  >data,x   ; the x-register is correct
   PHA

   JMP  (rtbl,x)  ; jump to a bunch of JMLs
r: JML  $000000   ; jump directly to the correct location (39 cycles, 0 bytes in BG0 bank)

So it’s faster and it frees up a large chunk of memory in the BG0 bank that we can use for a cache later on.
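To see why the interleaved x-register value works as an index for JMP (rtbl,x), here is a small Python model of the return table. It is a sketch under assumptions: the 200-line count, the STUB_BASE address, and the idea that the per-line JML stubs sit back to back are all hypothetical. The factor of 2 in the formula lines up with the 2-byte addresses that an indexed indirect JMP reads, which is presumably why it is there.

```python
LINES = 200         # assumption: one return stub per scanline
STUB_BASE = 0x8000  # hypothetical address of the first per-line JML stub
JML_SIZE = 4        # a JML with a 24-bit absolute operand is 4 bytes long


def xreg(n):
    """Interleaved x-register value for blitter line n."""
    return 0x100 * (n % 16) + 2 * (n // 16)


def build_return_table():
    """Place each line's 16-bit stub address at offset xreg(n)."""
    table = bytearray(0x1000)  # sparse: only a few bytes per page are used
    for n in range(LINES):
        off = xreg(n)
        stub = STUB_BASE + JML_SIZE * n
        table[off] = stub & 0xFF          # low byte first (little-endian)
        table[off + 1] = (stub >> 8) & 0xFF
    return table


def dispatch(table, x):
    """Model JMP (rtbl,x): read the 16-bit target stored at rtbl + x."""
    return table[x] | (table[x + 1] << 8)


if __name__ == "__main__":
    rtbl = build_return_table()
    for n in (0, 16, 17, 199):
        print(f"line {n:3d}: x=${xreg(n):04X} -> stub ${dispatch(rtbl, xreg(n)):04X}")
```

Because every offset is even and unique, no two entries overlap, and the same x value that indexed the tile data selects the correct return stub with no extra loads or stores.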
