04.07.08

Basic Block BitBlit Babbling

Posted in Apple ][ at 5:01 pm by site admin

Well, it’s happened again. I’ve become so totally swamped with Real Life that any work on GTE has been pushed to the rear of the stove. Even though I haven’t done any coding in several weeks, I’d like to describe what the next phase of development holds.

Currently, the bare bones of GTE are working, but many of the supported features do not have the proper low-level support. This support takes the form of a myriad of custom tile bitblit functions. A different bitblit is required for all the different combinations of BG0, BG1, Fringe and Animated tiles. Also, based on the capability bits of the tool set, the same bitblit routine can be optimized in different ways. Plus, a separate routine is required for each tile size and orientation.

For example, the simplest block blit (as in, the only one currently implemented) copies the tile data, without masking, directly into the operand field of the PEA instructions that blit the BG0 field to the graphics screen. Simple, straightforward and fast. Now consider what happens if Animated Tile support is activated. In this case, some of the PEA opcodes are replaced with LDA/PHA instruction pairs which copy data from the animated tile data buffers to the screen. Unfortunately, this means that the opcodes must be reset to PEA instructions whenever a regular tile is copied to the buffer. Hence, tile blitting is about 40% slower in this case.
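To make the opcode-reset cost concrete, here is a small host-side Python model of one three-byte slot in the BG0 code field. It is only a sketch, not GTE code. The opcode bytes ($F4 for PEA, $A5 for LDA direct page, $48 for PHA) are the standard 65816 encodings, but the function names are made up for illustration, and I am assuming the LDA uses direct-page addressing so that the LDA/PHA pair fits in the same three bytes as a PEA.

```python
PEA = 0xF4      # PEA: opcode byte followed by a 16-bit operand
LDA_DP = 0xA5   # LDA direct page: opcode byte followed by a 1-byte address
PHA = 0x48      # PHA: single byte


def make_animated(field, off, dp_addr):
    """Turn the slot at `off` into an LDA dp / PHA pair (hypothetical helper)."""
    field[off:off + 3] = bytes([LDA_DP, dp_addr, PHA])


def blit_word_fast(field, off, word):
    """Simplest blit: assume the slot already holds a PEA, patch only the operand."""
    field[off + 1] = word & 0xFF   # low byte first (little-endian)
    field[off + 2] = word >> 8


def blit_word_safe(field, off, word):
    """Animated tiles active: the slot may hold LDA/PHA, so rewrite the opcode too."""
    field[off] = PEA
    blit_word_fast(field, off, word)


field = bytearray([PEA, 0, 0] * 2)      # two PEA slots
make_animated(field, 3, 0x10)           # second slot now reads from an animated buffer
blit_word_safe(field, 3, 0x1234)        # a regular tile reclaims the slot
```

The extra opcode store in `blit_word_safe` is the work the fast path gets to skip; if `blit_word_fast` were used on an animated slot, the stale LDA opcode would stay in place and the patched bytes would be misinterpreted.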

Other features require similar considerations. Activating BG1 requires that the tile mask data be evaluated in order to select between PEA and LDA/PHA instructions on a per-word basis. If the Fringe layers are active, then there is yet another set of masks and data to consider. Of course, the more complicated blits require the most optimization in order to keep things fast.
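The per-word decision can be modelled in a few lines of Python. This is a rough sketch under one assumption that the paragraph above does not spell out: a mask word of 0xFFFF means the whole word is transparent, so BG1 should show through. The names `slot_kind` and `plan_row` are hypothetical.

```python
TRANSPARENT = 0xFFFF  # assumed convention: every mask bit set = nothing to draw


def slot_kind(tile_mask, fringe_mask=TRANSPARENT):
    """Pick the instruction for one 16-bit word of the code field."""
    combined = tile_mask & fringe_mask
    # Only a fully transparent word can defer to BG1 via LDA/PHA;
    # anything else must be pushed as literal data with PEA.
    return "LDA/PHA" if combined == TRANSPARENT else "PEA"


def plan_row(tile_masks, fringe_masks=None):
    """Classify every word in one tile row."""
    if fringe_masks is None:
        fringe_masks = [TRANSPARENT] * len(tile_masks)
    return [slot_kind(t, f) for t, f in zip(tile_masks, fringe_masks)]


# An 8-pixel row is two words: the first fully transparent, the second partly solid.
print(plan_row([0xFFFF, 0x0FF0]))
```

With a Fringe layer active, a word only falls through to BG1 when both the tile mask and the fringe mask are fully transparent, which is why the masks are ANDed before the comparison.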

So, rather than manually code the multitude of blitting routines, I’m planning to write a bitblit generator to create the blitters for me. While this is just as much work as writing the code manually, it will be much easier to change things and I can ensure that all the blit routines are always up to date.
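As a very rough idea of what such a generator could look like, here is a Python sketch that emits the assembly text for the simplest (unmasked, BG0-only) blitter at any tile size. None of this is the actual generator. It assumes 8 bytes per tile row in the tile data, 4 bits per pixel (so four pixels per 16-bit word), three-byte PEA slots with the operand at offset 1, and an assembler that accepts expressions like `1*BG0_STRIDE+$0004`. The names `dptr` and `BG0_STRIDE` are the ones used in this post; `gen_simple_blit` is made up.

```python
ROW_BYTES = 8    # bytes per tile row in the tile data (assumed)
SLOT_BYTES = 3   # one PEA instruction: opcode + 16-bit operand
INDENT = " " * 10


def gen_simple_blit(name, width, height):
    """Return assembly source that copies a width x height tile into the code field."""
    lines = [f"{name} anop"]
    for row in range(height):
        for word in range(width // 4):          # 4 pixels per word at 4 bits/pixel
            src = row * ROW_BYTES + word * 2     # byte offset into the tile data
            dst = word * SLOT_BYTES + 1          # operand byte of the word's PEA slot
            base = f"{row}*BG0_STRIDE+" if row else ""
            lines.append(f"{INDENT}ldy   #{src}")
            lines.append(f"{INDENT}lda   [dptr],y")
            lines.append(f"{INDENT}sta   |{base}${dst:04X},x")
    # The routine's exit sequence is intentionally left out of this sketch.
    return "\n".join(lines)


print(gen_simple_blit("simple8x8", 8, 8))
```

The payoff is that a change to the code field layout (say, a different slot size) becomes a one-line change to `SLOT_BYTES`, and every generated blitter picks it up. A real generator would also need to handle the masked, animated and flipped variants, and would want to avoid redundant `ldy` loads.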

I’d like to finish with a short example that illustrates the difference in complexity between the simplest blitter and the most complex. The tile data and masks are stored in LocInfo records, thus each tile takes up 8 bytes per row. This is a bit wasteful for 8×8 and 4×4 tiles, but it helps maintain data structure compatibility with QuickDraw II. I’ll assume that there are four pointers, to the tile data and mask and to the fringe data and mask, named dptr, mptr, fdptr, and fmptr respectively.

; This copies data directly to the BG0 code buffer.  Assume that the x register contains the
; proper address.
simple8x8 anop
          lda   [dptr]
          sta   |$0001,x
          ldy   #2
          lda   [dptr],y
          sta   |$0004,x          ; finish row 0

          ldy   #8
          lda   [dptr],y
          sta   |BG0_STRIDE+$0001,x
          iny
          iny
          lda   [dptr],y
          sta   |BG0_STRIDE+$0004,x          ; finish row 1

          ...

; This merges BG0, BG1 and Fringe data together.  Very slow....
complex8x8 anop
          lda   [mptr]          ; combine the tile and fringe mask data
          and   [fmptr]
          inc                   ; Is it totally transparent (0xFFFF)?
          bne   is_solid

          lda   #LDA_OPCODE     ; show BG1
          sta   |0,x
          lda   operand0        ; load the LDA operand and PHA instruction
          sta   |1,x
          bra   nextWord

is_solid anop
          lda   #PEA_OPCODE
          sta   |0,x

          lda   bgColor        ; Fill the transparent regions (mask bits set) with the background color
          eor   [dptr]
          and   [mptr]
          eor   [dptr]         ; have the base word, merge with fringe data
          eor   [fdptr]
          and   [fmptr]
          eor   [fdptr]
          sta   |$0001,x       ; save the data
          ...

As you can see, the amount of code needed for the most complex cases is considerable. Also, the large number of logical operations gives some hope that this code might be further optimized.
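The EOR/AND/EOR sequences are all instances of one bit-select identity, which is easy to check on a host machine before committing to assembly. The Python below is a sketch of the word merge, not a transcription of the routine. It assumes that a set mask bit means "transparent" (as the 0xFFFF test suggests), and the names `select` and `compose_word` are invented for illustration.

```python
WORD = 0xFFFF


def select(mask, if_set, if_clear):
    """((a ^ b) & m) ^ b: bits of if_set where mask is 1, bits of if_clear elsewhere."""
    return (((if_set ^ if_clear) & mask) ^ if_clear) & WORD


def compose_word(data, mask, fdata, fmask, bg):
    """Merge one 16-bit word of tile, fringe and background colour."""
    # Background colour where the tile is transparent, tile data elsewhere.
    base = select(mask, bg, data)
    # Keep the base where the fringe is transparent, fringe data elsewhere.
    return select(fmask, base, fdata)


# Four 4-bit pixels per word: tile solid in the two left pixels,
# fringe solid only in the right-most pixel.
word = compose_word(data=0x1200, mask=0x00FF, fdata=0x0003, fmask=0xFFF0, bg=0x4444)
print(f"{word:04X}")
```

Each `select` costs three logical operations per word, so any optimization has to come from skipping a merge entirely (for example when a mask word is known to be all zeros or all ones) rather than from shortening the identity itself.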

We need four of these routines since we must be able to combine all possible combinations of horizontal and vertical flipping of the Fringe and Base tiles. Lots of work ahead. :)
