11.04.08

Bit Twiddles and Depth Masking

Posted in Apple ][ at 9:51 am by site admin

From time to time I like to consider what sort of enhancements I can bolt on to the GTE core to make it just that little bit better. One feature that comes to mind quite often is how to perform, for lack of a better term, depth masking. The effect I have in mind is common in 3/4-view, top-down games like The Legend of Zelda for the SNES, where the z-sorted order of the sprites is based on their vertical position on the screen.

Here is an image that shows the type of problem I want to avoid.

Z-Sorting Errors

The tree in the center of the image will be drawn on top until the character moves below the trunk. Then the sprite will be drawn on top of the tree. This gives a very convincing illusion of depth, but the transition from “behind” to “in front” cannot be static if the sprite is too tall.

Right now, GTE could be hacked to emulate such an effect by dynamically turning off sprite masking on a per-scanline basis, but I’d like to support a more general mode that does not depend on the “y-position-equals-z” relationship. In short, I want a true z-buffer that selectively masks sprites if their priority is less than the z-buffer value. This would make games like Hogan’s Alley trivial to implement since each region of the background would have its own z-coordinate and the sprites could be chosen to be in any arbitrary layer.

I’m about halfway through developing my approach to this right now. The z-buffer basically replaces either the static or the dynamic mask buffer. There is one 4-bit value per pixel, so we are limited to 16 z-layers (which should be plenty for the kinds of games one can actually pull off on the IIgs).

The core operation that needs to be integrated into the sprite blitting code is to dynamically create a per-pixel mask via the following code fragment:


mask = ( z_sprite < z_buffer ) ? 0x0 : 0xF;

This could work as-is, but it would be very slow, since the mask would have to be built up pixel by pixel. Everything in GTE is based around moving data one word at a time, so we want to develop a SWAR (SIMD within a register) implementation of the above operation.
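For reference, here is a C sketch of the per-pixel expression applied to one 16-bit word holding four 4-bit pixels. The function name is hypothetical, and the sprite is assumed to have a single z-value for all four pixels. The loop is exactly the nibble-by-nibble work a SWAR version is meant to avoid.

```c
#include <stdint.h>

/* Hypothetical per-pixel reference: apply
   mask = (z_sprite < z_buffer) ? 0x0 : 0xF
   to each of the four nibbles of a z-buffer word, one nibble at a time. */
uint16_t mask_reference(uint16_t zbuf_word, unsigned z_sprite)
{
    uint16_t mask = 0;
    for (int i = 0; i < 4; i++) {
        unsigned z_buffer = (zbuf_word >> (4 * i)) & 0xF;  /* extract nibble i */
        unsigned m = (z_sprite < z_buffer) ? 0x0 : 0xF;     /* per-pixel rule */
        mask |= (uint16_t)(m << (4 * i));                   /* place it back */
    }
    return mask;
}
```

For a z-buffer word with the values 0, 2, 4, 7 and a sprite at z = 3, `mask_reference(0x0247, 3)` gives 0xFF00: the sprite shows over the two lower layers and is hidden behind the two higher ones.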

The fastest way I know of to get a mask from a comparison on the 65816 is the following code:


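lda z_sprite
sec          ; start with no borrow
sbc z_buffer ; carry is cleared (borrow) if z_buffer > z_sprite
lda #0       ; LDA does not touch the carry
sbc #0       ; 0 - 0 - borrow gives $FFFF or $0000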

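On the 65816 the carry flag acts as an inverted borrow. After the first SBC, the carry is clear (a borrow occurred) if z_buffer > z_sprite, and set otherwise. The second SBC then computes 0 - 0 - (1 - carry): if a borrow occurred we get 0xFFFF, and if not we get 0x0000. Note that this gives all ones exactly where z_sprite < z_buffer, which is the complement of the mask expression above, so the result has to be inverted (or applied in the opposite sense) to match it.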
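To check the carry behavior, here is a minimal C model of 16-bit binary-mode SBC (decimal flag clear), where the result is A - operand - (1 - C) and C is set afterwards when no borrow occurred. The struct and function names are hypothetical; this is a sketch of the instruction semantics, not an emulator.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state: 16-bit accumulator plus the carry flag. */
typedef struct {
    uint16_t a;  /* accumulator in 16-bit mode */
    int c;       /* carry flag: 1 means "no borrow" */
} CpuState;

/* Model of binary-mode SBC: A = A - operand - (1 - C). */
void emu_sbc(CpuState *cpu, uint16_t operand)
{
    int32_t diff = (int32_t)cpu->a - (int32_t)operand - (1 - cpu->c);
    cpu->c = diff >= 0;       /* carry set when no borrow was needed */
    cpu->a = (uint16_t)diff;  /* wraps modulo 2^16 */
}

/* lda z_sprite / sec / sbc z_buffer / lda #0 / sbc #0 */
uint16_t scalar_mask(uint16_t z_sprite, uint16_t z_buffer)
{
    CpuState cpu = { z_sprite, 1 };  /* lda z_sprite, sec */
    emu_sbc(&cpu, z_buffer);         /* carry clear if z_buffer > z_sprite */
    cpu.a = 0;                       /* lda #0 leaves the carry alone */
    emu_sbc(&cpu, 0);                /* 0 - 0 - borrow */
    return cpu.a;
}
```

With a sprite at z = 3, `scalar_mask(3, 4)` returns 0xFFFF (sprite behind), while `scalar_mask(3, 3)` and `scalar_mask(5, 2)` return 0x0000.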

Now, how do we do this four at a time? First, we have to give up one bit to emulate the carry bit, which reduces the number of z-buffer levels to 8. Still plenty. We will assume that each nibble of the z-buffer values has its high bit set (to emulate the SEC instruction for each nibble). Let’s use an example where a z-buffer word holds the four z-values 0, 2, 4, 7 and the sprite z-value is 3 for all four pixels (the sprite can have per-pixel z-sorting, but that distracts from the main point). Our code fragment is as follows:


lda #$8ACF ; z-buffer values with high bit set
sec        ; no borrow into the lowest nibble
sbc #$3333 ; subtract the sprite's z-value

The accumulator now has the value 0x579C. Notice that the high bit of each nibble is set only where the z-buffer value is greater than or equal to the sprite’s z-coordinate. This works because every z-buffer nibble is at least 8 and every sprite nibble is at most 7, so no nibble ever borrows from its neighbor and each nibble simply holds 8 + z_buffer - z_sprite. The next step is to clear the bottom three bits of each nibble and then find a way to fill all the bits with 1 or 0 based on the high-order bit. We can’t convert our previous code to SWAR in this case because subtracting zero will not borrow a one from each nibble.
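The packing and the parallel subtraction can be sketched in C as follows. The helper names are hypothetical, and the nibble order (first z-value in the most significant nibble) is assumed so that the example matches $8ACF.

```c
#include <stdint.h>

/* Hypothetical helper: pack four 3-bit z-values (0..7) into one word,
   setting the high bit of each nibble. z[0] lands in the top nibble. */
uint16_t pack_zbuffer(const unsigned z[4])
{
    uint16_t w = 0;
    for (int i = 0; i < 4; i++)
        w = (uint16_t)((w << 4) | 0x8 | (z[i] & 0x7));
    return w;
}

/* sec / sbc: subtract the sprite z-value from all four nibbles at once.
   Each minuend nibble is 8..15 and each subtrahend nibble is 0..7, so no
   borrow crosses a nibble boundary; each nibble ends up as
   8 + z_buffer - z_sprite, whose high bit means z_buffer >= z_sprite. */
uint16_t swar_subtract(uint16_t zbuf_word, unsigned z_sprite)
{
    uint16_t sprite_word = (uint16_t)((z_sprite & 0x7) * 0x1111);
    return (uint16_t)(zbuf_word - sprite_word);
}
```

Packing {0, 2, 4, 7} gives 0x8ACF, and `swar_subtract(0x8ACF, 3)` gives 0x579C, as in the text. A tie (z-buffer 3, sprite 3) leaves a nibble of exactly 8, so equal depths also set the high bit.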

The only solution I can see right now is to shift right three times and then do a SWAR add of 0xF to each nibble. I would prefer some way to get the least significant bit set directly, but three shifts only take 6 cycles.


and #$8888 ; clear the low-order bits of each nibble
lsr        ; move each nibble's high bit...
lsr
lsr        ; ...down to its low bit; carry ends up clear
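The AND and the three shifts above can be modelled in C to confirm both the resulting value and the state of the carry. The function name is hypothetical; the model tracks the carry the way LSR does, by copying bit 0 out before each shift.

```c
#include <stdint.h>

/* and #$8888 / lsr / lsr / lsr
   Keep only the high bit of each nibble, then move it down to bit 0 of
   the same nibble. Bits 0-2 are zero after the AND, so every LSR shifts a
   zero into the carry, leaving C clear for the following ADC. */
uint16_t isolate_flags(uint16_t diff, int *carry_out)
{
    uint16_t a = diff & 0x8888;  /* and #$8888 */
    int c = 0;
    for (int i = 0; i < 3; i++) {
        c = a & 1;               /* LSR copies bit 0 into the carry */
        a >>= 1;
    }
    *carry_out = c;
    return a;
}
```

Applied to 0x579C, the AND leaves 0x0088 and the shifts produce 0x0011 with the carry clear.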

After masking out the bottom three bits, the accumulator contains 0x0088. The three shifts move the high bit of each nibble down into its low bit, and the accumulator equals 0x0011. Now we perform a SWAR add of the fixed value 0xF to each nibble; because the addend is fixed, the general SWAR addition simplifies to only two instructions. Note that the carry flag is clear, because each LSR shifted a zero bit into the carry flag.


adc #$7777 ; together these add $F to each nibble...
eor #$8888 ; ...without carrying between nibbles
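These two instructions are a special case of the standard SWAR addition of independent 4-bit lanes: add the low three bits of each lane normally, then fix up each lane's top bit with XOR so no carry crosses a lane boundary. The C sketch below shows the general form and the specialization; both function names are hypothetical.

```c
#include <stdint.h>

/* General SWAR add of four 4-bit lanes, each wrapping modulo 16.
   The low three bits of two lanes sum to at most 14, so the carry stops
   at bit 3 of the same lane; the XOR then corrects bit 3. */
uint16_t swar_add_nibbles(uint16_t x, uint16_t y)
{
    uint16_t low = (uint16_t)((x & 0x7777) + (y & 0x7777));
    return (uint16_t)(low ^ ((x ^ y) & 0x8888));
}

/* With y = $FFFF and every lane of x equal to 0 or 1:
   x & $7777 == x, y & $7777 == $7777, and (x ^ y) & $8888 == $8888,
   which reduces to adc #$7777 / eor #$8888 with the carry clear. */
uint16_t add_f_to_flags(uint16_t x)
{
    return (uint16_t)((x + 0x7777) ^ 0x8888);
}
```

Adding 0xF modulo 16 is the same as subtracting 1, so a lane holding 1 becomes 0 and a lane holding 0 becomes 0xF; for example, both functions map 0x0011 to 0xFF00.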

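Now we have our final mask: 0x0011 + 0x7777 = 0x7788, and 0x7788 ^ 0x8888 = 0xFF00. The two nibbles where the sprite (z = 3) is in front of the z-buffer (z = 0 and 2) are 0xF, and the two where it is behind (z = 4 and 7) are 0x0.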
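Putting the steps together, the C sketch below models the whole sequence and checks it exhaustively against a per-pixel rule for every combination of four 3-bit z-buffer values and one sprite value. The function names are hypothetical. One detail the check makes explicit: because equal depths set the high bit during the subtraction, ties go to the z-buffer, so the SWAR result is 0xF only where z_buffer < z_sprite.

```c
#include <stdint.h>

/* Model of:
     sec / sbc sprite_word   ; nibble high bit = (z_buffer >= z_sprite)
     and #$8888
     lsr / lsr / lsr         ; carry is now clear
     adc #$7777
     eor #$8888
   zbuf_word must have the high bit of every nibble set; z-values are 0..7.
   Returns $F in each nibble where z_buffer < z_sprite, 0 elsewhere. */
uint16_t depth_mask_swar(uint16_t zbuf_word, uint16_t sprite_word)
{
    uint16_t a = (uint16_t)(zbuf_word - sprite_word);  /* sec / sbc */
    a &= 0x8888;                                       /* and #$8888 */
    a >>= 3;                                           /* lsr x3 */
    a = (uint16_t)(a + 0x7777);                        /* adc #$7777 (C = 0) */
    a ^= 0x8888;                                       /* eor #$8888 */
    return a;
}

/* Exhaustive check over all 8^4 z-buffer words and 8 sprite values.
   Returns the number of words where SWAR and per-pixel results differ. */
int count_mismatches(void)
{
    int bad = 0;
    for (unsigned w = 0; w < 4096; w++) {
        uint16_t zbuf = 0x8888;  /* high bit set in every nibble */
        for (int i = 0; i < 4; i++)
            zbuf |= (uint16_t)(((w >> (3 * i)) & 7) << (4 * i));
        for (unsigned zs = 0; zs < 8; zs++) {
            uint16_t expect = 0;
            for (int i = 0; i < 4; i++) {
                unsigned zb = (zbuf >> (4 * i)) & 7;
                if (zb < zs)
                    expect |= (uint16_t)(0xF << (4 * i));
            }
            if (depth_mask_swar(zbuf, (uint16_t)(zs * 0x1111)) != expect)
                bad++;
        }
    }
    return bad;
}
```

`depth_mask_swar(0x8ACF, 0x3333)` reproduces the 0xFF00 from the worked example, and `count_mismatches()` should return 0.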

And that’s where the current thought process ends. As for alternate and faster ways of deriving the mask, I have created a table of all 16 values that need to be added to or subtracted from the accumulator to produce the correct mask in all cases, but the relationship seems to be highly non-linear. Since there are only 16, I could use a lookup table, but I don’t have a register free and the indices are very sparse.

Incorporating my current solution into the full sprite blitting fragment is problematic since it requires an additional load/store to save this derived mask into a temporary location. Ideally, I want to start with this mask value, combine it with any other masks and then I want to apply it directly to the sprite/screen data without having to spill the accumulator to memory. And a pony.
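For reference only, here is a C sketch of what combining the depth mask with a sprite’s own transparency mask and applying it to one screen word computes. The conventions are assumptions, not GTE’s: a 0xF nibble in `sprite_mask` marks an opaque sprite pixel, a 0xF nibble in `depth_mask` marks a pixel where the sprite is in front, and the function name is hypothetical. It does not address register pressure; in 65816 code the AND still needs the derived mask as a memory operand, which is the spill described above.

```c
#include <stdint.h>

/* Hypothetical masked merge of one word of four 4-bit pixels:
   take the sprite pixel where both masks are $F, else keep the screen. */
uint16_t blend_word(uint16_t screen, uint16_t sprite,
                    uint16_t sprite_mask, uint16_t depth_mask)
{
    uint16_t m = sprite_mask & depth_mask;             /* combine masks */
    return (uint16_t)((screen & ~m) | (sprite & m));   /* select per nibble */
}
```

For example, `blend_word(0x1234, 0xABCD, 0x0FF0, 0xFF00)` keeps three screen pixels and takes one sprite pixel, giving 0x1B34.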
