11.15.08
Faster Sprites
I’ve spent the last few posts talking about the core blitting mechanism of GTE, but I have not given many details about how the sprite subsystem has evolved. I had previously proposed a change whereby any line with a sprite would be completely drawn with Shadowing disabled, the sprites drawn and then Shadowing turned on again and the line exposed via a PEI update.
The problem is that the worst-case performance happens in all cases. As long as there is a single sprite on a given line, we pay the penalty of having to move the data for that line twice. Now, to be fair, the overhead is very small since we are not trying to be clever. Breaking up the line and only drawing segments requires a round-trip through the blitting code — at least 50 cycles of overhead. But, just drawing each line blindly takes several hundred cycles, so there is the potential to speed things up as long as we do it quickly.
It turns out that there is a way to proceed very efficiently thanks to the TSB and TRB instructions.
Consider breaking the SHR screen up into a few vertical slices where each column corresponds to a bit in a byte. If a sprite is present in any of these columns, set the corresponding bit equal to 1. Otherwise set it to zero. If the screen is broken up into N columns, then there are a total of N(N+1)/2 possible bit patterns that describe a single sprite.
The blitter is required to scan the OAM during each update to check for any active sprites and, if some are present, to clip them to the screen and queue them up for drawing. We can use the information that is calculated for clipping the sprites to efficiently update the screen. In pseudocode:
for each sprite, clip the sprite to the screen let xlow, xhigh, ylow and yhigh be the clipped bounding box compute the bitmask from [xlow, xhigh] set tsb_table[ylow] |= bitmask set trb_table[yhigh+1] |= bitmask
We also keep a little bit of extra information to know how many scan lines are between each sprite. This is necessary to batch up the drawing of our lines.
Now, within the blitter, we simply maintain a bitmask and dispatch to an appropriate subroutine
stz mask ; mask starts at zero ldx first_y lda tsb_table,x ; set the bits for the columns with a sprite tsb mask lda trb_table,x ; clear the bits if a sprite just left trb mask stz tsb_table,x ; clear the table entries stz trb_table,x tax jmp (disp,x) ...
For K sprites, there are, at most, 2*K values in the table, so the above code is not executed very often at all. The tricky bit (ha!) is determining how many bits to use. More bits means a finer granularity, but that could lead to excessive segmentation.
I think I am going to try the method out with just four segments. This might seem too coarse, but this method is easily extensible, so it will be no trouble to make the segmentation more fine grained.