*note* I'm no longer working on this library, but I still get a surprising number of emails from interested parties, so I'll leave this page in place. Send me an email if you'd like the source to it!

Finishing my thesis, among other things, has slowed me down a bit, so this section isn't fully finished. I'll keep adding bit by bit until it's done. So far I haven't even gotten to the good stuff; I've just rambled a little about other people's code. With that in mind, welcome to my programming notes.
The LUT is generated using a constant alpha and all 256*256 combinations of source (SRC) and destination (DST). (In most cases the background image will be in the destination; the pixels are blended and the result is placed back in DST.) The most common alpha blending equation is Eq 1:

DST = SRC*alpha + DST*(1-alpha)

The drawback of the DDTRANS lookup table is that it is only suitable for 256-color modes or lower. The lookup table index is something along the lines of SRC+256*DST. This LUT is only 64k for 256 colors, but becomes prohibitively large for higher color modes (a 16-bit mode would need 65536*65536 entries), and this is only for one level of alpha. To get nice smooth blending we want to use those higher color modes. Of course, if your application only uses 256 colors, this may well be a very good implementation. Now that I've seen this in action, I've noticed its use in several 256-color games.
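To make the table idea concrete, here is a minimal sketch in C of how such a LUT could be built and used. It is not the DDTRANS source. The `Rgb` type, the function names, the fixed-point `alpha256` parameter, and the nearest-color search by squared RGB distance are all assumptions made for illustration; only the SRC+256*DST index and Eq 1 come from the notes above.

```c
#include <stdint.h>

/* Hypothetical palette entry; a real 256-color mode supplies its own palette. */
typedef struct { uint8_t r, g, b; } Rgb;

/* Nearest palette index by squared RGB distance (an assumed matching rule). */
static uint8_t nearest_index(const Rgb pal[256], int r, int g, int b)
{
    int best = 0;
    long best_dist = -1;
    for (int i = 0; i < 256; i++) {
        long dr = pal[i].r - r, dg = pal[i].g - g, db = pal[i].b - b;
        long dist = dr * dr + dg * dg + db * db;
        if (best_dist < 0 || dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return (uint8_t)best;
}

/* Fill lut[src + 256*dst] for ONE alpha level.
   alpha256 is alpha scaled to the range 0..256 (assumed fixed-point form). */
static void build_blend_lut(const Rgb pal[256], int alpha256, uint8_t lut[65536])
{
    for (int dst = 0; dst < 256; dst++) {
        for (int src = 0; src < 256; src++) {
            /* Eq 1 per channel: SRC*alpha + DST*(1-alpha) */
            int r = (pal[src].r * alpha256 + pal[dst].r * (256 - alpha256)) >> 8;
            int g = (pal[src].g * alpha256 + pal[dst].g * (256 - alpha256)) >> 8;
            int b = (pal[src].b * alpha256 + pal[dst].b * (256 - alpha256)) >> 8;
            lut[src + 256 * dst] = nearest_index(pal, r, g, b);
        }
    }
}

/* Blend one 8-bit pixel: a single table read replaces all the arithmetic. */
static uint8_t blend_pixel(const uint8_t lut[65536], uint8_t src, uint8_t dst)
{
    return lut[src + 256 * dst];
}
```

The table is built once per alpha level and palette, so the per-pixel cost at draw time is one memory read. Each extra alpha level costs another 64k table.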
rrrrrggggggbbbbb

The zero bit in the first format is usually unused, but I have heard it can be used as an alpha bit. He performs his alpha blending by right shifting the SRC and DST, masking them, then adding them together (basically Eq 1 with alpha=.5). In 565 the shift and mask look like this in pseudo-assembly:
-> 0rrrr0ggggg0bbbb

This simultaneously divides the red, green, and blue by two. This is really fast, and although there are some optimizations that can be performed, I haven't seen a faster algorithm for a quick and easy alpha blend. Two shifts, two ands, and an add give nice blending at very low cost. One disadvantage of the method is that it does not give the most accurate blending possible. The shr does not exactly divide by two, since the low-order bit of each color channel is dropped. This might seem like a small point, but the difference can be noticeable.

My initial alpha blending code was an extension of this technique to varied levels of alpha. My thinking was that shifts, ands, adds, and subtracts are cheap. Using combinations of these you can get all sorts of nifty fractions. For example, to get 7/8 of ax:
sub ax, bx

Using just 4 shifts you can do any fraction of 16, leading to 16 levels of alpha. What I did was set up 4 bitmasks before the assembly section, then do 4 shifts, apply the bitmasks, and subtract. Any bitmask which was unneeded would be set to 0, thus subtracting zero at that stage. Some shifts and subtracts are wasted at some alpha levels. For example, at 50 percent three of the bitmasks are zero. For the fastest possible implementation of this method, each alpha level should have its own hard-coded routine. Here is a snippet of source from the generic routine, showing the 4 subtracts for the sprite pixel.
sub ebx, eax
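The shift-and-mask ideas in these notes can be sketched in C as follows. This is an illustration, not the library's actual routine. All function names and the `level` parameter (weight of SRC in sixteenths, 1 to 16) are hypothetical. The notes only describe the four subtracts on the sprite pixel, so the handling of the destination pixel here is an assumption: the sketch adds the same masked shifts of DST, which keeps every color channel within range.

```c
#include <stdint.h>

/* 7/8 of a plain 16-bit value: x - x/8, one shift and one subtract. */
uint16_t seven_eighths(uint16_t x)
{
    return (uint16_t)(x - (x >> 3));
}

/* 50 percent blend of two 565 pixels: two shifts, two ands, one add.
   0x7BEF clears the bit that each shift pushes into the next channel. */
uint16_t blend_half_565(uint16_t src, uint16_t dst)
{
    return (uint16_t)(((src >> 1) & 0x7BEF) + ((dst >> 1) & 0x7BEF));
}

/* Mask that clears bits shifted across 565 channel boundaries by `shift`. */
static uint16_t shift_mask_565(int shift)
{
    return (uint16_t)(((0xF800 >> shift) & 0xF800) |
                      ((0x07E0 >> shift) & 0x07E0) |
                       (0x001F >> shift));
}

/* Blend with SRC weighted level/16 (level in 1..16, hypothetical parameter). */
uint16_t blend_sixteenths_565(uint16_t src, uint16_t dst, int level)
{
    int remove = 16 - level;   /* sixteenths of SRC to take away */
    uint16_t mask[4];
    uint32_t out = src;

    /* Set up the 4 bitmasks first; a stage that is not needed gets mask 0. */
    for (int s = 1; s <= 4; s++)
        mask[s - 1] = (remove & (16 >> s)) ? shift_mask_565(s) : 0;

    /* 4 shifts, 4 masks, 4 subtracts on the source (sprite) pixel. */
    for (int s = 1; s <= 4; s++)
        out -= (uint32_t)(src >> s) & mask[s - 1];

    /* Assumption: add the same fractions of the destination pixel. */
    for (int s = 1; s <= 4; s++)
        out += (uint32_t)(dst >> s) & mask[s - 1];

    return (uint16_t)out;
}
```

Blending white with white through `blend_half_565` gives 0xF7DE rather than 0xFFFF, which shows the dropped low-order bits mentioned above. At level 8 only the first mask is nonzero, matching the remark that three of the bitmasks are zero at 50 percent.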
Gil Gribb - answered questions
Vinnie Falco - dxblt code example
Darius Zolna - blittest code example
Brian Shea - learned lots of good DirectX from him
Stuart Riffle - taught me the black art of assembly language
Phillip Evans - introduced me to VC++
© 2001 Kristian Olivero