
                           Programming Tips & Tricks

                                    Issue 2

                             Edited by Tenie Remmel

                                  Introduction
                                   Disclaimer
                        VGA Mode 13h programming, part 2
                             Using Decimal Counters
                               Size Optimization
                              Optimization Contest
                                   Tech Hints
                             Letters to the Editor
                                 Advertisements
                               Please Contribute!
                                    Credits
                                     Index
~
                                  Introduction
~



  ~This is issue 2 of PROGRAMMING TIPS & TRICKS, a low-level programming
   magazine (possibly the only FREE one in existence!)~

  Hopefully, if we can get enough contributions, new issues will be released
  every month.  You are encouraged to contribute anything possible, even if
  it's only to say that you liked it (or didn't).  If nobody helps, we cannot
  continue to publish this magazine.  Also, if you have any suggestions for
  articles that you would like to see, please send them in!  If you don't, we
  have no way of knowing what you are interested in.

  You can have this magazine sent via mail on a 3.5 inch floppy disk, a single
  issue for $5.00 (back issues are available), or a one year subscription
  (up to 12 issues) for $30.00.  All fees are non-refundable.

  Again, ~WE NEED CONTRIBUTIONS!!~  Send in your code, articles, ideas, hints,
  letters, or just about anything else to one of the following addresses:

    E-mail:
~
        tjr19@mail.idt.net
~
    Snail mail:
~
        Tenie Remmel, Editor
        6709 151st Avenue NE
        Redmond, WA 98052
~








~
                                   Disclaimer
~



  All information in Programming Tips & Tricks is provided AS-IS without any
  warranty, express or implied, including but not limited to the warranty of
  fitness for a particular purpose.   The editor (Tenie Remmel), and authors
  of the individual articles are not responsible in any way for any possible
  (or impossible) damages caused by the use, abuse, or misuse of any and all
  of the information provided in this publication, or for losses to business
  sales or profits caused by competition by the information herein.

  Programming Tips & Tricks does not discriminate or grant preference on the
  basis of  age, disability, education, ethnicity, gender, income, location,
  marital status, national origin, race, religion, or sexual orientation.









~
                        VGA Mode 13h programming, part 2
~
                                  Tenie Remmel




  This is part 2 of a series of articles about programming in the VGA video
  mode 13h.  In part 1 we learned how to draw pixels and lines.  This time,
  I discuss how to draw rectangles and circles.   All the old code is still
  available in ~VGADEMO.PAS~.

  An unfilled rectangle is trivial, as it is made up of four lines:

~
    procedure rectangle(x1, y1, x2, y2, color:integer);
    begin
        line(x1, y1, x2, y1, color);
        line(x1, y2, x2, y2, color);
        line(x1, y1, x1, y2, color);
        line(x2, y1, x2, y2, color);
    end;
~

  A filled rectangle could be drawn with two loops to draw all pixels in
  the rectangle, but calling the putpixel procedure would be slow.  It could
  also be drawn using the line procedure to draw vertical or horizontal lines
  next to each other:

~
    procedure fillrect(x1, y1, x2, y2, color:integer);
    var
        i:integer;
    begin
        if y1 > y2 then begin   ~{ Put Y in order }~
            i := y1; y1 := y2; y2 := i;
        end;
        for i := y1 to y2 do line(x1, i, x2, i, color);
    end;

~

  However, since lines are drawn a pixel at a time, it would still be slow.
  The solution is to use a separate procedure for drawing a horizontal line,
  as this can be done using ~REP STOSB~ and ~REP STOSW~ (the clipping code is
  omitted here; it is included in the version in ~VGADEMO.PAS~):

~
    procedure putrow(x1, x2, y, color:integer); assembler;
    asm
        mov ax,0A000h;          ~{ ES = video memory }~
        mov es,ax;

        mov ax,y;               ~{ Get parameters }~
        mov bx,x1;
        mov cx,x2;

        imul di,ax,320;         ~{ DI = offset }~
        add di,bx;
        sub cx,bx; inc cx;      ~{ CX = length }~

        mov ax,color;           ~{ AL, AH = color }~
        mov ah,al;
        shr cx,1; rep stosw;    ~{ Store by words }~
        adc cx,0; rep stosb;    ~{ Store possible odd byte }~
    end;
~

  Now we simply use the original fillrect routine, except that it calls
  putrow instead of line:

~
    procedure fillrect(x1, y1, x2, y2, color:integer);
    var
        i:integer;
    begin
        if y1 > y2 then begin   ~{ Put Y in order }~
            i := y1; y1 := y2; y2 := i;
        end;
        for i := y1 to y2 do putrow(x1, x2, i, color);
    end;
~
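
  The same row-at-a-time idea can be sketched in C.  This is only an
  illustration and not the code shipped in ~VGADEMO.C~: the ~screen~ array is a
  hypothetical stand-in for video memory at A000:0000, the names ~put_row~ and
  ~fill_rect~ are made up here, and ~memset~ plays the part of ~REP STOSB~.
  The X swap is an addition that the Pascal version does not have.

```c
#include <assert.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Hypothetical stand-in for the video memory at A000:0000. */
unsigned char screen[SCREEN_W * SCREEN_H];

/* Fill one horizontal run; memset plays the role of REP STOSB. */
void put_row(int x1, int x2, int y, unsigned char color)
{
    /* No clipping here, so insist on coordinates that are on screen. */
    assert(x1 <= x2 && x1 >= 0 && x2 < SCREEN_W && y >= 0 && y < SCREEN_H);
    memset(&screen[y * SCREEN_W + x1], color, (size_t)(x2 - x1 + 1));
}

/* Fill a rectangle one row at a time. */
void fill_rect(int x1, int y1, int x2, int y2, unsigned char color)
{
    int t, y;

    if (y1 > y2) { t = y1; y1 = y2; y2 = t; }   /* put Y in order */
    if (x1 > x2) { t = x1; x1 = x2; x2 = t; }   /* put X in order */
    for (y = y1; y <= y2; y++)
        put_row(x1, x2, y, color);
}
```

  A call such as ~fill_rect(10, 30, 20, 25, 7)~ fills rows 25 through 30 from
  column 10 to column 20 with color 7.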

  The equation for a circle is  ~X² + Y² = R²~.   This equation is difficult to
  use directly, so we can solve it for Y, as  ~Y = Sqrt(R² - X²)~.  However, as
  this requires a square root operation, it would be too slow for practical
  use.   The solution involves an ingenious method of stepping around the
  circle pixel by pixel:
~
    X := X + Y / R;
    Y := Y - X / R;
~
  Note that the second operation refers to the ~new X value~ as calculated by
  the first one.   This simple operation, when applied to any point on the
  circle, rotates it by about 1/R radians, which is about one pixel along the
  arc.  This moves clockwise if in Cartesian coordinates, or counterclockwise
  if in screen coordinates.  This algorithm was apparently discovered by
  Tylisha C. Andersen.
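
  To see how well the stepping rule holds its radius, here is a small C check.
  It is only a sketch: it uses floating point (fine for checking, even if too
  slow for drawing on the machines discussed here), and the function name
  ~max_radius_error~ is made up for this example.  It starts at (0, R), applies
  the two assignments repeatedly, and reports the largest relative error seen
  in X² + Y² compared with R².

```c
#include <assert.h>

/* Apply the stepping rule `steps` times starting from (0, r).  Returns the
   largest relative difference between x*x + y*y and r*r along the way. */
double max_radius_error(double r, int steps)
{
    double x = 0.0, y = r, worst = 0.0;
    int i;

    assert(r > 0.0 && steps >= 0);
    for (i = 0; i < steps; i++) {
        double err;

        x = x + y / r;              /* X := X + Y / R             */
        y = y - x / r;              /* Y := Y - X / R  (new X!)   */

        err = (x * x + y * y - r * r) / (r * r);
        if (err < 0.0) err = -err;
        if (err > worst) worst = err;
    }
    return worst;
}
```

  For R = 100 and about 2 Pi R = 628 steps the error stays under one percent,
  so the point never strays far from the true circle.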

  The simplest way to implement this would be to use floating point numbers,
  but this would again be impractical.   The solution is to use fixed point.
  Here I use ~10.6 bit fixed point numbers~, which allow a radius of up to 500
  and retain enough precision to produce a round circle.   Here is the first
  circle procedure:

~
    procedure circle(x, y, r, color:integer);
    var
        i, ix, iy:integer;
    begin
        if r < 1 then r := 1;
        ix := 0; iy := r * 64;
        for i := 1 to (r * 44 div 7 ~{2 Pi}~) do begin
            putpixel(x + (ix div 64), y + (iy div 64), color);
            ix := ix + iy div r;    ~{ Step to next pixel }~
            iy := iy - ix div r;
        end;
    end;
~

  However, this still performs one calculation for each pixel.  A circle is
  symmetrical, so we only need to calculate one of every eight pixels; the
  other seven can be plotted using symmetry.  Also, there is now no reason
  to calculate the length of the circle, as when plotting an octant (1/8 of
  a circle) we can simply stop when X is equal to Y.  Here is the final
  optimized circle routine:

~
    procedure circle(x, y, r, color:integer);
    var
        ix, iy, a, b:integer;
    begin
        if r < 1 then r := 1;
        ix := 0; iy := r * 64;

        repeat begin
            a := (ix + 32) shr 6; b := (iy + 32) shr 6;
~
            { plot eight pixels using symmetry }
~
            putpixel(x + a, y + b, color); putpixel(x - a, y + b, color);
            putpixel(x + a, y - b, color); putpixel(x - a, y - b, color);
            putpixel(x + b, y + a, color); putpixel(x - b, y + a, color);
            putpixel(x + b, y - a, color); putpixel(x - b, y - a, color);
~
            { step to next pixel }
~
            ix := ix + iy div r;
            iy := iy - ix div r;
        end until b <= a;
    end;
~
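
  For comparison, here is a hedged C sketch of the same octant routine.  It is
  not the translation shipped in ~VGADEMO.C~: the ~screen~ array is a
  hypothetical stand-in for video memory, and ~put_pixel~ is a minimal version
  written for this example that simply drops off-screen pixels.

```c
#include <assert.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Hypothetical stand-in for video memory; the real code writes to A000:0000. */
unsigned char screen[SCREEN_W * SCREEN_H];

/* Plot one pixel, silently dropping anything off screen. */
void put_pixel(int x, int y, unsigned char color)
{
    if (x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H)
        screen[y * SCREEN_W + x] = color;
}

/* Octant circle using the same 10.6 fixed point stepping as the Pascal code. */
void circle(int x, int y, int r, unsigned char color)
{
    int ix, iy, a, b;

    if (r < 1) r = 1;
    assert(r <= 500);               /* keep r * 64 inside a 16-bit integer */
    ix = 0; iy = r * 64;

    do {
        a = (ix + 32) >> 6;         /* round fixed point to whole pixels */
        b = (iy + 32) >> 6;

        /* plot eight pixels using symmetry */
        put_pixel(x + a, y + b, color); put_pixel(x - a, y + b, color);
        put_pixel(x + a, y - b, color); put_pixel(x - a, y - b, color);
        put_pixel(x + b, y + a, color); put_pixel(x - b, y + a, color);
        put_pixel(x + b, y - a, color); put_pixel(x - b, y - a, color);

        ix += iy / r;               /* step to next pixel */
        iy -= ix / r;
    } while (b > a);                /* stop at the end of the octant */
}
```

  The first pass has a = 0 and b = r, so ~circle(100, 100, 10, 15)~ starts by
  plotting the four points directly above, below, left and right of the
  center.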

  All of this code is included in the file ~VGADEMO.PAS~.  Run ~VGADEMO.EXE~ to
  see these routines in action.  A 'C' translation is in the file ~VGADEMO.C~.









~
                             Using Decimal Counters
~
                                   Jim Neil




     Maintaining and incrementing counters, particularly when their only
  use is for display purposes, can be a CPU intensive function.  Keeping
  them in binary can create two problems: 1) a great deal of time will be
  spent converting the counter from binary to decimal ASCII, and 2) you
  may be limited by the word width of the machine.  When I wrote the TERSE
  compiler, I was faced with just this situation.  I wanted the line
  numbers to run from 000001 to 999999.  Since the 8088 only supported 16
  bit math, I would have had to use double words to contain these values.
  Add the conversion times required to convert these double words to ASCII
  for ~each~ line and the overhead would have been significant.

     I chose to maintain the counters in decimal ASCII.  The x86 provides
  special instructions that make this quite simple.  The two benefits are:
  the counter may be as long as you like, and no conversion is required.

     Here is a routine that will increment a decimal counter of arbitrary
  length.  The counter is declared in two separate 'pieces' to make the
  calling sequence simpler.  Below is a sample declaration of a 6 digit
  counter (note the value in the Dup could be a space if leading spaces
  were desired).  A sample call follows.  Note the use of the Length
  operator to load the counter's length - 1.  Finally, the procedure
  IncLineNum is shown.

~
  Counter db 5 Dup ('0'),'0';     ~leading zeros~

    Mov   bx,Offset Counter;      ~bx = base address of number.~
    Mov   si,Length Counter;      ~si = length of number - 1.~
    Call  IncLineNum;             ~bump the line number.~

  IncLineNum Proc Near;           ~bx = ptr to str, si = length - 1.~

    Mov   al,bx[si];              ~al = next digit.~
    Inc   al;                     ~propagate carry through number.~
    Aaa;                          ~adjust for ASCII arithmetic.~
    Lahf;                         ~save the current CY.~
    Or    al,'0';                 ~convert to ASCII.~
    Mov   bx[si],al;              ~store updated digit.~
    Sahf;                         ~get back the old CY.~
    Dec   si;                     ~point to next significant digit.~
    Jnc   DoneWithThisLineNum;    ~jump if no more to propagate.~
    Jns   IncLineNum;             ~loop till done...~

  DoneWithThisLineNum:
    Ret;                          ~return.........................~

  IncLineNum EndP;
~

     The procedure is entered with the offset of the counter in ~bx~.
  Register ~si~ contains the length of the counter in bytes ~minus one~.

     The loop loads the next digit and increments it.  Then the ~Aaa~ (ASCII
  Adjust for Addition) instruction is used to 'correct' the value when it
  exceeds nine.  The ~Lahf~ instruction is used to save the state of the
  carry flag.  The value in the accumulator is converted back to ASCII.
  This is necessary because the ~Aaa~ clears the high order nibble.  The flags
  are restored with the ~Sahf~ instruction.  The length register (~si~) is
  decremented, and the loop is exited if there is no carry pending (set by
  the Aaa).  Finally, a ~Jns~ (Jump No Sign) is used to loop back if there
  are more digits left to process.  The ~Dec~ instruction does not affect
  the Carry Flag and the ~Jns~ jumps while the counter is positive or zero.
  This allows a maximum theoretical counter length of 32769 digits.
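
     The same loop can be sketched in C for readers who want to try it
  without an assembler.  This is only an illustration: the name ~inc_counter~
  and its interface (pointer to the first digit plus the full length) are
  assumptions made for this example.  Masking with 0Fh stands in for what
  ~Aaa~ does to the high nibble, so a leading space counts as zero.  The
  return value is the number of digits touched.

```c
#include <assert.h>
#include <string.h>

/* Increment an ASCII decimal counter in place.  `digits` points at the most
   significant digit and `len` is the number of digits.  Leading spaces are
   treated as zeros.  Returns the number of loop passes (digits touched). */
int inc_counter(char *digits, int len)
{
    int i, passes = 0;

    assert(digits != NULL && len > 0);
    for (i = len - 1; i >= 0; i--) {
        int d = (digits[i] & 0x0F) + 1;   /* low nibble is the digit value */

        passes++;
        if (d <= 9) {                     /* no carry: store and stop */
            digits[i] = (char)('0' + d);
            break;
        }
        digits[i] = '0';                  /* 9 + 1: write 0, carry onward */
    }
    return passes;
}
```

     Incrementing "000999" gives "001000" after four passes, and a counter
  with leading spaces such as "   9" becomes "  10".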

     To ~decrement~ a decimal counter, simply replace the second and third
  lines of the IncLineNum loop (the ~Inc~ and ~Aaa~) with:

~
    Dec   al;                     ~propagate borrow through number.~
    Aas;                          ~adjust for ASCII arithmetic.~
~

     This same routine is shown below in TERSE.  Also, there are sample
  programs (~DECMATH.ASM, DECMATH.T~) included in both standard ASM and
  TERSE.  These programs create a .COM file and show counters with leading
  zeros as well as with leading spaces.

~
  IncLineNum Proc Near;           ~\ bx = ptr to str, si = length - 1.~
    {                             ~\ for each digit...~
      al = [bx][si]+;             ~\ propagate carry through number.~
      "+; = ?; al | '0';          ~\ ASCII adj, ah = flags, make it ASCII.~
      [bx][si] = al; ? =;         ~\ save digit, flags = ah.~
      si-; =>>;                   ~\ next digit index, break if no carry.~
    }++;                          ~\ loop until all digits are processed.~
    .=;                           ~\ return.........................~
  IncLineNum EndP;                ~\ end proc to inc line number.~
~

     The savings achieved by maintaining counters in this manner are
  considerable.  Only one pass through the loop is required 90% of the
  time, two passes are required 9% of the time, three passes are required
  only .9% of the time, etc.  As you can see, the overhead required for
  longer counters is negligible.

     These instructions and similar techniques can be used to add or
  subtract decimal ASCII numbers directly without converting them to/from
  binary.  I've used this method for maintaining scores for video games.
  Banks use decimal math for processing account information, which is why
  these instructions are in the CPU in the first place.  I'll leave these
  implementations as an exercise for the reader.
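
     As a starting point for that exercise, here is one possible C sketch of
  adding two ASCII decimal numbers of equal length.  The name ~add_decimal~
  and its interface are assumptions made for this example; an assembly
  version would use ~Adc~ and ~Aaa~ where this code compares against nine.

```c
#include <assert.h>
#include <string.h>

/* acc := acc + addend, where both are ASCII decimal strings of `len` digits
   (leading spaces count as zeros).  Returns the carry out of the top digit. */
int add_decimal(char *acc, const char *addend, int len)
{
    int i, carry = 0;

    assert(acc != NULL && addend != NULL && len > 0);
    for (i = len - 1; i >= 0; i--) {
        int d = (acc[i] & 0x0F) + (addend[i] & 0x0F) + carry;

        carry = (d > 9);                  /* decimal carry to next digit */
        if (carry) d -= 10;
        acc[i] = (char)('0' + d);         /* store back as ASCII */
    }
    return carry;
}
```

     For a game score, adding "000075" to "000950" leaves "001025" in the
  accumulator with no carry out.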









~
                                Size Optimization
~
                              Tylisha C. Andersen




  A widely neglected form of optimization is size optimization, the process of
  shrinking a program's size.  Most of the time when optimization is discussed
  it is about speed optimization, making things faster -- often at the expense
  of size.  Most programs can be reduced in size by 90% or more, especially if
  they were originally written in a high level language such as 'C'.

  This article continues the Size Optimization column started in issue 1.


~
  4.  ONE-BYTE OPCODES
~
    It is widely known that the  ~XCHG AX,r16~  opcodes are a single byte.  This
    is not the only useful one-byte opcode.  If you need to add two to DI, and
    the result in the flags does not matter, use a ~SCASW~ (which compares AX to
    [ES:DI] and, with the direction flag clear, adds two to DI).  Likewise, on
    an 80386, a ~SCASD~ can be used to add 4 to DI.  To add 2 or 4 to ~both SI and
    DI~, use ~CMPSW~ or ~CMPSD~.  Note that these instructions really do read
    memory, so they cannot be used if the pointer might be at the very end of
    the segment (where the addition would carry out of the register), as a GPF
    would result from the memory access.

    To access segment registers, ~PUSH~ and ~POP~ are very useful.   For instance,
    instead of ~MOV AX,DS~/~MOV ES,AX~ (4 bytes), use ~PUSH DS~/~POP ES~ (2 bytes).
    And if you need to move a constant such as 0A000h into a segment register,
    this is also useful:  ~MOV AX,0A000h~/~MOV ES,AX~  is one byte longer than the
    equivalent ~PUSH 0A000h~/~POP ES~, which is ~4~ bytes.  ~This saves an additional
    byte~ if the constant is less than 128 (such as 40h, the BIOS segment).
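
    The byte counts above can be checked by writing out the machine code.
    The C arrays below are only a sketch for counting bytes in 16-bit code;
    the array names and the ~check_sizes~ helper are made up for this
    example, and the program does not execute any of these bytes.

```c
#include <assert.h>

/* Machine code for the sequences discussed in the text (16-bit code). */
const unsigned char mov_es_ds[]    = { 0x8C, 0xD8,          /* MOV AX,DS     */
                                       0x8E, 0xC0 };        /* MOV ES,AX     */
const unsigned char push_pop_ds[]  = { 0x1E,                /* PUSH DS       */
                                       0x07 };              /* POP ES        */
const unsigned char mov_es_a000[]  = { 0xB8, 0x00, 0xA0,    /* MOV AX,0A000h */
                                       0x8E, 0xC0 };        /* MOV ES,AX     */
const unsigned char push_es_a000[] = { 0x68, 0x00, 0xA0,    /* PUSH 0A000h   */
                                       0x07 };              /* POP ES        */
const unsigned char push_es_40[]   = { 0x6A, 0x40,          /* PUSH 40h      */
                                       0x07 };              /* POP ES        */
const unsigned char add_di_2[]     = { 0x83, 0xC7, 0x02 };  /* ADD DI,2      */
const unsigned char scasw[]        = { 0xAF };              /* SCASW         */

/* Assert the sizes quoted in the text. */
void check_sizes(void)
{
    assert(sizeof mov_es_ds == 4 && sizeof push_pop_ds == 2);
    assert(sizeof mov_es_a000 == 5 && sizeof push_es_a000 == 4);
    assert(sizeof push_es_40 == 3);       /* imm8 form saves one more byte */
    assert(sizeof add_di_2 == 3 && sizeof scasw == 1);
}
```

    Note that the ~PUSH~ immediate forms (68h and 6Ah) need an 80186 or
    later.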

~
  5.  ACCESSING MEMORY VARIABLES
~
    If you need to initialize a bunch of memory variables to zero, and they are
    not contiguous (so that a ~REP STOSB~ won't work), then zero AX and set each
    variable using AL or AX.  This always saves space -- even if there is only
    one variable:  ~MOV mem16,AX~ is ~3~ bytes, while ~MOV mem16,0~ is ~6~ bytes.
    For instance, this is an actual example from a screen package:
~
        mov   color, 7      ~; ~29~ bytes~
        mov   winX1, 0
        mov   winY1, 0
        mov   winX2, 79
        mov   winY2, 24


        xor   ax, ax        ~; ~23~ bytes~
        mov   color, 7
        mov   winX1, ax
        mov   winY1, ax
        mov   al, 79
        mov   winX2, ax
        mov   al, 24
        mov   winY2, ax
~
~
  6.  USING 32-BIT REGISTERS
~
    If you need to load a small nonzero value into a 32-bit register, then you
    can reduce the size by ~1~-~2~ bytes, by splitting the load operation into two
    separate pieces.   There are three cases, see below  (the examples use ~EAX~
    but any register can be substituted):
~
        mov   eax, (value)  ~; normal move              ~6~ bytes~

        xor   eax, eax      ~; eax = 1                  ~4~ bytes~
        inc   ax

        xor   eax, eax      ~; eax = 0FFFFh, not -1     ~4~ bytes~
        dec   ax            ~; (for -1, OR EAX,-1 is also 4 bytes)~

        mov   ax, (value)   ~; eax = signed word        ~5~ bytes~
        cwde
~
    Also, to access the high word of a 32-bit register, use a SHLD instruction
    instead of ROR or SHR; it is 1-5 bytes shorter:
~
        shld  ebx, eax, 16  ~; bx = high word of eax    ~5~ bytes~

        ror   eax, 16       ~; bx = high word of eax    ~10~ bytes~
        mov   bx, ax        ~; ~6~ bytes without restoring eax~
        ror   eax, 16

        mov   ebx, eax      ~; bx = high word of eax    ~7~ bytes~
        shr   ebx, 16
~
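
    What these sequences leave in the registers can be modelled in C.  This
    is only a sketch with made-up function names; each function mimics one of
    the instruction sequences using fixed-width integers.  Note in particular
    that ~DEC AX~ touches only the low word, so the ~XOR EAX,EAX~ / ~DEC AX~
    pair produces 0000FFFFh rather than FFFFFFFFh.

```c
#include <assert.h>
#include <stdint.h>

/* XOR EAX,EAX / DEC AX: only the low word is decremented. */
uint32_t xor_then_dec_ax(void)
{
    uint32_t eax = 0;
    uint16_t ax = (uint16_t)(eax & 0xFFFFu);

    ax = (uint16_t)(ax - 1u);
    return (eax & 0xFFFF0000u) | ax;
}

/* MOV AX,value / CWDE: sign-extend the low word into EAX. */
uint32_t mov_ax_cwde(int16_t value)
{
    return (uint32_t)(int32_t)value;
}

/* SHLD EBX,EAX,16: shift EBX left 16 bits, filling from the top of EAX. */
uint32_t shld16(uint32_t ebx, uint32_t eax)
{
    return (ebx << 16) | (eax >> 16);
}

/* Self-check of the three models. */
void check_register_models(void)
{
    assert(xor_then_dec_ax() == 0x0000FFFFu);
    assert(mov_ax_cwde(-1) == 0xFFFFFFFFu);
    assert((shld16(0x11112222u, 0xAAAABBBBu) & 0xFFFFu) == 0xAAAAu);
}
```

    The ~SHLD~ model also shows that the old BX ends up in the high word of
    EBX, so the trick only fits when that half may be overwritten.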








~
                              Optimization Contest
~



  This is an optimization contest.  The object is to produce the smallest
  version of this issue's program;  entries are accepted in assembly language,
  TERSE, Pascal, or C.   Usually assembly language will be the best choice, as
  it is difficult to produce very small programs in high-level languages.
  Contest entries must work substantially the same way as the original, and
  you are allowed to use the 80386 instruction set.

  The deadline for entries is the publication date of the next issue;  this
  will not be less than three weeks.  The prize for this contest is a copy
  of the ~TERSE compiler~ by Jim Neil, ~a $49 value!~

  This issue's program is a PLASMA display; it can be found in the file
  ~CONTEST\PLASMA.ASM~.  The executable (559 bytes) is ~CONTEST\PLASMA.COM~.

  Winning entries will be announced on the newsgroups ~comp.lang.asm.x86~ and
  ~alt.lang.asm~.  Winners will be notified by E-mail if an address is
  available, otherwise by snail mail.

  Contest entries may be submitted by E-mail or snail mail, at the following
  addresses:

    E-mail:
~
        tjr19@mail.idt.net
~
    Snail mail:
~
        Tenie Remmel, Editor
        6709 151st Avenue NE
        Redmond, WA 98052
~








~
                                   Tech Hints
~



  These are routines, ideas, and other items that are too small to make a whole
  article out of, but too good to leave out.  They are not edited except to fix
  obvious mistakes.


  ~Clearing memory faster than REP STOSD~          - J. Martinez

    The REP STOSD instruction requires 4 ticks per dword or 1 tick per byte.
    This can be reduced significantly (to about 0.66 tick/byte) by unrolling
    the loop, as follows:


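; Replacement for REP STOSD = 29-41 + 2.625*CX ticks
~
FILLMEM:PUSH DS                 ; make DS = ES so [DI] addresses the target
        PUSH ES
        POP DS
        PUSH CX                 ; save the full dword count
        SHR CX,3                ; CX = number of 32-byte blocks
        JZ FIL_2                ; fewer than 8 dwords: skip the unrolled loop
FIL_1:  MOV [DI+00],EAX
        MOV [DI+04],EAX
        MOV [DI+08],EAX
        MOV [DI+12],EAX
        MOV [DI+16],EAX
        MOV [DI+20],EAX
        MOV [DI+24],EAX
        MOV [DI+28],EAX
        ADD DI,32
        DEC CX
        JNZ FIL_1
FIL_2:  POP CX
        AND CX,7                ; leftover dwords
        REP STOSD
        POP DS
        RET
~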
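
    The structure of this routine (eight stores per pass, then a short loop
    for the remainder) can be sketched in C as well.  This is an illustration
    only, with a made-up name ~fill_dwords~; whether unrolling pays off in C
    depends entirely on the compiler and CPU, so no timing is claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store `count` copies of `value` at `dst`, eight dwords per loop pass. */
void fill_dwords(uint32_t *dst, uint32_t value, size_t count)
{
    size_t blocks = count >> 3;     /* SHR CX,3: whole groups of 8 dwords */
    size_t rest   = count & 7;      /* AND CX,7: leftover dwords          */

    assert(dst != NULL || count == 0);
    while (blocks--) {              /* skipped entirely when count < 8    */
        dst[0] = value; dst[1] = value; dst[2] = value; dst[3] = value;
        dst[4] = value; dst[5] = value; dst[6] = value; dst[7] = value;
        dst += 8;
    }
    while (rest--)                  /* plays the part of the REP STOSD tail */
        *dst++ = value;
}
```

    The case to watch is a count below eight: the unrolled loop must then be
    skipped, and only the remainder loop runs.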








~
                             Letters to the Editor
~



  In later issues this space will contain letters written by readers of this
  magazine.

  ~WE NEED CONTRIBUTIONS!!~  Send in your code, articles, ideas, hints, letters,
  or just about anything else to one of the following addresses:

    E-mail:
~
        tjr19@mail.idt.net
~
    Snail mail:
~
        Tenie Remmel, Editor
        6709 151st Avenue NE
        Redmond, WA 98052
~








~
                                 Advertisements
~



                                    FREELIB
 
 ~Over 200 optimized assembly language routines that do everything from buffered
 file I/O to VGA graphics.  Size averages only 60 bytes per procedure!   Source
 code is included for all routines.   FREELIB is public domain software, and is
 free for all non-commercial use.   If you would like to use it in a commercial
 product, licenses are available for $30.~
 > ftp://ftp.simtel.net/pub/simtelnet/msdos/asmutl/freeli30.zip <



~  ___  ___/  ____/   ___  /   ____/   ____/~   ~TERSE~ Simplifies x86 Assembly!
~      /     /       /    /   /       /    ~     Visit: ~http://www.terse.com~
~     /     ___/      ___/ ____  /   ___/ ~      Email: ~jim-neil@digital.net~
~    /     /       /  \         /   /    ~     Only $49 + S&H Credit Cards Only
~ __/   ______/ __/  __\ ______/ ______/~ TM     Call Toll Free ~800-881-8700~




  Advertisements placed in this magazine can reach an audience of at least
  300 people, possibly up to 1000 or more.  Our advertising rates are:

  For individuals                           $2 per line,  2 lines minimum.
  For corporations under 20 employees       $10 per line, 3 lines minimum.
  For corporations over 20 employees        $25 per line, 3 lines minimum.

  Advertising fees must be paid in U.S. money drawn on a U.S. bank account.
  All advertisements will be converted to plain text with highlighting where
  bold text is found.  They may be submitted on paper, or by E-mail in either
  plain ASCII or WordPerfect 6.0 compatible format, to one of the following
  addresses:


    E-mail:
~
        tjr19@mail.idt.net
~
    Snail mail:
~
        Tenie Remmel, Editor
        6709 151st Avenue NE
        Redmond, WA 98052
~








~
                               Please Contribute!
~



  ~WE NEED CONTRIBUTIONS!!~  Send in your code, articles, ideas, hints, letters,
  or just about anything else to one of the following addresses:

    E-mail:
~
        tjr19@mail.idt.net
~
    Snail mail:
~
        Tenie Remmel, Editor
        6709 151st Avenue NE
        Redmond, WA 98052
~








~
                                    Credits
~



  The following people contributed to this issue:

        Tylisha C. Andersen
        J. Martinez
        Jim Neil









~
                                     Index
~



  Binary to Hex Conversion . . . . . . . . . . . . . . . . Issue 1
  Size Optimization 1-3  . . . . . . . . . . . . . . . . . Issue 1
  Size Optimization 4-6  . . . . . . . . . . . . . . . . . Issue 2
  Using Decimal Counters . . . . . . . . . . . . . . . . . Issue 2
  VGA mode 13h programming, part 1 . . . . . . . . . . . . Issue 1
  VGA mode 13h programming, part 2 . . . . . . . . . . . . Issue 2

  Hint: Clearing memory faster than REP STOSD  . . . . . . Issue 2
  Hint: Fast text writing in 16-color modes  . . . . . . . Issue 1

  Review: the TERSE compiler . . . . . . . . . . . . . . . Issue 1
