Floating Point Tutorial

Although integer math may seem more natural for us when we program the Z80, floating point math that works with fractional numbers is not much more difficult. The only change is in representation and how we program the handling of the data. After reading this, hopefully the reader will not feel limited to integer math and not limited to even the representations here. Don't just use a 'sign bit' if you think it would be more mathematically sound to have a bit for additive inverses and multiplicative inverses. Add complex arithmetic, or try making Rational representations for sometimes many, many more bits of precision and a smaller risk for roundoff errors.

Format

The typical format includes a sign bit (additive inverse, positive/negative) an exponent, and a mantissa. TI, for example, uses 1 bit for sign, 8 bits for the exponent, and 7 bytes storing 12 base 10 digits. If the bytes were 0083 31 41 59 26 53 58 98, it would be read as sign=0, exponent=83h-80h=3, mantissa=3.1415926535898, so the value represented is 3.1415926535898*10^^3^^=3141.5926535898.

Generally, the math has a wider range and better precision if we use a base of power of 2. Depending on the usage, we tend to try to pack all of the data to fill 8-bit multiples. TI uses an extra 8 bits to store the 1 sign bit, but in their application, they also use that byte to store variable type. They actually took advantage of unused bits that had to be there anyways.

Sign Bit

The sign bit is just a 0 or 1 that determines if a number is positive or negative.

Exponent

To decide a good range for the exponent, you will need to pick a base. TI used 10, but generally, 2 might be better or 256 so that we use whole bytes as digits. Essentially, we will use something like 2^^exponent^^*mantissa. It is generally easier to make the middle number in your range as 0. If you are using an 8-bit exponent, 80h=0 for the exponent.

Mantissa

The mantissa is best kept "normalized" by making sure we don't have leading zeroes. If we are using a base of 2, then a mantissa of %0011111100000000 is usually not what we want. Instead, we shift it left twice, which means we decrement the exponent by 2.

Examples

If we decided to have 1 bit for sign, 7 bits for the exponent, and 16 bits for the mantissa, using base 2, then the following would represent 1:

   %01000000   ;sign=0, exponent=64 (treated as 0)
   %1000000000000000 ;mantissa=1.0

And likewise, 2 would be just multiplying by 2 which is the same as increasing the exponent by 1:

   %01000001   ;sign=0, exponent=65 (65-64=1)
   %1000000000000000 ;mantissa=1.0
   1.0*2^1=2

So if we think of the exponent as the location of the decimal, 2 is 10.00000000000000 and 1 is 1.000000000000000. If we want to perform some arithmetic...

Addition and Subtraction

This can be a bit difficult because we have to keep track of the sign of the inputs and the result, but we also have to line up the 'decimal points' (the exponent). In the above examples for the 24-bit floats for 1.0 and 2.0, to line up the decimals, we need to rotate the mantissa of 1 to the right (and likewise, increment the exponent:

   %01000001   ;sign=0, exponent=65 (treated as 1)
   %0100000000000000 ;mantissa=0.5
   0.5*2^1=1

Note that 1 is now represented as a denormalized number, but we can directly add the numbers by adding their mantissas:

   %01000001   ;sign=0, exponent=65 (treated as 1)
   %1100000000000000 ;mantissa=1.5
   1.5*2^1=3

If there was overflow, we would have to rotate the result right to get the upper bit of the result back in the number and then increment the exponent. For example, if we added 2 to this, we would get %0100000000000000 as the 16-bit addition, but the overflow (carry flag) would be a 1, so we would rotate it back in, dropping the last bit (in our case, a 0):

   %01000010   ;sign=0, exponent=66 (treated as 2)
   %1010000000000000 ;mantissa=1.25
   1.25*2^2=5

Adding two negative numbers -a+-b is the same as -(a+b) so it is just like adding two positive numbers and you don't need to change the sign. Adding a positive and negative number will require a subtraction of their absolute values. For example, 2+-1 would require rotating the mantissas as in addition, but then subtract %1000...-%0100... to get %0100... which needs to be renormalised. In this case, it is still a positive result. If there was overflow (on the z80, the c flag is set), then our result is negative, but we also need to perform a 2s compliment on the mantissa (that is, negate it as if it were an integer). So in the case of 1-2, we would get: %01000001 1100000000000000. 2s compliment would return $01000001 0100000000000000 and then we set the sign as negative and renormalise the number, getting the expected -1. In the case where the first number is negative and the second is positive, it would be easiest to swap their order for subtraction instead of making a separate reverse-subtract routine.

Multiplication and Division

Multiplication and division are much easier than addition and subtraction. When multiplying or dividing two numbers, you just need to use XOR logic on the signs and perform basically integer math on the mantissas. If both numbers are negative, their sign bits will be 1 and 1 xor 1=0, which is positive. Likewise, 1 xor 0=1, 0 xor 1=1, 0 xor 0=0. To multiply A and B, add their exponents together, to divide, subtract the exponent of B from A (taking care of underflow/overflow, accordingly which is a simple check of the c flag).

Now you have the sign and exponent of the result, but you still need to get the mantissa. For multiplication, just multiply the two mantissas and take the upper bits for the result, rounding as seen fit. For division, it is a slightly different algorithm than integer division (or so it may seem). Since both numbers are normalized, you don't need to rotate bits into an accumulator. Instead, check if B<=A. If it is, perform B-A as an integer subtraction, and set the upper bit of the result, else leave it reset. Then shift B to the right and repeat the process, setting or resetting the next bits at each iteration. For a 64-bit mantissa, this requires 64 tests. As well, for best accuracy, A and B would need additional 64 trailing zero bits making subtraction 128-bits. Since it is integer subtraction, it is still fairly trivial.

Square Root

For square roots, exponents are divided by 2. If the exponent is odd, add 1 to it to make it even and rotate the mantissa right. From there, the mantissa can be treated as an integer to take the square root of.

Modulo

For many trig applications, you will need to take input mod pi. To do this, check the exponent. If it is less than 1, it is smaller than pi, so you are finished. if it is greater than or equal to 1, perform 3-exponent and this will be the number of iterations required for the algorithm about to be described.

If the mantissa is greater than or equal to pi, subtract pi. Then, shift pi to the right (dividing it by 2) and repeat the needed number of times.

For general modulo, just replace 'pi' with any number and check the exponent for being less than the exponent of the modulus.

24-bit Floating Point

24-bit floats are excellent because the math can be contained in the registers. These currently use 7-bit exponents such that 0=0 and %1111111 = 3Fh = -1

RAM equates

The order in which the RAM values are stored are important:

   f24_exp2:
   	.db 0
   f24_exp1:
   	.db 0
   f24_exp4:
   	.db 0
   f24_exp3:
   	.db 0
   f24_exp6:
   	.db 0
   f24_exp5:
   	.db 0
   f24_man1:
   	.dw 0
   f24_man2:
   	.dw 0
   f24_man3:
   	.dw 0
   f24_man4:
   	.dw 0
   f24_man5:
   	.dw 0
   f24_man6:
   	.dw 0
   tempbyte1:
   	.db 0
   tempword1:
   	.dw 0

Addition/Subtraction

   Float24Add:
   ;BHL+CDE
       ld a,c
       xor 40h
       ld c,a
   	rlc c
       ld a,b
       rla
   	rlc b
       xor 80h
   	sub c
   	jr z,f24shifted
   	jr nc,f24shiftDE
       rra
   	cp -16
   	jr nc,$+6
   	ld a,c \ ex de,hl \ rrca \ ret
   	srl h \ rr l
   	inc a \ jr nz,$-5
   	jr nc,$+3 \ inc hl
   	ld a,b \ and 1 \ ld b,a \ ld a,c \ and $FE \ xor 80h \ or b \ jp f24shifted+1
   f24ShiftDE:
   	rra
   	cp 16
   	jr c,$+5
   	ld a,b \ rrca \ ret
   	srl d \ rr e
   	dec a \ jr nz,$-5
   	jr nc,$+3 \ inc de
   	ld a,c \ and 1 \ ld c,a \ ld a,b \ and $FE \ or c \ ld c,a
   f24Shifted:
   	ld a,b \ rrca \ ld b,a \ ld a,c \ rrca \ ld c,a \ xor b \ ld a,b \ jp m,f24sub
   	add hl,de \ ret nc
   	ld c,0 \ inc c \ inc c \ rr h \ rr l \ jr nc,$+8 \ inc l \ jr nz,$+5 \ inc h
   	jr z,$-12
   	rlca \ add a,c \ jp p,$+10
   	jr c,$+7 \ bit 6,b \ rrca \ jr z,$+4
       rrca \ ret
   SetInf:
   	ld hl,$FFFF
   	rla
   	ld a,h
   	rra
   	ld b,a
   	ret
   f24Sub:
   	sbc hl,de
   	jr nc,normalise24
   	xor 80h
       ld b,a
       xor a \ sub l \ ld l,a
       sbc a,a \ sub h \ ld h,a
       ld a,b
   normalise24:
   	bit 7,h \ ret nz
   	ld a,b \ add a,a
   	cp $82 \ ret z
   	dec a \ dec a
   	add hl,hl
   	rlc b \ rra \ ld b,a
   	jp normalise24
   
   Float24Sub:
   ;BHL-CDE
       ld a,80h
       xor c
       ld c,a
       jp Float24Add

Multiplication

   Float24Mul:
   ;BHL*CDE -> AHL
       ld a,b \ xor c
       and 80h
       push af
       sla b \ sra b
       sla c \ sra c
       ld a,b
       add a,c
       pop bc
       jp m,$+10
       cp 64
       jr nc,SetInf-2
       jp $+7
       cp -64
       jr c,SetInf-2
   ;B has the right sign
       and 7Fh
       or b
       push af
       ld b,h
       ld c,l
       call BC_Times_DE
       bit 7,l
       ld l,h
       ld h,b
       jr z,$+3
       inc hl
       pop af
       ret
   BC_Times_DE:
   ;BHLA is the result
       ld a,b
       or a
       ld hl,0
       ld b,h
   ;1
       add a,a
       jr nc,$+4
       ld h,d
       ld l,e
   ;2
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,b
   ;227+10b-7p
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,b
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,b
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,b
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,b
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,b
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,b
   
   ;===
   ;AHL is the result of B*DE*256
       push hl
       ld h,b
       ld l,b
       ld b,a
       ld a,c
       ld c,h
   ;1
       add a,a
       jr nc,$+4
       ld h,d
       ld l,e
   ;2
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,c
   ;227+10b-7p
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,c
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,c
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,c
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,c
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,c
   
       add hl,hl
       rla
       jr nc,$+4
       add hl,de
       adc a,c
   
       pop de
   ;Now BDE*256+AHL
       ld c,a
       ld a,l
       ld l,h
       ld h,c
       add hl,de
       ret nc
       inc b
   ;BHLA is the 32-bit result
       ret

Division

Note that this routine currently does not use 32-bit subtractions, so it can get bad accuracy.

   Float24Div:
   ;BHL/CDE -> AHL
       ld a,b \ xor c
       and 80h
       push af
       sla b \ sra b
       sla c \ sra c
       ld a,b
       sub c
       pop bc
       jp m,$+11
       cp 64
       jp nc,SetInf-2
       jp $+7
       cp -64
       jp c,SetInf-2
   ;B has the right sign
       and 7Fh
       or b
       push af
   
   
       ld bc,0
       or a \ sbc hl,de \ jr c,$+7 \ set 7,b \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 6,b \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 5,b \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 4,b \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 3,b \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 2,b \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 1,b \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 0,b \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 7,c \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 6,c \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 5,c \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 4,c \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 3,c \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 2,c \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+7 \ set 1,c \ jp $+4 \ add hl,de \ srl d \ rr e
       or a \ sbc hl,de \ jr c,$+4 \ set 0,c
       ld h,b \ ld l,c
       pop bc
       jp normalise24

log base 2

   Float24Lg:
   ;Inputs: AHL is the 24-bit float
   ;     bit 7,a is the sign
   ;     bits 0~6 are the signed exponent
   ;     HL is the mantissa
   ;Output:
   ;    nc flag set if result would be complex
   ;    c if real result
   ;    BHL is the computed result if real
       or a
       jp p,$+9
       ld a,-1
       ld hl,0
       ret
       add a,a
       sra a
   ;A is now the value that must be added to the final result
       push af
       exx \ ld hl,0 \ exx
   FloatLogLoop:
   ;Step 1 : check if HL >=AAAA
   step1:
       ld bc,$AAAA
       xor a
       sbc hl,bc
       add hl,bc
       jp c,step2
       exx
       ld bc,27200
       add hl,bc
       exx
   ;multiply HL by 3/4
       ld d,h
       ld e,l
       add hl,hl \ rla
       add hl,de \ adc a,0
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       jr nc,step1+3
       inc hl
       jp step1+3
   step2:
       ld bc,$9249
       xor a
       sbc hl,bc
       add hl,bc
       jp c,step3
       exx
       ld bc,12625
       add hl,bc
       exx
   ;multiply HL by 7/8
       ld d,h
       ld e,l
       add hl,hl \ rla
       add hl,hl \ rla
       add hl,hl \ rla
       sbc hl,de \ sbc a,0
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       jr nc,step2+3
       inc hl
       jp step2+3
   step3:
       ld bc,$8888
       xor a
       sbc hl,bc
       add hl,bc
       jp c,step4
       exx
       ld bc,6102
       add hl,bc
       exx
   ;multiply HL by 7/8
       ld d,h
       ld e,l
       add hl,hl \ rla
       add hl,hl \ rla
       add hl,hl \ rla
       add hl,hl \ rla
       sbc hl,de \ sbc a,0
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       jr nc,step3+3
       inc hl
       jp step3+3
   step4:
       ld bc,$8421
       xor a
       sbc hl,bc
       add hl,bc
       jp c,step5
       exx
       ld bc,3002
       add hl,bc
       exx
   ;multiply HL by 31/32
   ;multiply HL by 31
       push hl
       srl h \ rr l \ rra
       srl h \ rr l \ rra
       srl h \ rr l \ rra
       ld d,a
       ld a,h \ ld h,l \ ld l,d
       pop de
       sbc hl,de \ sbc a,0
   ;now divide by 32
   ; 7 shifts right
       add hl,hl \ rla
       add hl,hl \ rla
       add hl,hl \ rla
       bit 7,l
       ld l,h
       ld h,a
       jr z,step4+3
       inc hl
       jp step4+3
   
   step5:
       ld bc,$8208
       xor a
       sbc hl,bc
       add hl,bc
       jp c,step6
       exx
       ld bc,1489
       add hl,bc
       exx
   ;multiply HL by 63/64
   
   ;multiply HL by 63
       push hl
       srl h \ rr l \ rra
       srl h \ rr l \ rra
       ld d,a
       ld a,h \ ld h,l \ ld l,d
       pop de
       sbc hl,de \ sbc a,0
   ;now divide by 64
   ; 7 shifts right
       add hl,hl \ rla
       add hl,hl \ rla
       bit 7,l
       ld l,h
       ld h,a
       jr z,step5+3
       inc hl
       jp step5+3
   
   step6:
       ld bc,$8102
       xor a
       sbc hl,bc
       add hl,bc
       jp c,step7
       exx
       ld bc,742
       add hl,bc
       exx
   ;multiply HL by 127
       ld d,h
       ld e,l
       srl h \ rr l
       ld a,h \ ld h,l \ ld l,0 \ rr l
       sbc hl,de \ sbc a,0
   ;now divide by 128
   ; 7 shifts right
       add hl,hl \ rla
       bit 7,l
       ld l,h
       ld h,a
       jr z,step6+3
       inc hl
       jp step6+3
   
   
   step7:
       ld bc,$8080
       xor a
       sbc hl,bc
       add hl,bc
       jp c,step8
       exx
       ld bc,370
       add hl,bc
       exx
   ;multiply HL by 255
       ld d,h
       ld e,l
       ld a,h \ ld h,l \ ld l,0
       sbc hl,de \ sbc a,0
   ;now divide by 256
       bit 7,l
       ld l,h
       ld h,a
       jr z,step7+3
       inc hl
       jp step7+3
   
   
   step8:
       exx \ ld bc,185 \ exx
       rrc c
       sbc hl,bc
       add hl,bc
       jp c,step9
       exx
       add hl,bc
       exx
   ;multiply HL by 511/512
       xor a
       ld e,h
       ld d,a
       cp L
       jr z,$+3
       inc de
       add hl,hl \ rla
       sbc hl,de \ sbc a,0
       rra \ rr h \ rr l
       jr nc,step8+7
       inc hl
       jp step8+7
   
   step9:
       rrc c
       sbc hl,bc
       add hl,bc
       jp c,step10
       exx
       ld c,92
       add hl,bc
       exx
   ;multiply HL by 1023/1024
       xor a
       ld e,h
       ld d,a
       cp L
       jr z,$+3
       inc de
       add hl,hl \ rla
       add hl,hl \ rla
       sbc hl,de \ sbc a,0
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       jr nc,step9+2
       inc hl
       jp step9+2
   
   step10:
       rrc c
       sbc hl,bc
       add hl,bc
       jp c,step11
       exx
       ld c,46
       add hl,bc
       exx
   ;multiply HL by 1023/1024
       xor a
       ld e,h
       ld d,a
       cp L
       jr z,$+3
       inc de
       add hl,hl \ rla
       add hl,hl \ rla
       add hl,hl \ rla
       sbc hl,de \ sbc a,0
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       jr nc,step10+2
       inc hl
       jp step10+2
   
   step11:
       rrc c
       sbc hl,bc
       add hl,bc
       jp c,step12
       exx
       ld c,23
       add hl,bc
       exx
   ;multiply HL by 2047/2048
       xor a
       ld e,h
       ld d,a
       cp L
       jr z,$+3
       inc de
       add hl,hl \ rla
       add hl,hl \ rla
       add hl,hl \ rla
       add hl,hl \ rla
       sbc hl,de \ sbc a,0
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       rra \ rr h \ rr l
       jr nc,step11+2
       inc hl
       jp step11+2
   
   step12:
   ;    rrc c
   ;    sbc hl,bc
   ;    add hl,bc
   ;    jp c,step13
   ;    exx
   ;    ld c,12
   ;    add hl,bc
   ;    exx
   
   step13:
   logfin:
   
       pop af
       exx
       ld bc,$7F07
       ld d,a
       ld a,b
       ret z
       ld a,d
       ld e,0
   
       jp p,$+7
       ld c,$87
       neg
       dec c
       add a,a \ jp p,$-2
       ld d,a
       jp Float24Add

Square Root

   Float24Sqrt:
   ;Input:
   ;  AHL is the 24-bit float of which to find the square root
   ;Output:
   ;  BHL is the resulting 24-bit square root
       or a
       jp p,$+9
       ld a,-1
       ld hl,0
       ret
   
       rra
       jr c,$+6
       srl h
       rr l
       add a,a
       add a,a
       sra a
       rra
       ld b,a
   SqrtHL_prec16:
   ;input: HL
   ;Output: DE
   ;247 bytes
   ;1269 t-states worst case
   ;Average is about 1219 t-states
       xor a
       ld d,a
   
       ld c,l
       ld l,h
       ld h,a
   
       add hl,hl
       add hl,hl
       cp h
       jr nc,$+5
       dec h
       ld a,4
   
       add hl,hl
       add hl,hl
       ld e,a
       sub h
       jr nc,$+6
       cpl
       ld h,a
       inc e
       inc e
   
       ld a,e
       add hl,hl
       add hl,hl
       add a,a
       ld e,a
       sub h
       jr nc,$+6
       cpl
       ld h,a
       inc e
       inc e
   
       ld a,e
       add hl,hl
       add hl,hl
       add a,a
       ld e,a
       sub h
       jr nc,$+6
       cpl
       ld h,a
       inc e
       inc e
   
       ld a,e
       ld l,c
   
       add hl,hl
       add hl,hl
       add a,a
       ld e,a
       sub h
       jr nc,$+6
       cpl
       ld h,a
       inc e
       inc e
   
       ld a,e
       add hl,hl
       add hl,hl
       add a,a
       ld e,a
       sub h
       jr nc,$+6
       cpl
       ld h,a
       inc e
       inc e
   
       ld a,e
       add a,a \ ld e,a
       add hl,hl
       add hl,hl
       jr nc,$+6
       sub h \ jp $+6
       sub h
       jr nc,$+6
       inc e \ inc e
       cpl
       ld h,a
   
   
       ld a,l
       ld l,h
       add a,a
       ld h,a
       adc hl,hl
       adc hl,hl
       sll e \ rl d
       sbc hl,de
       jr nc,$+3
       add hl,de
       sbc a,a \ add a,a \ inc a \ add a,e \ ld e,a
   
   ;iteration 9
       add hl,hl \ add hl,hl
       sll e \ rl d
       sbc hl,de
       jr nc,$+3
       add hl,de
       sbc a,a \ add a,a \ inc a \ add a,e \ ld e,a
   
       add hl,hl \ add hl,hl
       sll e \ rl d
       sbc hl,de
       jr nc,$+3
       add hl,de
       sbc a,a \ add a,a \ inc a \ add a,e \ ld e,a
   
       add hl,hl \ add hl,hl
       sll e \ rl d
       sbc hl,de
       jr nc,$+3
       add hl,de
       sbc a,a \ add a,a \ inc a \ add a,e \ ld e,a
   
       add hl,hl \ add hl,hl
       sll e \ rl d
       sbc hl,de
       jr nc,$+3
       add hl,de
       sbc a,a \ add a,a \ inc a \ add a,e \ ld e,a
   
       add hl,hl \ add hl,hl
       sll e \ rl d
       sbc hl,de
       jr nc,$+3
       add hl,de
       sbc a,a \ add a,a \ inc a \ add a,e \ ld e,a
   
       add hl,hl \ add hl,hl
       sll e \ rl d
       sbc hl,de
       jr nc,$+3
       add hl,de
       sbc a,a \ add a,a \ inc a \ add a,e \ ld e,a
   
       sll e \ rl d
       add hl,hl \ add hl,hl    ;up to 17 bits
       jr nc,$+8
           sbc hl,de
           inc e
           jp $+13
       sbc hl,de
       jr nc,$+3
       add hl,de
       sbc a,a \ add a,a \ inc a \ add a,e \ ld e,a
   
       srl d \ rr e \ scf
       sbc hl,de
       ccf
       ex de,hl
       adc hl,hl
       ret

Float To Signed Int

   FloatToSInt:
   ;Input: BHL
   ;Output : DE
   ;Overflow is handled as 0x7FFF or 0xFFFF
       ld de,0
       ld a,b
       and 80h
       or d
       ld d,a
       ld a,b
       add a,a
       ret m
       dec de
       rr d \ rr e
       sub 16
       ret nc
       cpl
       ld d,h \ ld e,l
       ld b,a
       sra d \ rr e
       djnz $-4
       ret

Float to Unsigned Int

   FloatToUInt:
   ;Input: BHL
   ;Output : DE
   ;Overflow is handled as 0xFFFF
       ld de,0
       ld a,b
       add a,a
       ret m
       dec de
       sub 16
       ret nc
       cpl
       ld d,h \ ld e,l
       ld b,a
       srl d \ rr e
       djnz $-4
       ret

Atan2 (CORDIC)

This is getting questionable accuracy, but it is a CORDIC implementation of arctangent using an x and y input:

   Float24Atan2:
   ;Inputs:
   ; BHL=X
   ; CDE=Y
   ;Outputs:
   ; AHL is the angle (64 corresponds to 90 degrees)
   ; CDE can be multiplied by 0.60725293500888 (7F 9B75)to get sqrt(x^2+y^2).
       ld (f24_exp2),bc
       ld (f24_man1),hl
       ld (f24_man2),de
       xor a
       ld h,a
       ld l,a
       ld (f24_man3),hl
       ld (f24_exp3),a
       ld (f24_man4),hl
       ld (f24_exp4),a
       ld hl,Float24_arctantable
       ld (tempword1),hl
   f24_atan2loop:
       ld (tempbyte1),a
       add a,c
       and $7F
       ld c,a
   ;add CDE to f1, store to f5
       ld hl,(f24_man1)
       call Float24Add
       ld (f24_exp5),a
       ld (f24_man5),hl
   
       ld bc,(f24_exp2)
       ld a,(tempbyte1)
       add a,b
       and $7F
       bit 7,b
       jr z,$+4
       or 80h
       ld d,a
       ld b,c
       ld c,d
       ld de,(f24_man1)
       ld hl,(f24_man2)
   
       bit 7,b
       jr nz,$+2+3+3+3+3+1+3
       call Float24Sub
       ld (f24_man2),hl
       ld (f24_exp2),a
       ld hl,(tempword1)
       ld c,(hl)
       jp adjustangle
   
       call Float24Add
       ld (f24_man2),hl
       ld (f24_exp2),a
       ld a,(tempbyte1)
       ld hl,(tempword1)
       ld c,(hl) \ set 7,c
   AdjustAngle:
       inc hl
       ld e,(hl) \ inc hl
       ld d,(hl) \ inc hl
       ld (tempword1),hl
       ld hl,(f24_man4)
       ld a,(f24_exp4)
       ld b,a
       call Float24Add
       ld (f24_man4),hl
       ld (f24_exp4),a
   
       ld hl,(f24_man5)
       ld a,(f24_exp5)
       ld (f24_man1),hl
       ld (f24_exp1),a
   
       ld bc,(f24_exp2)
       ld de,(f24_man2)
       ld a,(tempbyte1)
       dec a
       cp -8
       jp nz,f24_atan2loop
       ld hl,(f24_man4)
       ld a,(f24_exp4)
       ld bc,(f24_exp1)
       ld de,(f24_man1)
       ret
   
   Float24_arctantable:
    .db 5   \ .dw $8000
    .db 4   \ .dw $9720
    .db 3   \ .dw $9FB4
    .db 2   \ .dw $A222
    .db 1   \ .dw $A2C3
    .db 0   \ .dw $A2EC
    .db -1  \ .dw $A2F6
    .db -2  \ .dw $A2F9    ;note that from here, these are essentially the same
    .db -3  \ .dw $A2F9
    .db -4  \ .dw $A2F9
    .db -5  \ .dw $A2FA
    .db -6  \ .dw $A2FA
    .db -7  \ .dw $A2FA
    .db -8  \ .dw $A2FA
    .db -9  \ .dw $A2FA
    .db -10 \ .dw $A2FA

32-bit Floating Point

80-bit Floating Point

Conclusion

For more routines, until this page can be filled in more, see this Floating Point Library

z80:Floating Point

Contents

Floating Point Tutorial

Format

Sign Bit

Exponent

Mantissa

Examples

Addition and Subtraction

Multiplication and Division

Square Root

Modulo

24-bit Floating Point

RAM equates

Addition/Subtraction

Multiplication

Division

log base 2

Square Root

Float To Signed Int

Float to Unsigned Int

Atan2 (CORDIC)

32-bit Floating Point

80-bit Floating Point

Conclusion

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools