Difference between revisions of "Z80:Optimization"
KermMartian (edit summary: Adjusted lowercase z80)
Revision as of 06:32, 5 February 2016
After you've worked the bugs out, you may want to make your program smaller and faster. This section is dedicated to just that purpose. Although there are many things you can do, here are some general techniques that can help:
Code replacements
xor a vs. ld a,0
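xor a is a simple way to set a to zero in 1 byte and 4 t-states instead of 2 bytes and 7 t-states. Don't use it if you need to preserve the flags: xor a changes them, whereas ld a,0 leaves them untouched.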
or a vs. cp 0
or a is a byte smaller and 3 t-states faster than cp 0, and it gives the same z, sign and c flag results.
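As a minimal sketch, a zero test on a could look like the following. The label aIsZero is hypothetical and only marks where the branch would land.

```z80
    or a            ;1 byte, 4 t-states: z is set if a = 0, c is reset
    jr z,aIsZero    ;taken when a = 0
    ;... code for a != 0 goes here ...
aIsZero:
```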
dec a vs. cp 1
If you don't need to preserve a, dec a is a smaller and faster way to check whether a (or any other 8-bit register) is 1: the z flag is set if the result is 0. 8-bit increments and decrements affect the z flag and the sign flag, among others, but they leave the c flag alone.
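Because each dec a leaves the value one lower, the test can be chained to dispatch on small values. This is only a sketch; the labels option1 to option3 are hypothetical.

```z80
    dec a
    jr z,option1    ;a was 1
    dec a
    jr z,option2    ;a was 2
    dec a
    jr z,option3    ;a was 3
    ;falls through for any other value of a
```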
inc a vs. cp 255
Again, inc a is a byte smaller and 3 t-states faster: the z flag is set if a was 255. It does not preserve a, but that is often acceptable, and the same trick works on all of the main 8-bit registers and (hl).
adc a,0 vs. jr nc,$+3 \ inc a
adc a,0 is 2 bytes and 7 t-states, whereas the latter is 3 bytes and takes 11 t-states if the c flag is set, 12 if it is reset. That saves a byte and 4 to 5 t-states!
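One place where adc a,0 fits naturally is propagating a carry into a high byte. The sketch below adds the unsigned 8-bit value in e to hl, destroying a; the choice of registers is an assumption made for illustration.

```z80
    ld a,l
    add a,e         ;add the low bytes; c is set if the sum passed 255
    ld l,a          ;ld does not affect the flags
    ld a,h
    adc a,0         ;a = h + c
    ld h,a          ;hl = hl + e
```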
ccf \ adc a,0 vs. jr c,$+3 \ inc a
They are the same size, but the former is always 11 t-states whereas the latter is either 11 or 12 depending on the c flag.
sbc a,0 vs. jr nc,$+3 \ dec a
See adc a,0 vs. jr nc,$+3 \ inc a.
ccf \ sbc a,0 vs. jr c,$+3 \ dec a
See ccf \ adc a,0 vs. jr c,$+3 \ inc a.
scf \ ccf
scf \ ccf is used to reset the c flag, but at 2 bytes and 8 t-states it is wasteful. Each of the following also resets the c flag in 1 byte and 4 t-states:
    or a    ;z flag is set if a = 0
    and a   ;z flag is set if a = 0
    xor a   ;always sets the z flag, sets a = 0
    cp a    ;always sets the z flag
    sub a   ;always sets the z flag, sets a = 0
The following also reset the c flag, but they are 2 bytes and 7 t-states each, so you should not use them:
    sub 0
    add a,0
    cp 0
In each of these cases, other flags are also modified.
Cursor/pen
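    ld hl,$0100      ;l = $00 is stored first, h = $01 second
    ld (curRow),hl   ;curRow = $00, curCol = $01
    ld (penCol),hl   ;penCol = $00, penRow = $01
This is much more efficient if you're going to change both coordinates of the cursor or pen position. Because curCol is right after curRow (and penRow is right after penCol), you can use a 16-bit register to load both at once. The low byte of the register goes to the first address and the high byte to the second.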
PutS
Something you may not know is that PutS and its variants leave HL pointing to the byte after the null terminator. This is very useful, especially when displaying multiple items at different locations on the screen, because you don't have to load string after string into hl.
    ld hl,txtTest
    bcall(_PutS)
    ld de,$0100
    ld (curRow),de
    ld hl,txtTest2
    bcall(_PutS)
    ;...
 txtTest:
    .db "Test",0
 txtTest2:
    .db "Test2",0
can be shortened to:
    ld hl,txtTest
    bcall(_PutS)
    ld de,$0100
    ld (curRow),de
    ;we don't need "ld hl,txtTest2", because hl already points to txtTest2
    bcall(_PutS)
    ;...
 txtTest:
    .db "Test",0
 ;txtTest2           ;Optional, doesn't affect speed or size here
    .db "Test2",0
It also allows you to display strings in a loop, say, for a high score board:
 high:
    ld b,8              ;eight entries
    ld de,0
    ld (curRow),de      ;start at row 0, column 0
    ld hl,txtHigh
 highloop:
    push hl
    push de
    ld a,(hl)           ;the first byte of each entry is the score
    ld h,0
    ld l,a
    bcall(_DispHL)
    pop de
    pop hl
    inc hl              ;skip the score byte
    bcall(_PutS)        ;leaves hl pointing to the next entry
    inc e               ;next row
    ld d,0
    ld (curRow),de
    djnz highloop
    bcall(_GetKey)
    ret
 txtHigh:
    .db 20,"HIGH SCORE!",0
 txt2nd:
    .db 19,"HIGH SCORE!",0
 txt3rd:
    .db 18,"HIGH SCORE!",0
 txt4th:
    .db 17,"HIGH SCORE!",0
 txt5th:
    .db 16,"HIGH SCORE!",0
 txt6th:
    .db 15,"HIGH SCORE!",0
 txt7th:
    .db 14,"HIGH SCORE!",0
 txt8th:
    .db 13,"HIGH SCORE!",0
Optimised Code Snippets
Test For 0 (8-bits)
For any 8-bit register, you can use the following:
    inc [reg8]
    dec [reg8]
This sets the z flag if the register is 0 and resets it otherwise. It takes 2 bytes and 8 t-states, and it preserves the register's value as well as the c flag.
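A typical use is guarding a djnz loop, since djnz with b = 0 runs the loop 256 times. The following is a sketch with hypothetical labels loop and skipLoop.

```z80
    inc b
    dec b           ;b is unchanged; z is set if b = 0
    jr z,skipLoop   ;skip the loop entirely when the count is 0
loop:
    ;... loop body ...
    djnz loop
skipLoop:
```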
Set A=0
ld a,0 is 2 bytes and 7 t-states, whereas each of the following is 1 byte and 4 t-states:
    xor a
    sub a
Note that these will change flags, but usually that is okay.
16-bit CP
To compare HL to another 16-bit register, you can do the following:
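    or a
    sbc hl,[reg16]
    add hl,[reg16]
The or a is only there to reset the c flag, so if the c flag is already known to be reset at this point, leave it out and save a byte and 4 t-states. With it, the sequence takes 4+15+11 = 30 t-states and 4 bytes in total. Afterwards the flags read as they would after an 8-bit cp: z is set if HL equals the other register, and c is set if HL is smaller (unsigned). HL itself is restored by the add.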
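As a sketch of how the result might be used, the following compares hl with de and branches three ways. The labels equal and hlLess are hypothetical.

```z80
    or a            ;reset c
    sbc hl,de       ;z if hl = de, c if hl < de (unsigned)
    add hl,de       ;restore hl; z is untouched and c comes out the same
    jr z,equal      ;hl = de
    jr c,hlLess     ;hl < de
    ;falls through when hl > de
```

The add hl,de step does not touch the z flag, and it overflows exactly when the subtraction borrowed, which is why both flags survive the restore.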
Conditionally Set or Reset A
In some cases, you need to set all of the bits in A or reset all of them based on a flag. If you are using the c flag:
sbc a,a
1 byte and 4 t-states is all it takes. If the c flag was set, A becomes 255, otherwise A becomes 0, and the c flag itself is preserved either way.
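One common application, shown here as a sketch, is sign-extending the 8-bit value in a into hl: the sign bit is first shifted into the c flag, and sbc a,a then turns it into a full byte.

```z80
    ld l,a          ;low byte is the original value
    add a,a         ;shift bit 7 into the c flag
    sbc a,a         ;a = $FF if the value was negative, else $00
    ld h,a          ;hl = sign-extended value
```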
16-bit NEG
To get the negative (additive inverse) of a 16-bit register, the following 6 byte, 24 t-state routine can be used:
    xor a
    sub [LSBreg16]
    ld [LSBreg16],a
    sbc a,a
    sub [MSBreg16]
    ld [MSBreg16],a
For example, to negate HL:
    xor a
    sub l
    ld l,a
    sbc a,a
    sub h
    ld h,a
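Building on the HL example, an absolute-value routine might look like the sketch below. The label absHL is hypothetical. Note that $8000 has no positive counterpart in 16 bits, so it is returned unchanged.

```z80
absHL:
    bit 7,h
    ret z           ;hl is already non-negative
    xor a
    sub l
    ld l,a          ;l = -l, c is set if l was not 0
    sbc a,a
    sub h
    ld h,a          ;hl = -hl
    ret
```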
ld hl,(hl)
Often we want to use indirection when using a lookup table of addresses. For example, say you have a look-up table for strings:
 LUT:
    .dw String1
    .dw String2
    .dw String3
    .dw String4
 String1: .db "String1",0
 String2: .db "String2",0
 String3: .db "String3",0
 String4: .db "String4",0
And say you want to load the address of the string into HL. Assuming HL already points to the string's entry in the LUT:
    ld e,(hl)
    inc hl
    ld d,(hl)
    ex de,hl
That is 4 bytes, 24 t-states, but it destroys DE. The following is the same size and speed, destroying A:
    ld a,(hl)
    inc hl
    ld h,(hl)
    ld l,a
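To put the indirection in context, here is a sketch that selects a string from the LUT by an index in a and displays it. It assumes a holds 0 to 3 on entry, and it uses de for the indexing step, so both a and de are destroyed.

```z80
    ;a = string index, 0 to 3
    add a,a         ;each LUT entry is two bytes
    ld e,a
    ld d,0
    ld hl,LUT
    add hl,de       ;hl points to the entry
    ld a,(hl)
    inc hl
    ld h,(hl)
    ld l,a          ;hl = address of the string
    bcall(_PutS)
```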
In the case that you need extreme speed or size optimisations, the following also does the trick, but has a few drawbacks:
    ld sp,hl
    pop hl
At just 2 bytes and 16 t-states that is pretty optimised, but it destroys the stack pointer, which is a crucial element of most routines. In general, you would need to save the stack pointer somewhere and restore it later, at a total cost of 40 t-states and 8 bytes, and your routine would not be able to use the stack in between. Since each use saves 2 bytes and 8 t-states over the versions above, you would need to use this version of indirection at least 6 times to get a speed saving and 5 times for a size saving, at the cost of 2 bytes of RAM.
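The save and restore could be sketched as follows. saveSP is a hypothetical 2-byte RAM location. The di and ei are an additional assumption: if an interrupt fired while sp pointed into the table, its return address would be pushed over your data, so interrupts are disabled for the duration and assumed to have been enabled beforehand. They add 2 bytes and 8 t-states that are not included in the figures above.

```z80
    di                  ;keep interrupts from pushing onto the table
    ld (saveSP),sp      ;4 bytes, 20 t-states
    ;... hl = address of a table entry ...
    ld sp,hl
    pop hl              ;hl = word stored at the entry
    ;... further lookups; no push, pop of real data, or call here ...
    ld sp,(saveSP)      ;4 bytes, 20 t-states
    ei
```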
Conclusion
From this point on, you may be perfectly happy with your program. It works, runs at a decent speed and is also smaller than it used to be. What more could there be to do? Read on to find out what else you need to do before you decide to release your program to the general public.