Kaiyodo
03-29-2002, 10:46 AM
I thought I'd do a quick VTune run on Zone.exe as a couple of people have mentioned that they were having performance problems with it. Mainly I wanted to test that none of my DW/DA code was taking too much time, and to see if Image was correct in his fear of floats :)
This test was conducted as a level 60 halfling warrior in Surefall Glade containing about 25 NPCs (from Drawde's excellent db). I ran around for 10 mins killing every mob in the zone, pretty much constant fighting with multiple mobs at a time (and a lot of Complete Heals).
System was an Athlon XP 1800+ with 512 MB of RAM running Windows XP. The test was repeated 3 times and the results were pretty consistent, so I'm just posting the last one rather than working out the averages :)
I was running Minilog, World.exe, a single zone.exe and EQ on the same machine.
VTUNE Results
-------------
10 min test, 60 sec starting pause
Total system usage: (Edited out a load of system crap, these are the highest peaks)
nv4_disp.dll    33.6%  <-- Graphics card driver
EQGfx_Dx8.dll   25.0%
dvps.dll        17.2%  <-- this is 'Umbra'
d3d8.dll         7.4%
...
Zone.exe         0.2%
Zone.exe function usage:
Timer::Check            16.7%
NPC::Process             5.8%
EntityList::Process      5.2%
Spawn2::Process          5.1%
Mob::FindSpell           3.5%
EntityList::Process      2.6%
Mob::SpellProcess        2.2%
CEQPacket::CRCLookup     1.8%
strstr                   1.7%
Zone::Process            1.4%
HateList::GetTop         1.4%
Corpse::Process          1.0%
Database::GetItem        0.8%
EntityList::Process      0.7%
pow                      0.7%
I didn't spend too long doing this test, and VTune was being its usual awkward self and not showing me the source for most of the peaks, but Timer::Check was the clear winner :)
I examined the assembler for it and found this was the culprit:
MOV al, BYTE PTR [ecx+08h] - stalls for 247 clock ticks
That's the check to see if the timer is enabled or not. It confirms what I've assumed for a while now: memory access latency is THE performance killer on modern PCs. I should have tried changing the 'enabled' flag from an int8 to a 4-byte type and run it again, but I'm tired now :) IIRC, 32-bit x86 processors aren't keen on 8-bit variables.
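For anyone wanting to try the int8 vs 4-byte experiment, here's a rough sketch of what it would look like. This is not the actual Timer class: the struct names, field names and field order are made up, picked only so that the flag sits at offset 8 like the [ecx+08h] operand, and check() is a guess at the general shape of the test, not the real Timer::Check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout (names and order are assumptions): two 4-byte fields
// followed by the flag, so the flag lands at offset 8.
struct TimerByteFlag {
    std::uint32_t start_time;
    std::uint32_t timer_time;
    std::uint8_t  enabled;    // 1-byte flag, read with a byte-sized MOV
};

// Same hypothetical layout with the flag widened to 4 bytes.
struct TimerWordFlag {
    std::uint32_t start_time;
    std::uint32_t timer_time;
    std::uint32_t enabled;    // 4-byte flag, read with a dword-sized MOV
};

// With 4-byte alignment the compiler pads the byte version out to 12 bytes,
// so both variants keep the flag at the same offset and have the same size.
static_assert(offsetof(TimerByteFlag, enabled) == offsetof(TimerWordFlag, enabled),
              "flag offset is unchanged by widening");
static_assert(sizeof(TimerByteFlag) == sizeof(TimerWordFlag),
              "struct size is unchanged by widening");

// Assumed shape of the check: the flag is the first thing read, then the
// elapsed time is compared against the interval.
template <typename TimerT>
bool check(const TimerT& t, std::uint32_t now) {
    if (!t.enabled) {
        return false;
    }
    return now - t.start_time >= t.timer_time;
}
```

Assuming a layout like this, widening the flag doesn't move it anywhere in memory, so the change on its own wouldn't alter which cache line the load touches. Only a re-run under VTune would show whether the stall actually changes.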
Interestingly, neither rand() nor _ftol (FPU conversion of a float to a long), which is a common performance killer, showed up at all.
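For context on why _ftol has that reputation: a plain float-to-long cast has to truncate toward zero, and on the x87 FPU the MSVC helper gets that by switching the rounding mode, storing the value, then switching back. The sketch below is only an illustration in modern C++ (std::lrint wasn't in the compilers of the day), and the function names are made up. It shows the cast next to a conversion that uses the current rounding mode instead, which gives different results for non-integer values.

```cpp
#include <cassert>
#include <cmath>

// Plain cast: the language requires truncation toward zero. This is the kind
// of conversion that older MSVC builds routed through _ftol.
inline long truncate_to_long(float f) {
    return static_cast<long>(f);
}

// std::lrint converts using the current rounding mode (round-to-nearest by
// default), so no rounding-mode switch is needed, but the result is rounded
// rather than truncated.
inline long round_to_long(float f) {
    return std::lrint(f);
}
```

The two are not drop-in replacements for each other: swapping one for the other only makes sense where the code doesn't depend on truncation.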
I've rambled on enough anyway, hopefully someone finds this useful or at least slightly interesting :)
K.