Zone.exe VTune performance analysis results (fairly long and


Kaiyodo
03-29-2002, 10:46 AM
I thought I'd do a quick VTune run on Zone.exe as a couple of people have mentioned that they were having performance problems with it. Mainly I wanted to test that none of my DW/DA code was taking too much time, and to see if Image was correct in his fear of floats :)

This test was conducted as a level 60 halfling warrior in Surefall Glade, which contained about 25 NPCs (from Drawde's excellent DB). I ran around for 10 minutes killing every mob in the zone: pretty much constant fighting, with multiple mobs at a time (and a lot of complete heals).

The system was an Athlon XP 1800+ with 512 MB RAM running Windows XP. I repeated the test 3 times and the results were pretty consistent, so I'm just posting the last run rather than working out the averages :)

I was running Minilog, World.exe, a single zone.exe and EQ on the same machine.


VTune Results
-------------

10 min test, 60 sec starting pause

Total system usage (I edited out a load of system crap; these are the highest peaks):

nv4_disp.dll 33.6% <-- Graphics card driver
EQGfx_Dx8.dll 25.0%
dvps.dll 17.2% <-- this is 'Umbra'
d3d8.dll 7.4%
...
Zone.exe 0.2%

Zone.exe function usage:

Timer::Check 16.7%
NPC::Process 5.8%
EntityList::Process 5.2%
Spawn2::Process 5.1%
Mob::FindSpell 3.5%
EntityList::Process 2.6%
Mob::SpellProcess 2.2%
CEQPacket::CRCLookup 1.8%
strstr 1.7%
Zone::Process 1.4%
HateList::GetTop 1.4%
Corpse::Process 1.0%
Database::GetItem 0.8%
EntityList::Process 0.7%
pow 0.7%


I didn't spend too long on this test, and VTune was being its usual awkward self and not showing me the source for most of the peaks, but Timer::Check was the clear winner :)

I examined the disassembly for it and found the culprit:

MOV al, BYTE PTR [ecx+08h] - stalls for 247 clock ticks

That's the check to see whether the timer is enabled. It confirms what I've assumed for a while now: memory access latency is THE performance killer on modern PCs. I should have changed the 'enabled' flag from an int8 to a 4-byte type and rerun the test, but I'm tired now :) IIRC, 32-bit x86 processors aren't keen on 8-bit variables.
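For anyone who wants to try that change, here's a minimal C++ sketch of what it might look like. The struct and field names are hypothetical, not Zone.exe's real ones; the only thing taken from the profile is that the flag is read from [ecx+08h], so the sketch assumes two 32-bit fields in front of it. The two structs differ only in the width of the flag.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical Timer layout with the 8-bit flag, as in the profiled build.
// Two 32-bit fields put 'enabled' at offset 8, matching [ecx+08h].
struct TimerByteFlag {
    std::uint32_t start_time;  // tick count (ms) when the timer was started
    std::uint32_t timer_time;  // duration in ms
    std::int8_t   enabled;     // 8-bit flag
};

// Same hypothetical timer with the flag widened to 4 bytes.
struct TimerWordFlag {
    std::uint32_t start_time;
    std::uint32_t timer_time;
    std::int32_t  enabled;     // 4-byte flag
};

static_assert(offsetof(TimerByteFlag, enabled) == 8, "flag at [ecx+08h]");
static_assert(offsetof(TimerWordFlag, enabled) == 8, "flag at [ecx+08h]");

// Returns true once an enabled timer has run for at least timer_time ms.
// 'now' is passed in so the sketch doesn't depend on a platform clock.
template <typename T>
bool check(const T& t, std::uint32_t now) {
    if (!t.enabled) return false;  // the load VTune flagged
    // Unsigned subtraction stays correct when the tick counter wraps.
    return now - t.start_time >= t.timer_time;
}
```

Widening the flag doesn't change which bytes of the Timer object get touched, so if the 247-clock stall is really a cache miss on the object itself, a rerun with the 4-byte flag may not show much difference.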

Interestingly, neither rand() nor _ftol (the FPU conversion of a float to a long, a common performance killer) showed up at all.

I've rambled on enough anyway, hopefully someone finds this useful or at least slightly interesting :)

K.

DeletedUser
04-04-2002, 09:29 AM
I find this very interesting, and it's good information about what needs to be perfected.

DeletedUser
04-24-2002, 08:48 AM
Hmm, CEQPacket::CRCLookup is pretty high too, and I think it can be optimized further.

I think the CRC there is just the standard CRC32. There are much more optimized CRC32 implementations out there; we should check whether switching to one of them would help.
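As an illustration, here's a minimal C++ sketch of one well-known speed-up, "slicing-by-4", checked against a bit-at-a-time reference. It assumes CEQPacket uses the standard reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), which is only a guess above, and the function names are hypothetical. The classic byte-at-a-time loop does one dependent table lookup per byte; slicing-by-4 does four independent lookups per 4 bytes using four 256-entry tables.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Reflected CRC-32 polynomial (the zlib / IEEE 802.3 variant), assumed here.
constexpr std::uint32_t kPoly = 0xEDB88320u;

// Reference implementation: one bit at a time. Slow, but easy to verify.
std::uint32_t crc32_bitwise(const std::uint8_t* p, std::size_t n) {
    std::uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; ++k)
            crc = (crc & 1u) ? (crc >> 1) ^ kPoly : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

// table[k][b] is the CRC contribution of byte b followed by k zero bytes.
using Tables = std::array<std::array<std::uint32_t, 256>, 4>;

Tables make_tables() {
    Tables t{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t c = i;
        for (int k = 0; k < 8; ++k)
            c = (c & 1u) ? (c >> 1) ^ kPoly : c >> 1;
        t[0][i] = c;  // the usual byte-at-a-time table
    }
    for (std::uint32_t i = 0; i < 256; ++i)
        for (int k = 1; k < 4; ++k)
            t[k][i] = (t[k - 1][i] >> 8) ^ t[0][t[k - 1][i] & 0xFFu];
    return t;
}

// Slicing-by-4: consumes 4 input bytes per iteration.
std::uint32_t crc32_slice4(const std::uint8_t* p, std::size_t n) {
    static const Tables t = make_tables();  // built once, on first call
    std::uint32_t crc = 0xFFFFFFFFu;
    while (n >= 4) {
        // Assemble the word byte by byte so the code is endian-neutral.
        crc ^= std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
               std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
        crc = t[3][crc & 0xFFu] ^ t[2][(crc >> 8) & 0xFFu] ^
              t[1][(crc >> 16) & 0xFFu] ^ t[0][crc >> 24];
        p += 4;
        n -= 4;
    }
    while (n--)  // leftover 0-3 bytes, handled the classic way
        crc = (crc >> 8) ^ t[0][(crc ^ *p++) & 0xFFu];
    return crc ^ 0xFFFFFFFFu;
}
```

On the standard check string "123456789" both functions should return 0xCBF43926. Before swapping anything in, it'd be worth confirming against real packets that the server's CRC really is this variant.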