03-29-2002, 10:46 AM
Kaiyodo
Hill Giant
 
Join Date: Jan 2002
Location: Midlands,UK
Posts: 149
Zone.exe VTune performance analysis results (fairly long and

I thought I'd do a quick VTune run on Zone.exe, as a couple of people have mentioned having performance problems with it. Mainly I wanted to check that none of my DW/DA code was taking too much time, and to see whether Image was right to fear floats.

This test was run as a level 60 halfling warrior in Surefall Glade, which contained about 25 NPCs (from Drawde's excellent db). I ran around for 10 minutes killing every mob in the zone: pretty much constant fighting with multiple mobs at a time (and a lot of complete heals).

System was an Athlon XP 1800+ with 512 MB of RAM running Windows XP. The test was repeated 3 times and the results were pretty consistent, so I'm just posting the last run rather than working out the averages.

I was running Minilog, World.exe, a single zone.exe and EQ on the same machine.

Code:
VTUNE Results
-------------

10 min test, 60 sec starting pause

Total system usage: (Edited out a load of system crap, these are the highest peaks)

nv4_disp.dll   33.6%   <-- Graphics card driver
EQGfx_Dx8.dll  25.0%
dvps.dll       17.2%   <-- this is 'Umbra'
d3d8.dll       7.4%
...
Zone.exe       0.2%

Zone.exe function usage:

Timer::Check          16.7%
NPC::Process          5.8%
EntityList::Process   5.2%
Spawn2::Process       5.1%
Mob::FindSpell        3.5%
EntityList::Process   2.6%
Mob::SpellProcess     2.2%
CEQPacket::CRCLookup  1.8%
strstr                1.7%
Zone::Process         1.4%
HateList::GetTop      1.4%
Corpse::Process       1.0%
Database::GetItem     0.8%
EntityList::Process   0.7%
pow                   0.7%
I didn't spend too long on this test, and VTune was being its usual awkward self and not showing me the source for most of the peaks, but Timer::Check was the clear winner.

I examined the assembly for it and found that this was the culprit:

MOV al, BYTE PTR [ecx+08h] - stalls for 247 clock ticks

That's the check to see whether the timer is enabled. It confirms what I've assumed for a while now: memory access latency is THE performance killer on modern PCs. I should have tried changing the 'enabled' flag from an int8 to a 4-byte type and run it again, but I'm tired now. IIRC, 32-bit x86 processors aren't keen on 8-bit variables.

Interestingly, neither rand() nor _ftol (the FPU conversion of a float to a long, which is a common performance killer) showed up at all.

I've rambled on enough anyway. Hopefully someone finds this useful, or at least slightly interesting.

K.