Pet losing some - not all - of his gear
Had the same pet for at least a week. He has 7 pieces of rawhide armor. Every couple of days, the bp and legs disappear. Anyone know where in the code this stuff gets updated? I look at the table and it has what I expect. At some point, no idea when, he loses the bp and legs (not in the table anymore). Really weird. Maybe an issue when logging in and loading the pet?
[ I see where it gets loaded and saved - that code looks fine. Will instrument it, but don't expect the problem to be there. ] |
Usually, a code problem like that will manifest itself every time.
You might try looking at all of your logs (emu server, mysql, etc.) to see if you are having write issues. A write failure would cause items to 'randomly' disappear. I am working through inventory code atm, so I'll keep an eye out for anything that might be wonky. |
No-rent items? Never mind, just read where you said it was rawhide.
|
No-rent items typically disappear after 30 minutes offline. I don't even know if pets check for that.
|
From further testing, it looks like the database is fine after the items are lost in game. The db only gets messed up after you log out of the client, and the corrupted pet inventory gets written back out. So, the loss of items seems to be on zoning, and in memory.
I'm going to focus on that today and see what I can find. |
Gave pet 12 items. Logged out to update db.
Logged back in and then zoned 9 times. On the 9th time only the first 5 items survived when I did a #showstats. Exactly the first 5 items, so the list got truncated. I'm guessing a race condition that didn't let the list get filled in, or perhaps that process somehow got interrupted? This narrows the repro steps to logging in and zoning a lot. I was the only one on the server at the time as well. |
I want to say I experienced this bug myself when I was somewhat actively running Akka's Funhouse.
It is probably an initialization issue when entering the zone. |
Looking for some feedback on what I am seeing.
I don't see where we're corrupting yet, but I find this odd.
Instrumentation:
- I have log messages in ~NPC and in the spot where the pet gets his gear restored.
Steps:
- Fresh server restart
- Login with existing pet to zone 1.
- I see the pet gear restore loop run fine.
- #zone to zone 2
- I see the pet gear restore loop run fine.
- #zone to zone 1 again.
- The process for zone 1 hits the ~NPC code for the pet
- I see the pet gear restore fine after this.
Isn't it odd that the ~NPC code for the pet runs when I reenter the original zone and not when I leave one? |
I think I have found the issue... information as soon as I verify the fix....
|
Ok....
It looks like the zone you are leaving stores the pet's inventory into the DB using a loop of inserts. The zone you are entering retrieves the list with a single select. Sometimes, depending on how fast you zone and how many items are in the pet's inventory list, the receiving zone does the select before the sending zone has finished putting the items in.
I added a LOCK TABLES character_pet_inventory WRITE on the side saving the data, along with a release (UNLOCK TABLES) after the loop of inserts. I then added a LOCK TABLES character_pet_inventory READ in the destination zone's code that reloads the data with the select. I zoned 25 times and it has not failed with 12 pet items. That's never happened for me before, so I think that got it. I'll post the code after a week or so of testing. |
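To make the race described above concrete, here is a minimal C++ sketch. It uses no real database: FakePetInventoryTable, save_pet_items and load_during_save are hypothetical names invented for illustration, not functions from the server code. It also assumes the pet's old rows are already gone when the inserts begin. The arriving zone's single select is simulated by a callback that fires partway through the departing zone's insert loop.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in for the character_pet_inventory table (hypothetical, no real DB).
struct FakePetInventoryTable {
    std::vector<std::string> rows;
    void insert_row(const std::string& item) { rows.push_back(item); }
    // Models the arriving zone's single SELECT: a snapshot of whatever is there.
    std::vector<std::string> select_all() const { return rows; }
};

// Departing zone: saves the pet's items one INSERT at a time.
// `after_insert` is invoked after each row with the number of rows written,
// so the caller can play the arriving zone at that exact moment.
void save_pet_items(FakePetInventoryTable& table,
                    const std::vector<std::string>& items,
                    const std::function<void(std::size_t)>& after_insert) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        table.insert_row(items[i]);
        if (after_insert) after_insert(i + 1);
    }
}

// Arriving zone reads while the save is only `rows_done` inserts in.
// Returns what the arriving zone would believe the pet is carrying.
std::vector<std::string> load_during_save(const std::vector<std::string>& items,
                                          std::size_t rows_done) {
    FakePetInventoryTable table;  // assumption: old rows were already removed
    std::vector<std::string> seen;
    save_pet_items(table, items, [&](std::size_t written) {
        if (written == rows_done) seen = table.select_all();
    });
    return seen;
}
```

With 12 items and the read landing after the 5th insert, load_during_save returns exactly the first 5 items, which matches the truncated list seen with #showstats. If that truncated list is later saved on logout, the database ends up corrupted too.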
Patch code below:
Code:
*************** |
Fixed my problem but created other issues for other clients in the same zone....
My unlock is messing with other tables - apparently there is no way to unlock just the one table. Working on a cleaner solution using a mutex. |
Ok, this is deeper into the infrastructure than I've messed with.
The idea of locking/unlocking the tables isn't going to work, as unlock always releases all tables, and there is other code executing for other clients in the same zone process that locks tables. So I can't use that method. A simple mutex is out, as I'm on Linux and mutexes don't cross processes without shared memory. So, I'm guessing a semaphore. In that case I need to know where all the zone processes are forked, so I can create the semaphore before the zones are forked off. Or is there a better way already in use to make sure that 2 zone processes get synchronized? |
I've seen a few crash dumps involving pets..I wonder if this is related...
|
Ok...
I added a named semaphore and created it in eqlaunch before anything else. Then I use it to access the pet inventory in the functions in zonedb.cpp. It seems to me this is: 1) a bigger issue, since the zone I am leaving doesn't finish its other work in Save() before the zone I land in starts accessing it, or 2) somehow another mechanism was meant to handle this and isn't for pet inventory. I can show, so far, that without my fix, the zone I land in can read the content of the character_pet_inventory table before the zone I am leaving has it completely updated, thus the problem. This is all on my Linux server. I can post the named semaphore fix, but I'm not happy with it and I'm hoping someone more knowledgeable about how the old zone and new zone get ordered might step in and slap me up some ;) |
My solution has evolved to a named semaphore in the client class. This semaphore is named after the character_id for the client. It is created on successful arrival in a zone (if you're coming from another zone, the O_CREAT ends up opening the one from the zone you left, if it's still in use). There can, of course, be multiple semaphores going on at once, for various characters, but they will clean themselves up on logout.
It's used when you save and read pet information. It works, and I leave no semaphores around on exit. I still feel that someone with a better idea of how db coordination is done between zones on zoning might have a more straightforward fix, or maybe a tweak of an existing mechanism. Should I post about it on development? My fix works, but I'd think a patch more in tune with the original thought process might be better long term. I just can't see what that is.. :)
[ edit: expanded the critical section to start when the zoning flag is set and to end at the client's destructor. The new zone now has perfect pet data. I also moved my check in the new zone to before we read in character buffs. With the wrong timing, these could get scrogged as well. With 15 items, I could make the pet bug show up a lot, and now it's always perfect. And no semaphore baggage, all cleaned up. ]
Let me know if you guys want the patch posted. My code is older, but I think the changes are small enough that you can fit them in if you want them. Players are happy - they were sick of losing pet gear occasionally. |
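For readers who have not used them, a per-character named semaphore of the kind described above could look roughly like this on Linux. This is a sketch under stated assumptions, not the posted patch: the class name CharacterSemaphore and the "/petinv_<id>" naming scheme are invented for illustration. Only the POSIX calls (sem_open, sem_wait, sem_trywait, sem_post, sem_close, sem_unlink) are real.

```cpp
#include <cerrno>
#include <fcntl.h>      // O_CREAT
#include <semaphore.h>  // sem_open, sem_wait, sem_trywait, sem_post, sem_close, sem_unlink
#include <stdexcept>
#include <string>

// Hypothetical naming scheme: one semaphore per character id.
std::string pet_sem_name(unsigned int character_id) {
    return "/petinv_" + std::to_string(character_id);
}

class CharacterSemaphore {
public:
    // O_CREAT creates the semaphore with value 1 if it does not exist yet;
    // otherwise it opens the existing one (e.g. the one the old zone holds).
    explicit CharacterSemaphore(unsigned int character_id)
        : name_(pet_sem_name(character_id)),
          sem_(sem_open(name_.c_str(), O_CREAT, 0600, 1)) {
        if (sem_ == SEM_FAILED) throw std::runtime_error("sem_open failed: " + name_);
    }
    ~CharacterSemaphore() {
        release();        // never leave it held if we go out of scope
        sem_close(sem_);
    }
    CharacterSemaphore(const CharacterSemaphore&) = delete;
    CharacterSemaphore& operator=(const CharacterSemaphore&) = delete;

    // Blocks until the other zone process has released the semaphore.
    void acquire() {
        while (sem_wait(sem_) == -1) {
            if (errno != EINTR) throw std::runtime_error("sem_wait failed: " + name_);
        }
        held_ = true;
    }
    // Non-blocking variant: returns false if someone else holds it.
    bool try_acquire() {
        if (sem_trywait(sem_) != 0) return false;
        held_ = true;
        return true;
    }
    void release() {
        if (held_) { sem_post(sem_); held_ = false; }
    }
    // Removes the name so nothing is left behind after logout.
    void unlink() { sem_unlink(name_.c_str()); }

private:
    std::string name_;
    sem_t* sem_;
    bool held_ = false;
};
```

In this sketch the departing zone would call acquire() when zoning starts and release() once the pet data is saved, while the arriving zone calls acquire() before reading pet inventory (and buffs). A process that crashes while holding the semaphore never posts it, which is the cleanup risk that worries me with this approach.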
My fix works. No more pet gear lost. You guys want it? I don't mind posting the details.
|
Looks like you have lots to contribute. It would be way easier for the devs to integrate your changes if you forked the source on GitHub ( https://github.com/EQEmu/Server ), made the change and then submitted a pull request.
|
I was really hoping that one of the core guys would pick this up, if it is going to be useful to the community. My fix works fine for my server (we have less than 20 players). Since it involves the use of a named semaphore, I'm loath to apply a diff to a branch and have someone just roll that in. For player bases that are more normal (I assume much larger than mine), I'm worried that abnormal aborts, etc., might require semaphore cleanup and server restarts.
I've not run into that yet, but it seems inevitable. So why did I bother posting? I guess I wanted to point out the race condition on zoning and hope that someone more core to the project might have a better idea for the mainstream solution. As an interim step, I'll post my diffs here. |
I believe this is all the diffs. My code is divergent from the base, but this should give context and intent. Basically, I'm grabbing a named semaphore in the instance of client in the zone process when we zone, and not releasing it until all pertinent data is saved. The recipient zone tries to grab the same semaphore, which ends up delaying the recipient in cases where saving had not finished yet. I've seen it delay, and solve the pet gear issue. The code in the zone error functions is there to release the semaphore in the originating zone if the zoning process is blocked in some way (flags, etc.). Note there is a different semaphore per char id, so multiple zoning clients don't slow each other down.
My fear is a hard crash somewhere else after the zoning client grabs the sem, but before it is released. Also note, we close/delete the sem on both sides, all the time, to avoid issues with leftover semaphores. It's tempting to leave the sem there for future zoning, but the dynamic state of things makes that very risky. Tons of printfs left in the code so you can test. Remove as you see fit. Code:
=== modified file 'zone/zoning.cpp' |
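The release-on-every-path idea described above (grab when the zoning flag is set; release after the save, on a zoning error, or at the latest in the client's destructor) can be sketched independently of the semaphore itself. ZoningHandoff and its method names are hypothetical, chosen only to mirror the description. The acquire/release callbacks stand in for whatever lock is actually used.

```cpp
#include <functional>
#include <utility>

// Tracks whether the departing zone currently holds the per-character lock
// and guarantees it is released exactly once, whichever exit path is taken.
class ZoningHandoff {
public:
    ZoningHandoff(std::function<void()> acquire, std::function<void()> release)
        : acquire_(std::move(acquire)), release_(std::move(release)) {}

    // Client destructor: last chance to let the arriving zone proceed.
    ~ZoningHandoff() { release_if_held(); }

    ZoningHandoff(const ZoningHandoff&) = delete;
    ZoningHandoff& operator=(const ZoningHandoff&) = delete;

    // Start of the critical section: the client has begun zoning.
    void on_zoning_flag_set() {
        if (!held_) { acquire_(); held_ = true; }
    }
    // Zoning was refused (flags, etc.): nobody will arrive, so let go now.
    void on_zoning_error() { release_if_held(); }
    // All pertinent data has been written: the arriving zone may read.
    void on_save_complete() { release_if_held(); }

    bool held() const { return held_; }

private:
    void release_if_held() {
        if (held_) { release_(); held_ = false; }
    }
    std::function<void()> acquire_;
    std::function<void()> release_;
    bool held_ = false;
};
```

Calling on_zoning_error() or on_save_complete() more than once is harmless here because the held flag makes the release idempotent. It does nothing for the hard-crash case, where no destructor runs.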
This is still working for me after weeks. Again, I have less than 10 players, so a large playerbase might be an issue if crashes end up leaving the named sem around. Pet gear loss is completely fixed, however, and it safeguards against other data loss as well.
|