EQEmulator Forums

EQEmulator Forums (https://www.eqemulator.org/forums/index.php)
-   Support::General Support (https://www.eqemulator.org/forums/forumdisplay.php?f=598)
-   -   Pet losing some - not all - his gear (https://www.eqemulator.org/forums/showthread.php?t=38505)

noudess 07-18-2014 03:53 PM

Pet losing some - not all - his gear
 
Had the same pet for at least a week. He has 7 pieces of rawhide armor. Every couple of days, the bp and legs disaappear. Anyone know where in the code this stuff gets updated? I look at the table and it has what I expect.. At some point, no idea when, he loses the bp and legs (not in table anymre). Really weird, Maybe an issue when loggin in and loading pet?

[ I see where it gets loaded and saved - that code looks fine. Will instrument it, but don't expect the problem to be there. ]

Uleat 07-18-2014 08:35 PM

Usually, a code problem like that will manifest itself every time.

You might try looking at all of your logs (emu server, mysql, etc...) to see if you are having write issues. A write failure would case items to 'randomly' disappear.


I am working through inventory code atm..so, I'll keep an eye out for anything that might be wonky.

provocating 07-18-2014 08:38 PM

No rent items? Nevermind just read where you said it was rawhide.

Uleat 07-18-2014 09:30 PM

No rent typically disappear after 30 minutes offline..I don't even know if pets check for that.

provocating 07-18-2014 09:44 PM

Quote:

Originally Posted by Uleat (Post 232052)
No rent typically disappear after 30 minutes offline..I don't even know if pets check for that.

I know, I missed that when I read his post.

noudess 07-22-2014 09:39 AM

From further testing, it looks like the database is fine after the items are lost in game. The db only gets messed up after you log out of the client, and the corrupted pet inventory gets written back out. So, the loss of items seems to be on zoning, and in memory.

I'm going to focus on that today and see what I can find.

noudess 07-22-2014 10:05 AM

Gave pet 12 items. Logged out to update db.

Logged back in and then zoned 9 times. On the 9th time only the 1st 5 items survived when I did a #showstats. The exact 1st 5 items, so the list got truncated. I'm guessing a race condition that didn't let the list get filled in, or perhaps somehow that process got interrupted?

This narrows the repeat process to logging in and zoning alot. I was the only one on the server at the time as well.

Akkadius 07-22-2014 10:11 AM

I want to say I experienced this bug myself when I was somewhat actively running Akka's Funhouse.

It is probably an initialization issue when entering the zone.

noudess 07-22-2014 11:29 AM

Looking for some feedback on what I am seeing.

I don't see where we're corrupting yet, but I find this odd.

Instrumentation:

- I have log messages in ~NPC and in the spot where the pet gets his gear restored.

Steps:

- Fresh server restart
- Login with existing pet to zone 1.
- I see the pet gear restore loop run fine.
- #zone to zone 2
- I see the pet gear restore loop run fine.
- #zone to zone 1 again.
- The process for zone1 hits the ~NPC code for the pet
- I see the pet gear restore fine after this.

Isn't it odd that the ~NPC code for the pet runs when I reenter the original zone and not when I leave one?

noudess 07-22-2014 01:48 PM

I think I have found the issue... information as soon as I verify the fix....

noudess 07-22-2014 02:06 PM

Ok....

It looks like the zone you are leaving stores the pets inventory into the DB using a loop of inserts. The zone you are entering retrieves the list with a single select.

Sometimes, depending on how fast you zone and how many items are in the pets inventory list, the receiving zone does the select before the sending zone has finished putting the items in.

I added a LOCK TABLES character_pet_inventory WRITE on the side saving the data, along with a release after the loop of inserts. I then added a LOCK TABLES READ on the destination zone's code to reload the data with the select.

I zoned 25 times and it has not failed with 12 pet items. That's never happened for me before, so I think that got it.

I'll post the code after a week or so of testing.

noudess 07-22-2014 03:05 PM

Patch code below:

Code:

***************
*** 2440,2459 ****
--- 2440,2464 ----
                                  errbuf);
                          safe_delete_array(query);
                  }
          }
 
+        database.RunQuery(query, MakeAnyLenString(&query, "LOCK TABLES `character_pet_inventory` WRITE", errbuf));
+        printf(errbuf);
+
          for(i=0; i<MAX_WORN_INVENTORY; i++) {
                  if(petinfo->Items[i]) {
                          database.RunQuery(query, MakeAnyLenString(&query,
                                  "INSERT INTO `character_pet_inventory` (`char_id`, `pet`, `slot`, `item_id`) values (%u, 0, %u, %u)",
                                  c->CharacterID(), i, petinfo->Items[i]), errbuf);
                          // should check for errors
                          safe_delete_array(query);
                  }
          }
 
+        database.RunQuery(query, MakeAnyLenString(&query,"UNLOCK TABLES", errbuf));
+        printf(errbuf);
         
          if(!database.RunQuery(query, MakeAnyLenString(&query,
                  "INSERT INTO `character_pet_info` (`char_id`, `pet`, `petname`, `petpower`, `spell_id`, `hp`, `mana`) "
                  "values (%u, 1, '%s', %u, %u, %u, %u) "
                  "ON DUPLICATE KEY UPDATE `petname`='%s', `petpower`=%i, `spell_id`=%u, `hp`=%u, `mana`=%u",
***************
*** 2581,2590 ****
--- 2586,2598 ----
                  LogFile->write(EQEMuLog::Error, "Error in LoadPetInfo query '%s': %s", query, errbuf);
                  safe_delete_array(query);
          return;
          }
 
+        database.RunQuery(query, MakeAnyLenString(&query, "LOCK TABLES `character_pet_inventory` WRITE", errbuf));
+        printf(errbuf);
+
          if (database.RunQuery(query, MakeAnyLenString(&query,
                  "SELECT `pet`, `slot`, `item_id` FROM `character_pet_inventory` WHERE `char_id`=%u",
                  c->CharacterID()), errbuf, &result))
          {
                  safe_delete_array(query);
***************
*** 2608,2614 ****
--- 2616,2625 ----
          else {
                  LogFile->write(EQEMuLog::Error, "Error in LoadPetInfo query '%s': %s", query, errbuf);
                  safe_delete_array(query);
                  return;
          }
+        // Unlock the character_pet_inventory table so people can read
+        database.RunQuery(query, MakeAnyLenString(&query, "UNLOCK TABLES",errbuf));
+        printf(errbuf);
 
  }


noudess 07-22-2014 05:17 PM

Fixed my problem but created other issues for other clients in the same zone....

My unlock is messing with other tables - apparantly no way to unlock just the one table..

Working on a cleaner solution using a mutex.

noudess 07-22-2014 06:04 PM

Ok, this is deeper into the infrastructure than I've messed with.

The idea of locking/unlocking the tables isn't going to work as unlock always releases all tables, and there is other code executing for other clients in the same zone process that locks tables.. So I can't use that method.

A simple mutex is out, as I'm on linux and mutexes don't cross processes without shared memory.

So, I'm guessing a semaphore.. In that case I need to know where all the zone processes are forked, so I can create the semaphore before the zones are forked off.

Or - is there a better way being used already to make sure that 2 zone processes get synbcronized?

Uleat 07-22-2014 09:14 PM

I've seen a few crash dumps involving pets..I wonder if this is related...

noudess 07-22-2014 10:18 PM

Ok...

I added a named semaphore and created it in eqlaunch before anything else.

Then I use it to access the pet inventory in the functions in zonedb.cpp.

It seems to me this is:

1) a bigger issue - since the zone I am leaving doesn't other work in Save() before the zone I land in starts accessing it..

or

2) Somehow another mechanism was meant to handle this and isn't for pet inventory.

I can show, so far, that without my fix, the zone I land in can read the content of the character_pet_inventory table before the zone I am leaving has it completely updated... thus the problem.

This is all on my Linux server.

I can post the named semaphore fix, but I'm not happy with it and I'm hoping someone more knowledgeable with how the old zone and new zone get ordered might step in and slap me up some ;)

noudess 07-23-2014 01:16 PM

My solution has evolved to a named semaphore in the client class. This semaphore is named after the character_id for the client. It is created on successful arrival in a zone (if you're coming from another zone, the CREAT ends up opening the one you left from (if its in use). There can, of course be multiple semaphores going on at once, for various characters, but they will clean themselves up on logout.

It's used when you save and read pet information.

It works, and I leave no semaphores around on exit.

I still feel that someone with a better idea of how db coordination is done between zones on zoning might have a more straightforward fix - or maybe tweak of existing mechanism.

Should I post about it on development? My fix works, but I'd think a patch more in tune with the original thought process might be better long term. I just can't see what that is.. :)

[ edit: expanded the critical section to start when the zoning flag is set and to end at client's destructor. The new zone now has perfect pet data. I also moved my check on the new zone to before we read in character buffs - with the wrong timing - these could get scrogged as well.

With 15 items, I could make the pet bug show up alot - and now its always perfect, And no semaphore baggage, all cleaned up.]

Let me know if you guys want the patch posted. My code is older, but I think the changes are small enough you can fit them in if you want them.

Players happy - they were sick of losing pet gear occasionally. ]

noudess 07-29-2014 09:56 AM

My fix works. No more pet gear lost. You guys want it? I don't mind posting the details.

Akkadius 07-29-2014 10:45 AM

Quote:

Originally Posted by noudess (Post 232292)
My fix works. No more pet gear lost. You guys want it? I don't mind posting the details.

Go ahead and post a diff

vsab 07-30-2014 08:42 AM

Looks like you have lots to contribute- it would be way easier for the dev's to integrate your changes if you forked the source on github ( https://github.com/EQEmu/Server ), made the change and then submitted a pull request.

noudess 07-30-2014 01:52 PM

I was really hoping that one of the core guys would pick this up, if it is going to be useful to the community. My fix works fine for my server (we have less than 20 players). Since it involved the use of a named semaphore, I'm loath to apply a diff to a branch and have someone just roll that in. For player bases that are more normal (I assume much larger than mine), Im worried that abnormal aborts, etc, might require semaphore cleanup and server restarts.

I've not run into that yet, but it seems inevitable.

So why did I bother posting? I guess I wanted to point the race condition on zoning and hope that someone more core to the project might have a better idea for the mainstream solution.

As an interim step, I'll post my diffs here.

noudess 07-30-2014 02:07 PM

I believe this is all the diffs. My code is divergent from the base, but this should give context and intent. Basically, I'm grabbing a named semaphore in the instance of client in the zone process when we zone and not releasing it until all pertinent data is saved. The recipient zone tries to grab the same semaphore, which ends up delaying the recipient in cases where saving had not finished yet. I've seen it delay, and solve the pet gear issue. The code in the zone error functions are to release the semaphore in the originating zone if the zoning process is blocked in some way (flags, etc). Note there is a diff semaphore per char id, so multiple zoning clients don't slow each other down.

My fear, is other hard crashes after the zoning client grabs the sem, but before it is released.

Also note, we close/delete the sem on both sides, all the time, to avoid issues with leftover semaphores. Its temping to leave the sem there for future zoning, but the dynamic state of things makes that very risky.

Tons of printfs left in the code so you can test. Remove as you see fit.


Code:

=== modified file 'zone/zoning.cpp'
--- zone/zoning.cpp        2013-07-09 15:28:09 +0000
+++ zone/zoning.cpp        2014-07-28 14:16:48 +0000
@@ -36,6 +36,13 @@
 #endif
 
        zoning = true;
+
+        // Lock a semaphore so the new client won't read
+        // until we're done writing out data
+        // This will get released on client exit
+        OpenZoningSem();
+        WaitZoningSem();
+
        if (app->size != sizeof(ZoneChange_Struct)) {
                LogFile->write(EQEMuLog::Debug, "Wrong size: OP_ZoneChange, size=%d, expected %d", app->size, sizeof(ZoneChange_Struct));
                return;
@@ -282,6 +289,11 @@
 }
 
 void Client::SendZoneCancel(ZoneChange_Struct *zc) {
+
+        // Not zoning, so client destructor won't be doing this.
+        PostZoningSem();
+        CloseZoningSem();
+
        //effectively zone them right back to where they were
        //unless we find a better way to stop the zoning process.
        SetPortExemption(true);
@@ -302,6 +314,10 @@
 {
        LogFile->write(EQEMuLog::Error, "Zone %i is not available because target wasn't found or character insufficent level", zc->zoneID);
       
+        // Not zoning, so client destructor won't be doing this.
+        PostZoningSem();
+        CloseZoningSem();
+
        SetPortExemption(true);
 
        EQApplicationPacket *outapp;

=== modified file 'zone/client.cpp'
--- zone/client.cpp        2014-01-11 17:57:12 +0000
+++ zone/client.cpp        2014-07-24 15:25:58 +0000
@@ -230,6 +230,7 @@
 zonesummon_ignorerestrictions = 0;
 zoning = false;
 zone_mode = ZoneUnsolicited;
+zoning_sem = (sem_t *) NULL;
 proximity_x = FLT_MAX;        //arbitrary large number
 proximity_y = FLT_MAX;
 proximity_z = FLT_MAX;
@@ -435,8 +436,107 @@
 
 entity_list.RemoveClient(this);
 UninitializeBuffSlots();
+
+// Clear the semaphore since this client is dying - zone/crash/logout
+PostZoningSem();
+CloseZoningSem();
 }
 
+// Added these functions to make sure saving of data finished before new
+// zone loaded it in.
+
+void Client::OpenZoningSem()
+        {
+        printf("Trying to CREATE a semaphore for char id %d.\n", CharacterID());
+
+        // Use a semaphore named after character id.
+        sprintf(zoning_sem_name,"%d-%s", CharacterID(),name);
+
+    zoning_sem=sem_open(zoning_sem_name, O_CREAT, 0644, 1);
+       
+        if (zoning_sem == SEM_FAILED)
+                {
+                printf("Cannot open zoning semaphore for char id %d.\n", CharacterID());
+                }
+        fflush(stdout);
+        }
+
+int Client::WaitZoningSem()
+        {
+        int        ret;
+
+        printf("Trying to WAIT a semaphore for char id %d.\n", CharacterID());
+
+        ret = sem_trywait(zoning_sem);
+
+        if (ret != 0)
+                {
+                printf("LOCKED - WAITING\n");
+                ret = sem_wait(zoning_sem);
+                }
+
+        if (ret != 0)
+                {
+                printf("Error waiting on zoning semaphore.\n");
+                }
+
+        fflush(stdout);
+        return ret;
+        }
+
+int Client::PostZoningSem()
+        {
+        int ret=0;
+
+        printf("Trying to POST a semaphore for char id %d.\n", CharacterID());
+
+        // Its possible for the sem to not exist, if the client never zoned.
+        if (zoning_sem != (sem_t *) NULL)
+                {
+                ret= sem_post(zoning_sem);
+               
+                if (ret != 0)
+                        {
+                        printf("Error posting on zoning semaphore.\n");
+                        }
+                }
+
+        fflush(stdout);
+        return ret;
+        }
+
+int Client::CloseZoningSem()
+        {
+        int ret;
+
+        // This gets called by destructor, when the sem should already be closed
+        // So lets not do extra work if so.
+        if (zoning_sem != (sem_t *) NULL)
+                {
+                printf("Trying to CLOSE a semaphore for char id %d.\n", CharacterID());
+
+                ret = sem_close(zoning_sem);
+
+                if (ret != 0)
+                        {
+                        printf("Error closing on zoning semaphore.\n");
+                        }
+
+                ret = sem_unlink(zoning_sem_name);       
+
+                if (ret != 0)
+                        {
+                        printf("Error unlinking on zoning semaphore.\n");
+                        }
+                }
+       
+        fflush(stdout);
+
+        zoning_sem = (sem_t *) NULL;
+
+        return ret;
+        }
+
 void Client::SendLogoutPackets() {
 
 EQApplicationPacket* outapp = new EQApplicationPacket(OP_CancelTrade, sizeof(CancelTrade_Struct));

=== modified file 'zone/client.h'
--- zone/client.h        2013-05-02 18:37:13 +0000
+++ zone/client.h        2014-07-23 19:11:48 +0000
@@ -19,6 +19,7 @@
 #define CLIENT_H
 class Client;
 
+#include <semaphore.h>
 #include "../common/timer.h"
 #include "../common/ptimer.h"
 #include "../common/emu_opcodes.h"
@@ -491,6 +492,10 @@
        inline uint32        GetGold()                const { return m_pp.gold; }
        inline uint32        GetPlatinum()        const { return m_pp.platinum; }
 
+        void        OpenZoningSem();
+        int                WaitZoningSem();
+        int                PostZoningSem();
+        int                CloseZoningSem();
 
        /*Endurance and such*/
        void        CalcMaxEndurance();        //This calculates the maximum endurance we can have
@@ -1303,6 +1308,8 @@
        uint8        zonesummon_ignorerestrictions;
        ZoneMode zone_mode;
 
+        sem_t        *zoning_sem;
+        char        zoning_sem_name[75];
 
        Timer        position_timer;
        uint8        position_timer_counter;

=== modified file 'zone/client_packet.cpp'
--- zone/client_packet.cpp        2014-01-18 17:25:31 +0000
+++ zone/client_packet.cpp        2014-07-24 22:16:53 +0000
@@ -8971,7 +8971,13 @@
        }
 
 
+        // Obtain a lock that proves the old zone is done
+        // Updating this data.
+        OpenZoningSem();
+        WaitZoningSem();
+
        if (spells_loaded)
+
        {
                for(uint32 z=0;z<MAX_PP_MEMSPELL;z++)
                {
@@ -9154,6 +9160,7 @@
                rest_timer.Start(m_pp.RestTimer * 1000);
 
        database.LoadPetInfo(this);
+
        //this was moved before the spawn packets are sent
        //in hopes that it adds more consistency...
        //Remake pet
@@ -9169,9 +9176,15 @@
                }
                m_petinfo.SpellID = 0;
        }
+
        // Moved here so it's after where we load the pet data.
        if(!GetAA(aaPersistentMinion))
                memset(&m_suspendedminion, 0, sizeof(PetInfo));
+
+        // I think this was the last thing we need to load.
+        // Release the Semaphore.
+        PostZoningSem();
+        CloseZoningSem();
 
        ////////////////////////////////////////////////////////////
        // Server Zone Entry Packet


noudess 08-13-2014 03:49 PM

This is still working for me after weeks. Again, I have less than 10 players, so a large playerbase might be an issue if crashes end up leaving the named sem around. Pet gear loss completely fixed however, and it safeguards against other data loss as well.


All times are GMT -4. The time now is 01:10 AM.

Powered by vBulletin®, Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.