Just an update: it seems I was wrong about this being the source of the deadlock. I still don't think it's a good idea for us to lock a mutex we have already locked; it seems like bad design, and this might not be the only place it happens.
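To illustrate the re-locking concern, here is a minimal sketch of a wrapper that reports when the calling thread tries to take a lock it already holds, instead of blocking on itself. `CheckedMutex` and `lock_checked` are hypothetical names for illustration only and are not taken from the actual codebase.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: wraps std::mutex and remembers which thread owns it.
class CheckedMutex {
public:
    // Returns false (without blocking) if the calling thread already holds
    // the lock; otherwise acquires the lock and returns true.
    bool lock_checked() {
        const std::thread::id self = std::this_thread::get_id();
        if (owner_.load() == self) {
            return false;  // re-lock attempt by the owner: report it
        }
        mutex_.lock();
        owner_.store(self);
        return true;
    }

    // Releases the lock; only the owning thread may call this.
    void unlock() {
        assert(owner_.load() == std::this_thread::get_id());
        owner_.store(std::thread::id());  // default id means "no owner"
        mutex_.unlock();
    }

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{};
};
```

A caller could log whenever `lock_checked()` returns false to find other places where the same mutex is taken twice on one thread. Switching to `std::recursive_mutex` would also avoid the self-block, but it would hide the design problem rather than expose it.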
In any case, I reviewed net.cpp and found some older code that was delaying how long until the reconnect happens, which I removed (so now it relies solely on the 10 second timer instead of something like 120+ seconds).
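As a rough sketch of what a single fixed retry interval looks like once the extra delay is gone (this is not the actual net.cpp code; `ReconnectTimer` and `due` are hypothetical names, and only the 10 second interval comes from the change described above):

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: allows one reconnect attempt per interval.
struct ReconnectTimer {
    std::chrono::seconds interval{10};  // the 10 second retry timer
    Clock::time_point last_attempt{};
    bool attempted = false;

    // Returns true if a reconnect attempt should be made at time `now`,
    // and records that attempt. The very first call always returns true.
    bool due(Clock::time_point now) {
        if (attempted && now - last_attempt < interval) {
            return false;  // still inside the retry interval
        }
        last_attempt = now;
        attempted = true;
        return true;
    }
};
```

The point of keeping it this simple is that no second delay is stacked on top of the interval, so a failed attempt is retried after about 10 seconds.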
What I did see is that last night we failed our first reconnect attempt to the eqemu LS. Typically I don't even see a reconnect attempt, just the thread-ending error. I also added a log message at the AutoInitLoginServer thread creation in net.cpp to track this:
20366 [11.06. - 22:37:22] [COMMON__THREADS] Ending TCPConnectionLoop with thread ID -54917376
20366 [11.06. - 22:37:24] [WORLD__INIT_ERR] Not all login servers are connected, calling AutoInitLoginServer.
20366 [11.06. - 22:37:24] [WORLD__LS] Connecting to login server: login.eqemulator.net:5998
20366 [11.06. - 22:37:24] [COMMON__THREADS] Starting TCPConnectionLoop with thread ID -546068736
20366 [11.06. - 22:37:34] [WORLD__INIT_ERR] Not all login servers are connected, calling AutoInitLoginServer.
20366 [11.06. - 22:37:34] [WORLD__LS] Connecting to login server: login.eqemulator.net:5998
20366 [11.06. - 22:37:34] [WORLD__LS] Connected to Loginserver: login.eqemulator.net:5998
I will continue monitoring to see if any issues happen again, but maybe this shorter retry is helping the situation.
__________________
www.eq2emu.com
EQ2Emu Developer
Former EQEMu Developer / GuildWars / Zek Seasons Servers
Member of the "I hate devn00b" club.