ArcEmu: One Question About Socketmgrlinux - ArcEmu

Jump to content

Toggle shoutbox Lastest Announcements

dfighter  : (07 December 2014 - 12:06 PM) Arcemu is in hibernation mode, please read http://arcemu.org/fo...showtopic=26903
dfighter  : (01 January 2013 - 05:56 PM) Arcemu wishes you all a happy new year!
Hasbro  : (12 September 2012 - 10:01 AM) Please excuse our outage from the web! Our web host had a major malfunction!
dfighter  : (01 September 2012 - 04:05 PM) Since the spam bots just don't want to stop, I've enabled admin verification when registering.
dfighter  : (23 January 2012 - 09:56 PM) Please note that from now on you will need to confirm your email on the wiki in order to edit it!
Hasbro  : (31 December 2011 - 12:50 PM) Happy New Years all!
Navid  : (26 December 2011 - 04:09 AM) Merry Christmas !!!!!! Happy holidays all :)
WAmadeus  : (24 December 2011 - 03:54 PM) Merry Christmas to all!
dfighter  : (24 December 2011 - 11:05 AM) The Arcemu team wishes y'all a Merry Christmukkah!
Hasbro  : (05 October 2011 - 12:53 PM) Looking for web designers for upcoming web related project. If you're interested in designing user interfaces contact me
dfighter  : (02 September 2011 - 03:47 PM) So who here wants vehicles in Arcemu? :P http://arcemu.org/fo...showtopic=25440
Hasbro  : (14 August 2011 - 03:25 PM) Join us on irc, grab an irc client and connect to irc.freenode.net join channel #arcemu /server irc.freenode.net:6667 /join #arcemu
jackpoz  : (03 August 2011 - 05:33 AM) to all Lua Engine (old one) users: please check http://arcemu.org/fo...showtopic=25274
Hasbro  : (20 May 2011 - 05:27 PM) Looking for people experienced with CMake configuration and setup! Contact me asap
Hasbro  : (15 May 2011 - 05:03 PM) ArcEmu is recruiting C++ programmers, contact Hasbro if interested.
paroxysm  : (03 May 2011 - 06:26 PM) Updated luabridge gossip example to describe the whole gossip creation process rather than just how to create menu. Gossip tutorial
paroxysm  : (23 April 2011 - 11:35 AM) Lua writers can refer to the Luabridge Tutorials section in the Wiki to learn how to write gossip code correctly.
Hasbro  : (20 April 2011 - 05:22 PM) Thank you for your continuous contribution of bug reports, we are working on them.
Hasbro  : (17 April 2011 - 03:20 AM) Please consider donating to support our bills. Donations can be sent using PayPal to donations@arcemu.org - Thank you for your support.
paroxysm  : (10 April 2011 - 12:43 AM) Refer to the Luabridge Tutorials section in the Wiki to learn the new syntax of luabridge.
Resize Shouts Area

Developing

Interesting in developing with us? Join us on IRC: irc.emulationnetwork.com:6667 #arcemu
Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

One Question About Socketmgrlinux about multithread issue

#1 User is offline   BigHead 

  • Newbie
  • Group: Members
  • Posts: 2
  • Joined: 23-May 11
  • Gender:Male
  • Server OS:Linux

Posted 31 May 2011 - 03:50 AM

Hi, all. I'm newcommer from Shanghai, China.

I began to read the code one weeks ago.
But I find the SocketMgr::SpawnWorkerThreads function in SocketMgrLinux.cpp only spawn one thread.

Can anyone tell me why? Why not use multithread to improve the concurrency?

Thanks a lot....
0

#2 User is offline   dfighter 

  • Titles are overrated
  • PipPipPipPipPipPipPipPipPipPip
  • Group: Administrator
  • Posts: 5,189
  • Joined: 14-June 08
  • IRC:dfighter
  • Gender:Male
  • Server OS:Linux

Posted 31 May 2011 - 04:58 AM

Hi there and welcome to our forums!

It was intended to be like that originally, but it has never been finished, since a new socket project was started, which was then later stopped, so now it's stuck the way it is ( All this was long years ago, before Arcemu even existed ).
Maybe you will finish it and submit a patch? :P
"The demand for free goods is infinite."
1

#3 User is offline   BigHead 

  • Newbie
  • Group: Members
  • Posts: 2
  • Joined: 23-May 11
  • Gender:Male
  • Server OS:Linux

Posted 01 June 2011 - 12:47 AM

OK..I'm glad to patch it. I think I will read the coding standards first.

And I think we only need to add a little modification to support multithread in linux:
1. Each SocketWorkerThread will hold a epoll_fd, fds, listenfds, and each thread call epoll_wait and notify the events.
2. SocketMgr will be responsible for add and remove socket to workerthreads.

I will think about it more clearly this week... :P

View Postdfighter, on 31 May 2011 - 04:58 AM, said:

Hi there and welcome to our forums!

It was intended to be like that originally, but it has never been finished, since a new socket project was started, which was then later stopped, so now it's stuck the way it is ( All this was long years ago, before Arcemu even existed ).
Maybe you will finish it and submit a patch? :)

0

#4 User is offline   dfighter 

  • Titles are overrated
  • PipPipPipPipPipPipPipPipPipPip
  • Group: Administrator
  • Posts: 5,189
  • Joined: 14-June 08
  • IRC:dfighter
  • Gender:Male
  • Server OS:Linux

Posted 01 June 2011 - 12:56 AM

View PostBigHead, on 01 June 2011 - 12:47 AM, said:

OK..I'm glad to patch it. I think I will read the coding standards first.

And I think we only need to add a little modification to support multithread in linux:
1. Each SocketWorkerThread will hold a epoll_fd, fds, listenfds, and each thread call epoll_wait and notify the events.
2. SocketMgr will be responsible for add and remove socket to workerthreads.

I will think about it more clearly this week... :P



Sure, sounds great!
We don't really have a coding convention tho, if you are really looking for a preference take a look at my code commits and try to use my style :)
Tbh the point is only the comments + that you write "sparse" code, so instead of for example, instead of
somefunction(someparameter)

write
somefunction( someparameter )

That's about it.

It's also nice if in headers you write the public members first and only then the private one :)
"The demand for free goods is infinite."
0

#5 User is offline   jackpoz 

  • ArcEmu Lemon Priest
  • PipPipPipPipPipPipPipPip
  • Group: Developers
  • Posts: 2,153
  • Joined: 19-June 08
  • Gender:Male
  • Location:Italy
  • Server OS:Windows

Posted 28 July 2011 - 09:13 AM

Sharing a little bit of knowledge about Socket issues:
Right now there's a threading issue about the Socket type because some of the derived types are handled by 2 different threads at time. Let's take for example LogonCommClientSocket : Socket defined in trunk\src\arcemu-world\LogonCommClient.h and used by world to communicate with logon (world and logon being the names of 2 executables used by arcemu). Callstack of how this socket is created as of r4408:
- Master::Run() line 383 calls sLogonCommHandler.Startup()
- sLogonCommHandler.Startup() line 145 calls
ThreadPool.ExecuteTask( new LogonCommWatcherThread() );
. LogonCommWatcherThread derives from ThreadBase and overrides the run() methods in this way:
	bool run()
	{
		sLogonCommHandler.ConnectAll();
		while( running )
		{
			sLogonCommHandler.UpdateSockets();

			cond.Wait( 5000 );
		}

		return true;
	}

- on the new thread created by ThreadPool.ExecuteTask LogonCommWatcherThread::run() calls line 112 sLogonCommHandler.ConnectAll()
void LogonCommHandler::ConnectAll()
{
	Log.Success("LogonCommClient", "Attempting to connect to logon server...");
	for(set<LogonServer*>::iterator itr = servers.begin(); itr != servers.end(); ++itr)
		Connect(*itr);
}

"servers" is read from the config files and contains the infos required by world to connect to logon (there can be more than 1 logon)
- LogonCommHandler::Connect() line 171 calls LogonCommClientSocket * conn = ConnectToLogon(server->Address, server->Port). An instance of LogonCommClientSocket is created by ConnectToLogon() and returned.
- LogonCommHandler::ConnectToLogon() line 60 calls
LogonCommClientSocket * conn = ConnectTCPSocket<LogonCommClientSocket>(Address.c_str(), static_cast<u_short>( Port ));

- ConnectTCPSocket() lines 211 and 212
	T * s = new T(0);
	if(!s->Connect(hostname, port))
create the LogonCommClientSocket and then try to connect it.
- Socket::Connect() line 57 calls _OnConnect()
- void Socket::_OnConnect()
	sSocketMgr.AddSocket(this);

	// Call virtual onconnect
	OnConnect();

first add the socket to sSocketMgr so it will be accessed by SocketWorkerThread::run() running on another thread. This will update the socket for all its lifetime.
Then the virtual OnConnect() is called.
- back to LogonCommHandler::Connect() line 172 logons[server] = conn; is called. This keeps a reference of the created socket and makes it accessible to LogonCommHandler::UpdateSockets()
void LogonCommHandler::UpdateSockets()
{
	mapLock.Acquire();

	map<LogonServer*, LogonCommClientSocket*>::iterator itr = logons.begin();
	LogonCommClientSocket * cs;
	uint32 t = (uint32)UNIXTIME;
	for(; itr != logons.end(); ++itr)
	{
		cs = itr->second;
		if(cs != 0)
		{
			if(!pings) continue;

            if(cs->IsDeleted() || !cs->IsConnected())
            {
                cs->_id = 0;
                itr->second = 0;
                continue;
            }

			if(cs->last_pong < t && ((t - cs->last_pong) > 60))
			{
				// no pong for 60 seconds -> remove the socket
				LOG_DETAIL(" >> realm id %u connection dropped due to pong timeout.", (unsigned int)itr->first->ID);
				cs->_id = 0;
				cs->Disconnect(); //the socket is disconnected from another thread than SocketMgr!!!
				itr->second = 0;
				continue;
			}

			if( (t - cs->last_ping) > 15 )
			{
				// send a ping packet.
				cs->SendPing();
			}
		}
		else
		{
			// check retry time
			if(t >= itr->first->RetryTime)
			{
				Connect(itr->first);
			}
		}
	}
	mapLock.Release();
}

LogonCommHandler::UpdateSockets() is called by LogonCommWatcherThread::run(), the same that called sLogonCommHandler.ConnectAll() in the beginning and that is running on its own thread. If you check what LogonCommHandler::UpdateSockets() does, you'll see at line 278 that it disconnectes the socket, but this is from a different thread than the SocketMgr ones!!! This means that if the SocketMgr threads are accessing the same socket, there's a chance they will access a deleted socket!!!
This issue was hidden like dust under the carpet with a socket garbage collector:
void Socket::Delete()
{
	//if returns true it means it's already delete
	if(m_deleted.SetVal(true))
		return;

	sLog.outDebug("Socket::Delete() on socket %u", m_fd);

	if(IsConnected()) Disconnect();

	SocketOps::CloseSocket( m_fd );

	sSocketGarbageCollector.QueueSocket(this); <--- queued for deletion
}

so if the garbage collector is "slow enough" and SocketMgr is "fast enough", nothing wrong will happen. But we don't live in an ideal world and that's why this crash is right now the main reson of arcemu crashes.
LogonCommServerSocket has the same issue because it's added to sInfoCore and checked in LogonServer::Run().
Posible solutions are to keep track of the references to each Socket or to allow only 1 and only 1 thread at time to access it.
I hope this explaination is somehow understandable, if not feel free to ask (I'm sure a debugging session will explain it even better).
Posted Image We develop dreams. Your dreams ;)
Posted ImagePosted Image
0

#6 User is offline   ZymoticB 

  • Newbie
  • Group: Members
  • Posts: 9
  • Joined: 25-July 11
  • Gender:Male
  • Server OS:Darwin/OS X

Posted 08 August 2011 - 10:06 PM

Could this be solved by adding an atomic reference count to the LogonCommClientSocket, or even Sockets in general?

This would allow as many threads as needed to have a reference to a socket temporarily.

As soon as a socket needs to be disconnected and deleted it will be queued and flagged for disconnection/deletion. This 'deletion flag' would then prevent any more threads from atomically incrementing the reference count and using the socket but would leave the socket connected until all threads that have a reference are done with the socket. The disconnect thread would then wait until the reference count is 0 before beginning the disconnect/deletion process. If the disconnect thread should be free i.e. not waiting. A disconnect queue could be created.

The disconnect queue would make sense since a socket can only be present in the queue if it is flagged for disconnection. When the queue updates it would check the reference count of all the sockets, or based on empirical performance evidence a subset of them, and any that were 0 would be disconnected. This would also allow a recovery process or cancelling of disconnection in a much cleaner way than having to signal the thread waiting for a counter to reach 0.

Any thoughts?


EDIT: Fixed the deletion flag logic. Originally stated it would prevent use of the flagged socket. Meant to say it would prevent any new threads using the socket but allow those with a reference to continue.
0

#7 User is offline   jackpoz 

  • ArcEmu Lemon Priest
  • PipPipPipPipPipPipPipPip
  • Group: Developers
  • Posts: 2,153
  • Joined: 19-June 08
  • Gender:Male
  • Location:Italy
  • Server OS:Windows

Posted 09 August 2011 - 06:17 AM

Right now LogonCommHandler::UpdateSockets() only checks the ping so we should first ask if it's really necessary to have 1 thread checking pings of LogonCommClientSockets. If it's not strictly necessary and those checks can be moved to SocketMgr threads, then the issue can be restricted to a even-more limited amount of Socket-derived types. There's already a deletion queue ( sSocketGarbageCollector.QueueSocket(this) ) and the disconnection/queue for deletion processes already use atomics since https://sourceforge..../changeset/3632 . When a Socket is disconnected, it's queued right away for deletion (unless it was already flagged as disconnected/queued). The only part missing to implement your idea is the ref count.
Posted Image We develop dreams. Your dreams ;)
Posted ImagePosted Image
0

#8 User is offline   ZymoticB 

  • Newbie
  • Group: Members
  • Posts: 9
  • Joined: 25-July 11
  • Gender:Male
  • Server OS:Darwin/OS X

Posted 09 August 2011 - 08:45 AM

Right the sockets are already using atomic booleans which protect the operations on the socket (I checked the ReadCallback method and it checks the atomics before doing any work). The problem is that once a SocketWorkerThread::run() has created a local pointer via ptr = mgr->fds[events[i].data.fd]; if the Socket is then deleted bad things will happen.

What I am suggesting is to create a second queue (this could be done differently just an example) so that we can wait on a sort of 'countdown' event. We are waiting for all thread to finish using this socket. We set an atomic so that no new threads will create local versions of this socket and all threads using the socket will decrement the count until it reachs 0 at which point the 'disconnect queue' (again could be implemented differently) would disconnect the socket, flag it as deleted an send it to the deletion queue.
0

#9 User is offline   jackpoz 

  • ArcEmu Lemon Priest
  • PipPipPipPipPipPipPipPip
  • Group: Developers
  • Posts: 2,153
  • Joined: 19-June 08
  • Gender:Male
  • Location:Italy
  • Server OS:Windows

Posted 09 August 2011 - 01:47 PM

View PostZymoticB, on 09 August 2011 - 08:45 AM, said:

The problem is that once a SocketWorkerThread::run() has created a local pointer via ptr = mgr->fds[events[i].data.fd]; if the Socket is then deleted bad things will happen.

That's actually what I wrote in my first post.

View PostZymoticB, on 09 August 2011 - 08:45 AM, said:

...and send it to the deletion queue.

but in your way 2 queues aren't needed, the disconnection one would be enough. But why is another queue needed? Can't the deletion queue check your atomic?
Posted Image We develop dreams. Your dreams ;)
Posted ImagePosted Image
0

#10 User is offline   ZymoticB 

  • Newbie
  • Group: Members
  • Posts: 9
  • Joined: 25-July 11
  • Gender:Male
  • Server OS:Darwin/OS X

Posted 09 August 2011 - 02:44 PM

Sorry to repeat you I was just thinking and typing.

Another queue probably isn't needed. I guess I was subconsciously avoiding changing the function of the deletion queue to a disconnection queue. There is no harm in disconnecting a socket and leaving it in a deletion queue as long as it is not deleted early. But would it make more sense to delay the disconnection? This could cause long delays though waiting for timeouts, maybe the the only functionality that needs changing is the deletion queue should update and check an atomic reference count rather than a time? This would allow immediate disconnection and should solve the problem of dangling Socket pointers.
0

#11 User is offline   jackpoz 

  • ArcEmu Lemon Priest
  • PipPipPipPipPipPipPipPip
  • Group: Developers
  • Posts: 2,153
  • Joined: 19-June 08
  • Gender:Male
  • Location:Italy
  • Server OS:Windows

Posted 09 August 2011 - 04:10 PM

Exactly, right now the deletion queue is just an hackfix to hide crashes. With the usual IncRef()/DecRef() no queue would probably be needed, as the Socket will deleted as soon as ref count will reach 0. Are you interested writing a patch and submitting it :P ?
Posted Image We develop dreams. Your dreams ;)
Posted ImagePosted Image
0

#12 User is offline   ZymoticB 

  • Newbie
  • Group: Members
  • Posts: 9
  • Joined: 25-July 11
  • Gender:Male
  • Server OS:Darwin/OS X

Posted 10 August 2011 - 08:11 AM

I was planning on writing a patch. Just wanted to flesh out the design instead of blindly coding, hey maybe school did teach me something. I am wondering the best way to approach this though. Because right now how I am thinking of it, I cannont expect if one thread detects a problem causing a disconnect that all of them will, therefore I cannot rely on ->Disconnect() DecRef'ing. Should I be increasing the reference count each time a thread takes ownership of it and decrease at the end. Seems logical but atomic operations are a bit expensive, better than locking though.

Anyways I have 2 thoughts:

1) Remove the m_deleted atomic since with this implementation if it is disconnected it is deleted and it has been removed from the SocketMgr so we shouldn't need to worry about anyone new creating local copies, correct? I would then create a m_queuedForDisconnection (or some such thing) that is a atomic since logically the refCount could reach 0 many many times but it should only be disconnected if it needs to be; hence the atomic.

2) Alternatively, I could initialize the refCount to 1 when the socket is created. This would be a single reference to help the socket persist. Additional references are added and taken away dynamically and the ->Disconnect() call will do a decRef that will cause the socket to no longer persist.

I am also wondering what to do with the Disconnect function. In both cases it will become a 'renamed' decRef, since all it is doing is decresing the reference, checking if the refCount is 0 and if it is called a private function that will actually do the disconnection (we already have a _Disconnect do we not? most likely will modify that). However the decRef that needs to be called at the end of each loop performs the same thing.


I am thinking I will create an AquireSocket and ReleaseSocket, or maybe GetSocket and ReturnSocket. These functions will perform the functionality of the inc and dec ref + check need to disconnect.

I think I prefer the second option. Let me know what you think !
0

Share this topic:


Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

1 User(s) are reading this topic
0 members, 1 guests, 0 anonymous users