ArcEmu: Benchmarks: Std::tr1::unordered_map Vs Boost::unordered_map Vs Std::map - ArcEmu

Jump to content

Toggle shoutbox Lastest Announcements

dfighter  : (07 December 2014 - 12:06 PM) Arcemu is in hibernation mode, please read http://arcemu.org/fo...showtopic=26903
dfighter  : (01 January 2013 - 05:56 PM) Arcemu wishes you all a happy new year!
Hasbro  : (12 September 2012 - 10:01 AM) Please excuse our outage from the web! Our web host had a major malfunction!
dfighter  : (01 September 2012 - 04:05 PM) Since the spam bots just don't want to stop, I've enabled admin verification when registering.
dfighter  : (23 January 2012 - 09:56 PM) Please note that from now on you will need to confirm your email on the wiki in order to edit it!
Hasbro  : (31 December 2011 - 12:50 PM) Happy New Years all!
Navid  : (26 December 2011 - 04:09 AM) Merry Christmas !!!!!! Happy holidays all :)
WAmadeus  : (24 December 2011 - 03:54 PM) Merry Christmas to all!
dfighter  : (24 December 2011 - 11:05 AM) The Arcemu team wishes y'all a Merry Christmukkah!
Hasbro  : (05 October 2011 - 12:53 PM) Looking for web designers for upcoming web related project. If you're interested in designing user interfaces contact me
dfighter  : (02 September 2011 - 03:47 PM) So who here wants vehicles in Arcemu? :P http://arcemu.org/fo...showtopic=25440
Hasbro  : (14 August 2011 - 03:25 PM) Join us on irc, grab an irc client and connect to irc.freenode.net join channel #arcemu /server irc.freenode.net:6667 /join #arcemu
jackpoz  : (03 August 2011 - 05:33 AM) to all Lua Engine (old one) users: please check http://arcemu.org/fo...showtopic=25274
Hasbro  : (20 May 2011 - 05:27 PM) Looking for people experienced with CMake configuration and setup! Contact me asap
Hasbro  : (15 May 2011 - 05:03 PM) ArcEmu is recruiting C++ programmers, contact Hasbro if interested.
paroxysm  : (03 May 2011 - 06:26 PM) Updated luabridge gossip example to describe the whole gossip creation process rather than just how to create menu. Gossip tutorial
paroxysm  : (23 April 2011 - 11:35 AM) Lua writers can refer to the Luabridge Tutorials section in the Wiki to learn how to write gossip code correctly.
Hasbro  : (20 April 2011 - 05:22 PM) Thank you for your continuous contribution of bug reports, we are working on them.
Hasbro  : (17 April 2011 - 03:20 AM) Please consider donating to support our bills. Donations can be sent using PayPal to donations@arcemu.org - Thank you for your support.
paroxysm  : (10 April 2011 - 12:43 AM) Refer to the Luabridge Tutorials section in the Wiki to learn the new syntax of luabridge.
Resize Shouts Area

Developing

Interesting in developing with us? Join us on IRC: irc.emulationnetwork.com:6667 #arcemu
Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

Benchmarks: Std::tr1::unordered_map Vs Boost::unordered_map Vs Std::map Massive improvements!

#1 User is offline   markg85 

  • Advanced Member
  • Pip
  • Group: Members
  • Posts: 45
  • Joined: 09-August 08
  • Gender:Male
  • Location:The Netherlands
  • Interests:C++, linux, open source in general and programming in general.

Posted 30 January 2010 - 12:11 PM

Hi,

Hasbro wanted me to benchmark and he actually wanted to use the std::tr1::unordered_map instead of boost to prevent a LOT of headers to get in SVN.
I did some benchmarks and the numbers are nice! Boost wins everywhere!

Before you look at the benchmarks know that the BLUE bars are x86 benchmarks! You can forget them since everyone uses x64 (which is RED and GREEN). If you don't use x64 then the following benchmarks will clearly point out some performance difference. For all benchmarks. The lower the better.

First: Inserting 1 million random integers in the lists
Posted Image

No green bar for a normal map (not tested) and for std::tr1::unordered_map since the results where the same. What i did in the green bar is predefining how many elements where going to be in the map. Not a real perfect case for arcemu since you need to know the number of elements prior to inserting them. For the rest the boost unordered_map is slightly faster in inserting then a normal std::map

Second: Looking up 10.000 random integer values (keys, not values)
Posted Image


Here you see a massive improvement over the current std::map way if boost would have been used. This is where the power of hash tables (the trick behind unordered_map) is most visible.

Third: Iterating over the entire unordered_map
Posted Image

Again, boost is slightly faster then a normal map here. Nothing major but with those thousands of maps in arcemu the total speedup will be bigger then this.

edit - What you don't see in here but what i did test was foreach vs the current for method. The above image is done using foreach, not for! The result of that is that foreach vs for (on a std::map) is 2x faster with foreach! I don't know why that is but the numbers show that. (so, in the image above do x2 for the std::map result and you have the for result.

A foreach vs for on a boost unordered_map is 0.1 second faster then then for.

And a std::tr1 unordered_map is over twice as fast with foreach then with for


To sum it up. From these benchmarks it's clear that the unordered_map implementation of visual studio is (in 2008 SP1) just flawed and slow. Boost wins everywhere. If we look at Linux from these benchmarks: http://tinodidriksen...cpp-map-speeds/ you will also see that boost wins it all (except in one case where it's ~2 percent slower with inserts.

My suggestion would be to use boost in arcemu (including foreach for the time being till visual studio 2010 is mainstream which does support auto, 2008 sp1 doesn't support that yet)

And a last note, all the above numbers are made in DEBUG mode! Meaning that in release mode the results will be even faster! Also this was not made within arcemu but made as a seperate test program! Just to test with different containers without the arcemu stuff that slows down compiling and testing.

Regards,
Mark
0

#2 User is offline   Andy_ 

  • Universal Cock Master
  • PipPip
  • Group: Developers
  • Posts: 128
  • Joined: 02-September 08
  • Gender:Male

Posted 30 January 2010 - 02:07 PM

Were these tested on release builds with the parameter checking turned off? I can take any STL class, and write something 100x faster then the standard visual studio implementation, however on full release it's hard to beat it.
0

#3 User is offline   markg85 

  • Advanced Member
  • Pip
  • Group: Members
  • Posts: 45
  • Joined: 09-August 08
  • Gender:Male
  • Location:The Netherlands
  • Interests:C++, linux, open source in general and programming in general.

Posted 30 January 2010 - 02:18 PM

View PostAndy_, on 30 January 2010 - 02:07 PM, said:

Were these tested on release builds with the parameter checking turned off? I can take any STL class, and write something 100x faster then the standard visual studio implementation, however on full release it's hard to beat it.


I made a new clean project in visual studio and used debug mode for all.
And parameter checking..? where is that?
0

#4 User is offline   Andy_ 

  • Universal Cock Master
  • PipPip
  • Group: Developers
  • Posts: 128
  • Joined: 02-September 08
  • Gender:Male

Posted 30 January 2010 - 02:41 PM

The implementations of stl in visual studio have alot of stuff for checking invalid iterators, bounds issues, corruptions and incompatible types. It has horrendus performance on debug mode.

Example (using set here, no SP1 on this machine).

iterator find(const key_type& _Keyval)
{ // find an element in mutable sequence that matches _Keyval
iterator _Where = lower_bound(_Keyval);
return (_Where == end()
|| _DEBUG_LT_PRED(this->comp,
_Keyval, _Key(_Where._Mynode()))
? end() : _Where);
}

#define _DEBUG_LT_PRED(pred, x, y) _DEBUG_LT_PRED_IMPL(pred, x, y, __FILEW__, __LINE__)
#define _DEBUG_LT_PRED_IMPL _Debug_lt_pred


template<class _Pr, class _Ty1, class _Ty2> inline
bool __CLRCALL_OR_CDECL _Debug_lt_pred(_Pr _Pred, const _Ty1& _Left, const _Ty2& _Right,
const wchar_t *_Where, unsigned int _Line)
{ // test if _Pred(_Left, _Right) and _Pred is strict weak ordering
if (!_Pred(_Left, _Right))
return (false);
else if (_Pred(_Right, _Left))
_DEBUG_ERROR2("invalid operator<", _Where, _Line);
return (true);
}
0

#5 User is offline   markg85 

  • Advanced Member
  • Pip
  • Group: Members
  • Posts: 45
  • Joined: 09-August 08
  • Gender:Male
  • Location:The Netherlands
  • Interests:C++, linux, open source in general and programming in general.

Posted 30 January 2010 - 02:45 PM

View PostAndy_, on 30 January 2010 - 02:32 PM, said:

The implementations of stl in visual studio have alot of stuff for checking invalid iterators, bounds issues, corruptions and incompatible types. It has horrendus performance on debug mode.


Oke, that. I did attempt to turn it off in debug but didn't notice any change so i left it the way the defaults are.
So to be perfectly clear. Every single benchmarked container is initialized with default values! no performance tricks are applied!

And i have made a new benchmark, this time in release mode! Now boost beats the current std::map even more with unordered_map :) And the visual studio implementation is still the slow dog. Even in release more! (-O2 compiler flag which visual studio appends by default in release) There is however a interesting twist in the release graph..
Posted Image

note the iterate benchmark. Somehow that is greatly improved in release mode for std::map but not so much for boost or tr1 resulting in std::map to win something as well :blink:

As for the code (so you can see what i did):
#include <iostream>
#include <functional>
#include <unordered_map>
#include <map>
#include <stdio.h>
#include <stdlib.h>
#include "hr_time.h"

// Boost includes
#include <boost/foreach.hpp>
#include <boost/unordered_map.hpp>
#include <boost/typeof/typeof.hpp>

// use a "sane" foreach without caps
#define foreach         BOOST_FOREACH
#define reverse_foreach BOOST_REVERSE_FOREACH

#define my_for(VAR, COL)                                                                                   \
    BOOST_FOREACH_PREAMBLE()                                                                                      \
    if (boost::foreach_detail_::auto_any_t BOOST_FOREACH_ID(_foreach_col) = BOOST_FOREACH_CONTAIN(COL)) {} else   \
    if (boost::foreach_detail_::auto_any_t BOOST_FOREACH_ID(_foreach_cur) = BOOST_FOREACH_BEGIN(COL)) {} else     \
    if (boost::foreach_detail_::auto_any_t BOOST_FOREACH_ID(_foreach_end) = BOOST_FOREACH_END(COL)) {} else       \
    for (bool BOOST_FOREACH_ID(_foreach_continue) = true;                                                         \
              BOOST_FOREACH_ID(_foreach_continue) && !BOOST_FOREACH_DONE(COL);                                    \
              BOOST_FOREACH_ID(_foreach_continue) ? BOOST_FOREACH_NEXT(COL) : (void)0)                            \
        if  (boost::foreach_detail_::set_false(BOOST_FOREACH_ID(_foreach_continue))) {} else                      \
        for (BOOST_AUTO(VAR, BOOST_FOREACH_DEREF(COL)); !BOOST_FOREACH_ID(_foreach_continue); BOOST_FOREACH_ID(_foreach_continue) = true)


int main()
{
	int testlist[10000];
	for(int i = 0; i < 10000; i++)
	{
		testlist[i] = rand() % 1000000 + 1;
	}

	//std::tr1::unordered_map<int, int> test;
	//boost::unordered_map<int, int> test;
	std::map<int, int> test;

	CStopWatch timer;
	timer.startTimer();

	for(int i = 0; i <= 1000000; i++)
	{
		//std::cout << rand() % 1000000 + 1 << std::endl;
		test.insert(std::make_pair( i, (rand() % 1000000 + 1) ));
	}

	timer.stopTimer();
	std::cout << "Insert : " << timer.getElapsedTime() << std::endl;
	
	timer.startTimer();

	for(int i = 0; i < 10000; i++)
	{
		test.find(testlist[i])->second;
	}
	
	timer.stopTimer();
	std::cout << "Lookup : " << timer.getElapsedTime() << std::endl;

	// iterating through all with boost foreach
	timer.startTimer();

	my_for(p, test)
	{
		p.second;
	}

	timer.stopTimer();
	std::cout << "Foreach loop : " << timer.getElapsedTime() << std::endl;
	
	// same, only with the "normal" iterator
	timer.startTimer();

	//std::tr1::unordered_map<int, int>::iterator iter;
	//boost::unordered_map<int, int>::iterator iter;
	std::map<int, int>::iterator iter;
	for(iter = test.begin(); iter != test.end(); ++iter)
	{
		iter->second;
	}

	timer.stopTimer();
	std::cout << "For loop : " << timer.getElapsedTime() << std::endl;
	
	system("pause");
	
	return 1;
}


Note the:

	//std::tr1::unordered_map<int, int> test;
	//boost::unordered_map<int, int> test;
	std::map<int, int> test;


and
	//std::tr1::unordered_map<int, int> test;
	//boost::unordered_map<int, int> test;
	std::map<int, int> test;


Uncomment the one you want to test in both code sections and always have ONE uncommented.

As for the timer ("stolen" from some site)
"hr_time.h"
#include <windows.h>

typedef struct {
    LARGE_INTEGER start;
    LARGE_INTEGER stop;
} stopWatch;

class CStopWatch {

private:
    stopWatch timer;
    LARGE_INTEGER frequency;
    double LIToSecs( LARGE_INTEGER & L) ;
public:
    CStopWatch() ;
    void startTimer( ) ;
    void stopTimer( ) ;
    double getElapsedTime() ;
};


"hr_time.cpp"
#include <windows.h>

#ifndef hr_timer
#include "hr_time.h"
#define hr_timer
#endif

double CStopWatch::LIToSecs( LARGE_INTEGER & L) {
    return ((double)L.QuadPart /(double)frequency.QuadPart) ;
}

CStopWatch::CStopWatch(){
    timer.start.QuadPart=0;
    timer.stop.QuadPart=0; 
    QueryPerformanceFrequency( &frequency ) ;
}

void CStopWatch::startTimer( ) {
    QueryPerformanceCounter(&timer.start) ;
}

void CStopWatch::stopTimer( ) {
    QueryPerformanceCounter(&timer.stop) ;
}

double CStopWatch::getElapsedTime() {
    LARGE_INTEGER time;
    time.QuadPart = timer.stop.QuadPart - timer.start.QuadPart;
    return LIToSecs( time) ;
}


That is sadly a windows only timer.
0

#6 User is offline   Andy_ 

  • Universal Cock Master
  • PipPip
  • Group: Developers
  • Posts: 128
  • Joined: 02-September 08
  • Gender:Male

Posted 31 January 2010 - 12:09 AM

Have you looked at any of the code to see if it's optimized out?

for(int i = 0; i < 10000; i++)
{
test.find(testlist[i])->second;
}


Optimizer would probably just cut that out, it clearly does nothing.

Also since you run it once for each turn, the random elements are completely different, which can lead to collisions.

Just some quick tests I did.

VS2008:
Map sequential insert (500000 elements) : 0.171768s
UMap sequential insert (500000 elements) : 0.136773s
Map random insert (old data, high collide) (500000) : 0.476260s
UMap random insert (old data, high collide) (500000) : 0.333125s
Map clear time (500000 elements): 0.160050s
UMap clear time (500000 elements): 0.186426s
Map random insert (500000) : 0.418371s
UMap random insert (500000) : 0.252036s
Map iterator (499959 iterations) : 0.042272s
UMap iterator (499959 iterations) : 0.043752s
Map reverse iterator (499959 iterations) : 0.049699s
UMap reverse iterator (499959 iterations) : 0.043536s

VS2010:
Map sequential insert (500000 elements) : 0.191924s
UMap sequential insert (500000 elements) : 0.190269s
Map random insert (old data, high collide) (500000) : 0.485608s
UMap random insert (old data, high collide) (500000) : 0.177745s
Map clear time (500000 elements): 0.146524s
UMap clear time (500000 elements): 0.082503s
Map random insert (500000) : 0.485650s
UMap random insert (500000) : 0.244795s
Map iterator (499975 iterations) : 0.041142s
UMap iterator (499975 iterations) : 0.015827s
Map reverse iterator (499975 iterations) : 0.042737s
UMap reverse iterator (499975 iterations) : 0.016029s

Obviously the randomness plays a part too, you can't really take much away from that, except unordered maps are a shittone quicker to iterate.
0

#7 User is offline   markg85 

  • Advanced Member
  • Pip
  • Group: Members
  • Posts: 45
  • Joined: 09-August 08
  • Gender:Male
  • Location:The Netherlands
  • Interests:C++, linux, open source in general and programming in general.

Posted 31 January 2010 - 08:39 AM

View PostAndy_, on 31 January 2010 - 12:09 AM, said:

Have you looked at any of the code to see if it's optimized out?

for(int i = 0; i < 10000; i++)
{
test.find(testlist[i])->second;
}


Optimizer would probably just cut that out, it clearly does nothing.

Also since you run it once for each turn, the random elements are completely different, which can lead to collisions.

Just some quick tests I did.

VS2008:
Map sequential insert (500000 elements) : 0.171768s
UMap sequential insert (500000 elements) : 0.136773s
Map random insert (old data, high collide) (500000) : 0.476260s
UMap random insert (old data, high collide) (500000) : 0.333125s
Map clear time (500000 elements): 0.160050s
UMap clear time (500000 elements): 0.186426s
Map random insert (500000) : 0.418371s
UMap random insert (500000) : 0.252036s
Map iterator (499959 iterations) : 0.042272s
UMap iterator (499959 iterations) : 0.043752s
Map reverse iterator (499959 iterations) : 0.049699s
UMap reverse iterator (499959 iterations) : 0.043536s

VS2010:
Map sequential insert (500000 elements) : 0.191924s
UMap sequential insert (500000 elements) : 0.190269s
Map random insert (old data, high collide) (500000) : 0.485608s
UMap random insert (old data, high collide) (500000) : 0.177745s
Map clear time (500000 elements): 0.146524s
UMap clear time (500000 elements): 0.082503s
Map random insert (500000) : 0.485650s
UMap random insert (500000) : 0.244795s
Map iterator (499975 iterations) : 0.041142s
UMap iterator (499975 iterations) : 0.015827s
Map reverse iterator (499975 iterations) : 0.042737s
UMap reverse iterator (499975 iterations) : 0.016029s

Obviously the randomness plays a part too, you can't really take much away from that, except unordered maps are a shittone quicker to iterate.


i just KNEW ayyone was going to start about the randomness playing a part.
No, it's NOT playing a part since that random code part is done for every map so the slowdown it has with that is added to every map making the test fair again.

I doubt there will be collisions since i also make a array that stores 10.000 random integers and i use that array to do key lookups in the tested map thus no collisions will occur there.

Anyway, the fact remains that unordered_map is faster then map. You seem to be testing the unordered_map from visual studio but the boost one is even faster.

So, what do you say? Replace all maps with unordered_maps?
By now i also did a unordered_set benchmark (again boost wins) and is ~2x faster with inserts, ~4x faster with lookups and ~2x faster with iteration thus faster in all cases. So, all std::set things should be replaced wuth boost::unordered_set if you ask me :blink:

Again, my unordered_map benchmarks are very representative since they use <int, int> which is a case that arcemu seems to use a lot as well. Anything that gets optimized out in my benchmark is with all others as well thus the test remains fair.
0

#8 User is offline   Andy_ 

  • Universal Cock Master
  • PipPip
  • Group: Developers
  • Posts: 128
  • Joined: 02-September 08
  • Gender:Male

Posted 31 January 2010 - 12:58 PM

No, the fact the collisions are random doesn't make it fair.

Run 1 with unordered map might collide 2000 times, run 2 with boost map may collide 20 times. The random keys change, they should always be the same (my program uses the same set of random keys for each container).

Btw can you give me a release copy of your program (with the pdb), so I can disassemble and analyze it.
0

#9 User is offline   markg85 

  • Advanced Member
  • Pip
  • Group: Members
  • Posts: 45
  • Joined: 09-August 08
  • Gender:Male
  • Location:The Netherlands
  • Interests:C++, linux, open source in general and programming in general.

Posted 02 February 2010 - 04:15 PM

View PostAndy_, on 31 January 2010 - 12:58 PM, said:

No, the fact the collisions are random doesn't make it fair.

Run 1 with unordered map might collide 2000 times, run 2 with boost map may collide 20 times. The random keys change, they should always be the same (my program uses the same set of random keys for each container).

Btw can you give me a release copy of your program (with the pdb), so I can disassemble and analyze it.


full source (4kb): http://www.filedropp...orderedmaptest2
You will have to install boost first.
0

#10 User is offline   Dzjhenghiz 

  • حشّاشين
  • Group: Contributor
  • Posts: 1,936
  • Joined: 07-June 08
  • Gender:Male
  • Location:2nd stone after the 3rd oase western sahara
  • Interests:M.C.S.E :- Minesweeper Consultant & Solitaire Expert
  • Server OS:Other

Posted 04 February 2010 - 05:22 PM

so there are posts beeing killed ??????
:blink:

TRASHED ?¿ <_<

This post has been edited by Dzjhenghiz: 04 February 2010 - 05:24 PM

Need Help With Arcemu ? ^^
Posted Image
Concordia res parvae crescunt In varietate concordia
Spoiler

0

#11 User is offline   Paul 

  • Member
  • Pip
  • Group: Members
  • Posts: 56
  • Joined: 09-July 08

Posted 05 February 2010 - 01:23 AM

probably because they had no benefit to the discussions taking place.
0

#12 User is offline   Hasbro 

  • Project Manager
  • PipPipPipPipPipPipPipPipPip
  • Group: Administrator
  • Posts: 2,526
  • Joined: 07-June 08
  • Gender:Male
  • Location:New York
  • Server OS:Windows

Posted 05 February 2010 - 01:46 PM

View PostPaul, on 05 February 2010 - 01:23 AM, said:

probably because they had no benefit to the discussions taking place.

0

#13 User is offline   ZymoticB 

  • Newbie
  • Group: Members
  • Posts: 9
  • Joined: 25-July 11
  • Gender:Male
  • Server OS:Darwin/OS X

Posted 26 July 2011 - 08:06 AM

Not that it will make a huge difference in this benchmark but be careful with that post-fix operator. It will make a copy of the variable at the end of each iteration of the for loop. You are using a prefix with the iterator but not for insert or lookup. Again this may not make a large change; most likely drop each test almost equivalently. It is however a good practise to almost never use a post-fix , i.e. i++ unless you are really sure you need a copy of that variable to be in use until the end of the instruction and therefore never in a for loop since the increment gets evaluated last so you have the value of i you need throughout the iteration of the loop.

Also you made a testlist to move the modulus out of the benchmark code ... and then never used it ?

Anyways I mostly wanted to comment on the pre/post - fix since it is a very simple optimization; just in case anyone reads this, I understand this benchmark was done in 2010 but no point making a thread about pre/post-fix increments.
0

Share this topic:


Page 1 of 1
  • You cannot start a new topic
  • You cannot reply to this topic

1 User(s) are reading this topic
0 members, 1 guests, 0 anonymous users