Toggle shoutbox Lastest Announcements
![]() |
Luaengine Crashfix
#1
Posted 03 August 2011 - 05:31 AM
What bug does this patch fix: Crashes in the old Lua Engine caused by dereferencing free'd memory.
Detailed Explanation: The old Lua Engine (Luabridge too actually but this patch will not address any issue related to that) stores references to Object and derived types (Unit/GameObject/Player/Creature) but when these Objects are despawned and deleted by ArcEmu, the Lua Engine still holds these references which now point to free'd memory. The solution me and dfighter chose was to make the Lua Engine store guid/mapId/instanceId instead of Object* so when the Object is deleted, NULL will be returned (or "nil" in the Lua slang). I copypasted ArcLuna<T> and added ArcSole<T> described as "As opposed to ArcLuna, this is meant to bring some light to this Lua Engine plagued by crashes since r1". When getting the Object* from guid/mapId/instanceId it first tries to get MapMgr* from mapId with sInstanceMgr.GetMapMgr(mapId). If NULL is returned, it tries with sInstanceMgr.GetInstanceByIds(mapId, instanceId). If NULL is returned again it means the map doesn't exist anymore and NULL will be returned (or "nil"). Once the MapMgr reference is retrieved, MapMgr::_GetObject(guid) is called and the result is casted to the template type T. __gc metamethod is set to nil instead of the old "let's delete the pushed Object" behaviour of ArcLuna because it's not a business of the Lua Engine to delete Objects once they have no more references in Lua Script; they will be deleted by a call to Despawn(), when the MapCell is unloaded or by some other function in arcemu-world. This was probably causing crashes in ArcLuna, too (and probably still causes them, let's wait for a crash report just to be more sure and to fix 1 thing at time ).
How to reproduce: In my journey thought the Lua Engine I had to learn how to be a ril pro Lua scripter (and this means I learnt how to copypaste parts of scritps found around and make them what I wanted to do). Here's a easy crashy script that will work just fine after applying this patch. http://pastebin.com/1vQL80D3 (updated after gossip changes)
Link to thread(s) in bug reports section if available: https://sourceforge....cemu/ticket/129 and probably hundreds of other crash reports since r1
Patch: http://pastebin.com/hxMgGkUK (updated after a crash when the __gc metamethod was called)(updated after Andy's tabs hurricane)
Tested:
- storing a ref to a Creature in a global variable, despawning the Creature and accessing the global variable.
- storing a ref to a GameObject in a global variable, despawning the GameObject and accessing the global variable.
- at Stormwind (continent)
- at Black Temple (instance)
- ran on valgrind, no memory issues, no memory leaks (related to this patch)
- Lua functions Unit:SendChatMessage()/Unit:SpawnCreature()/Unit:SpawnGameObject()/Unit:FullCastSpellOnTarget()/gossip functions
Untested:
- Items
- Pets
Still to implement:
- Auras (should be stored as target (so as guid/mapId/instanceId) + something like aura slot)
- Items (I think they have some kind of guid, too)
- Spells
- missed anything?
Please provide feedback, testing Lua scripts (for the OLD Lua Engine) and report what this patch breaks (unless it was already broken before or the same script causes a crash without this patch).
#2
Posted 03 August 2011 - 06:48 AM
Edit:
Everything appears to be working, I noticed how when using flight paths now the units don't start moving to waypoints etc when you load them if the location of the waypoint is not close enough to have loaded yet. This is fine, and I guess the fix you did.

SetPlayerLock(0) (or false) no longer works, but I guess this is just something that hasn't been updated in a ArcEmu commit recently. I just updated ~100 revs. I'll look into it.
I'll provide more information when and if it crashes again.
#3
Posted 03 August 2011 - 11:54 AM
salamanda, on 03 August 2011 - 06:48 AM, said:
Could you please test that function without my patch to see if it works? If it doesn't work either, please submit it as a ticket on our tracker

Nice to know your players can fly safely now using flypaths

#4
Posted 04 August 2011 - 06:47 AM
jackpoz, on 03 August 2011 - 11:54 AM, said:

Nice to know your players can fly safely now using flypaths

Yes, I tested it on a version without the patch and it still doesn't work so I have created a ticket (https://sourceforge....cemu/ticket/152).
Since this is a pretty vital function for my gameplay functionality, I can't test the patch on my live realm currently with players. :/ However, testing it on localhost and running through all my scripts, playing it as players would has not caused any crashes when originally it would have caused 1-2 crashes by the time the content was exhausted (~10 hours worth).
#5
Posted 04 August 2011 - 01:36 PM

Anyone else up to test this patch? Where are the hordes of Lua scripters

#6
Posted 05 August 2011 - 07:07 AM
jackpoz, on 04 August 2011 - 01:36 PM, said:

Anyone else up to test this patch? Where are the hordes of Lua scripters

Hurp durp.
/facepalm
I could of sworn I tried that. =/ That's what late nights do to you. Tested, working. Experienced a single crash with IsInPhase and the bad pointers, but I think this was a random crash out of the blue as I was unable to reproduce it. Updating my scripts then transferring to my dedi to test in a live environment.
The hordes of Lua scripters don't come to the ArcEmu forums, lol. They go on ac-web and spam people on why there copy and pasted scripts no werk.
edit:
Once I activated this on the live realm I started getting the same sort of issues with the flight paths.

#7
Posted 06 August 2011 - 05:11 AM
Tested : same Lua script of first post, ran on valgrind (no memory issues or leaks related to the changes), at Stormwind and Black Temple, deleted with .npc delete the Creature that had the gossip script: everything looks fine.
Untested: My poor knowledge of the Lua Engine (but most likely my laziness) didn't let me test the behaviour of ArcSole when __gc is called, even if nothing should happen has it was set to nil.
Feedback is most welcome

#8
Posted 06 August 2011 - 06:56 AM
Also, I believe your current implementation could be improved performance-wise. While the method here will work (for Object and derivatives) it will be doing these checks for existance in the world an awful lot. In this implementation Get() will be called every time an object is accessed through a C++ function (including simple Unit:DoThis() syntax, and whenever an Object is an argument to a C++ function).
For example this lovely gossip script you made earlier http://pastebin.com/BQVyKF6r , whenever the OnGossip function is executed the Get() method will be called 9 times at least!
Quote
Unit:GossipCreateMenu(100, player, 0)
Unit:GossipMenuAddItem( 0, "Test chat msg", 1, 0)
Unit:GossipMenuAddItem( 0, "Crash Part 1", 2, 0)
Unit:GossipMenuAddItem( 0, "Crash Part 2", 3, 0)
Unit:GossipSendMenu(player)
end
And, while Get() is a constant time function, I think the lua engine is slow enough.
I made a solution in luabridge (just because it was much easier to implement there rather than lua engine) to the problem without the storing of GUIDs approach. How it works is, when any userdata is pushed, a reference to the pushed object and the userdata is stored in a hash_multimap. A new instance hook is used to detect destructors of Spell/Aura and RemoveFromWorld on Objects. When that hook fires the lua engine searches for any stored userdata references for that object, and sets those values to null. Then when check() is called it just has to check that the userdata is not null instead of checking if the map/instance/guid exists

I think this method can be applied to lua engine too but I'll look at that tomorrow. I hope you enjoy this feedback

#9
Posted 06 August 2011 - 07:12 AM

edit: updated ticket with another crash - similar error
#10
Posted 06 August 2011 - 09:22 AM
hypersniper, on 06 August 2011 - 06:56 AM, said:
https://sourceforge....cemu/ticket/129
world.exe!Arcemu::Util::ARCEMU_ASSERT(bool condition) Line 42 + 0x6 bytes C++ world.exe!Object::~Object() Line 80 + 0x19 bytes C++ world.exe!Unit::~Unit() Line 676 + 0x11d bytes C++ world.exe!Creature::~Creature() Line 249 + 0x21 bytes C++ world.exe!Creature::`vector deleting destructor'() + 0x57 bytes C++ LuaEngine.dll!LuaEngine::ArcSole<Unit>::gc_T(lua_State * L) Line 889 + 0x22 bytes C++ LuaEngine.dll!luaD_precall(lua_State * L, lua_TValue * func, int nresults) Line 319 + 0x16 bytes C LuaEngine.dll!luaD_call(lua_State * L, lua_TValue * func, int nResults) Line 376 + 0x11 bytes C
Yes I'm sure as there's a callstack right here saying LuaEngine::ArcSole<Unit>::gc_T was called and tried to delete an in-world Creature (that's what triggered the assert).
hypersniper, on 06 August 2011 - 06:56 AM, said:
If you take a look at my patch you can see ArcSole is used only for Object derived types and they don't need any kind of garbage collector method.
hypersniper, on 06 August 2011 - 06:56 AM, said:
I'll start even remotely thinking about performance ONLY when crashes will be fixed. It's NOT very useful a fast scripting engine that crashes with just 1 simple gossip script.
hypersniper, on 06 August 2011 - 06:56 AM, said:
Yeah, I know (that's the whole point of this patch, never store references to Object types but always get them with GUID/mapid/instanceid).
hypersniper, on 06 August 2011 - 06:56 AM, said:
and how much time will a modern CPU take to execute those 9 calls?
hypersniper, on 06 August 2011 - 06:56 AM, said:
Define "slow enough": if it's already too slow, then it should be removed. If it's at the limit between acceptable and too slow, then it's the fault of who wrote it before me. I'm a Mechanical Engineering student so I'm used to see graphs, tables and math equations about what lead to a conclusion. Could you please post any of those about your "slow enough" conclusion?
hypersniper, on 06 August 2011 - 06:56 AM, said:
That's unrelated to this thread as the old Lua Engine is single-threaded with locks everywhere while luabridge has 1 instance for each MapMgr.
hypersniper, on 06 August 2011 - 06:56 AM, said:
But but, if I cast a Spell that applies an AoE Aura and if there are 9 enemies, that hook will be called 9 times! Even if I don't have any Lua script! Wouldn't this slow down the luabridge performance?
On a more serious note, that's another approach to solve the issue. As my knowledge of the Lua Engine is very limited and I had very little time to find a solution (I think the Lua Engine caused enough crashes already in the last years, thank to the incompetent developers who couldn't find these issues for reasons that are still unknown to me), I (with dfighter) came up with the "usual" GUID solution, used in AIInterface too. If you can post a patch in the format described at http://arcemu.org/fo...?showtopic=2355 (even in this topic is fine, as this was more like a Patch Workshop) then we may find a proper solution to this crash issues.
As for the feedback, that's always welcome, especially from who knows Lua engines better than me.
#11
Posted 06 August 2011 - 06:02 PM
Well your call stack is convincing, I was just skeptical because it was always working perfectly earlier and such a system is widely used even in vanilla Lunar ( http://lua-users.org...indingWithLunar ).
When I said "slow enough" I was saying that the lua engine is slow compared to some other scripting solutions (no I don't have any tables or equations for this) and further slowing it down on a per-instruction basis may not be the best solution. So I suggested another. Implemented on luabridge? yes. Unrelated to this? no.
While you are correct about the 9 auras thing, what I was trying to illustrate is that there is a way to do it without doing (map/instance/guid) checks on a per-lua-instruction basis.
#12
Posted 06 August 2011 - 06:13 PM
hypersniper, on 06 August 2011 - 06:02 PM, said:
You two are not fighting. This is a debate about possible solutions to a problem, at most. Which is welcome on this forum.
We are not those other forums, were you either have to kiss up to the poster or be silent, otherwise you are branded a flamer or troll, so feel free to post your input.
As long as it's not personal insults, criticism is always welcome.
#13
Posted 08 August 2011 - 07:17 AM
hypersniper, on 06 August 2011 - 06:02 PM, said:
I think you missed the last sentence of my (quite long) post:
jackpoz, on 06 August 2011 - 09:22 AM, said:
As you can see, I took your feedback so seriously that I even replied to every sentence you wrote (took me 20 minutes to do that and I even had to tell a friend to wait because I was busy). I was just professional (or at least tried) because World Of Warcraft is the game, not ArcEmu, and these Lua Engine crashes are not fun at all (some folks say ArcEmu is a hobby but the definition of hobby at http://dictionary.re...om/browse/hobby is "an activity or interest pursued for pleasure or relaxation" and I don't get any pleasure seeing my hard work on fixing crashes becoming useless because of some old mistakes of other ppl).
hypersniper, on 06 August 2011 - 06:02 PM, said:
This is the typical trade-off scenario: we can choose to slow down the Lua Engine (my solution) or slow down the whole arcemu-world (your solution). Also remember that the old Lua Engine (LHA) is single-threaded so having a container accessed by different threads (each MapMgr thread) when an Object is removed from world/~Spell/~Aura will require some locking and that will pretty much make the WHOLE arcemu-world single-threaded. In luabridge this may work but this thread is about the old Lua Engine. Let's keep the focus to one engine at once

patch updated after Andy's tabs hurricane, test Lua gossip script too after gossip changes.
#14
Posted 28 August 2011 - 11:29 AM
hypersniper, on 06 August 2011 - 06:56 AM, said:

any chance to see a patch on another thread with these changes?
#15
Posted 04 October 2011 - 02:16 PM