Game Development Community

[BUG] SimObjectPtr + Vector + VectorResize = heap corruption? - LOGGED

by Manoel Neto · in Torque 3D Professional · 05/19/2010 (11:35 am) · 15 replies

Hunting down a new strange heap corruption when closing our game (only in release, ha!), I nailed it down to SimObjectPtr usage (again). This time, it has to do with the SimObject reference system. Here's the problem:

1) Create a Vector that stores SimObjectPtr (or structs with a SimObjectPtr in them).
2) Add some SimObjectPtr to said Vector until the Vector needs to call VectorResize() internally because it ran out of buffer space.
3) Delete any SimObject pointed by the SimObjectPtrs in the Vector.
4) Heap corrupted!

This happens because when assigning a SimObject to a SimObjectPtr, the SimObjectPtr will call registerReference() on the SimObject, and pass a pointer to it's pointer to that SimObject. When you delete that SimObject, it will go through all its references (which are pointers to pointers) and nullify them, so every SimObjectPtr pointing to the deleted object will get its internal pointer nullified (which is the entire reason behind their existence).

The problem is that when you resize a Vector, new memory will be allocated to hold the new Vector size and all the contents will be copied to the new location. If said memory contains SimObjectPtrs, their internal pointers will also be moved during the realloc() but the SimObjects they point to will keep the pointer to the old location.

I found this problem going in in the PersistenceManager, but I still need to pinpoint the exact culprit. However, once I find it the solution should be simple:

Never store SimObjectPtr<> directly inside a Vector or anywhere it can be moved from its original memory locations! Use SimObjectPtr<>* instead (using new and delete to create/destruct it) so the Vector can resize without messing with the SimObjectPtr<> contents.

#1
05/19/2010 (12:33 pm)
In the PersistenceManager, this is the offender:

struct DirtyObject
{
  SimObjectPtr<SimObject> object;
  StringTableEntry fileName;

  DirtyObject()
  {
	 object   = NULL;
	 fileName = NULL;
  }
};
There's a Vector full of those:
typedef Vector<DirtyObject> DirtyList;

The fix involves lots of small changes, but it boils down to replacing the struct with this:

struct DirtyObject
{
  SimObjectPtr<SimObject> *object;
  StringTableEntry fileName;

  DirtyObject()
  {
	 object = new SimObjectPtr<SimObject>();
	 fileName = NULL;
  }
  ~DirtyObject()
  {
	 SAFE_DELETE( object );
  }
};
And then fixing all the resulting compilation errors in PersistenceManager.cpp (which are most fixed by adding "*" in some places and replacing some "." by "->").
#2
05/30/2010 (11:22 am)
bug logged Key:TQA-168
#3
08/01/2010 (7:34 pm)
Thanks for the fix Manoel. It will be in beta 2.
#4
08/02/2010 (6:54 am)
Ahem, Manoel, referring to Meyers it should be possible to fix it with an copy operator and constructor:
struct DirtyObject
{
   SimObjectPtr<SimObject> object;  
   StringTableEntry fileName;  
   DirtyObject()  
   {  
      object   = NULL;  
      fileName = NULL;  
   }
   DirtyObject( const DirtyObject &other ) : 
      object( other.object ), fileName( other.fileName )
   {}
   DirtyObject operator=( const DirtyObject &other )
   {
      object = other.object;
      fileName = other.fileName;
      return *this;
   }
};

Your fix works, but on cost of an additional new/delete struct, which is not the intend of the SimObjectPtr, isn't it?

The original problem is, that at increasing the vector the elements are copied. Because no copy constructor/operator was written the system builds an own copy operator, copying the memory. By writing the operators the memory should be copied correctly and everything is fine.

I know it's also a solution, but code style should be easy.

Just my 2cents.

edit/ see the missing semis ;-)
#5
08/02/2010 (1:01 pm)
@Tobias: did you test that code? I'm asking because the Torque Vector<> template's resize function doesn't seem to invoke copy constructors:

bool VectorResize(U32 *aSize, U32 *aCount, void **arrayPtr, U32 newCount, U32 elemSize)
{
   if (newCount > 0)
   {
      U32 blocks = newCount / VectorBlockSize;
      if (newCount % VectorBlockSize)
         blocks++;
      S32 mem_size = blocks * VectorBlockSize * elemSize;
      *arrayPtr = *arrayPtr ? dRealloc(*arrayPtr,mem_size) :
         dMalloc(mem_size);

      *aCount = newCount;
      *aSize = blocks * VectorBlockSize;
      return true;
   }

   if (*arrayPtr) 
   {
      dFree(*arrayPtr);
      *arrayPtr = 0;
   }

   *aSize = 0;
   *aCount = 0;
   return true;
}
It just moves the memory around using dRealloc, which will screw up any types which are aware of their own memory location (SimObjectPtr and StrongRef). However, doing so would degrade Vector<> performance, so maybe the solution would be disallowing these types from being stored in Vector<> somehow.
#6
08/03/2010 (5:45 am)
Just took over the regiment by Tobias,

Hi Manoel,

haven't seen the code yet. I will have to testify your posted comment. Tobias said he hasn't tested the code, so it will be my turn then.
Does it really matter if realloc turns memory for SimObjectPtr? For self registering Objects you´re right. Propably it makes no sense to take a deeper look, if an easy workaround helps.

Give me some time to check with an example.
Thanks in advance,
Thomas
#7
08/03/2010 (5:15 pm)
Just a brief on how SimObjectPtr works:

The SimObjectPtr contains a pointer to a SimObject. You you assign a SimObject to a SimObjectPtr, it will calls registerReference() on the SimObject and passing a pointer to the SimObjectPtr's SimObject pointer. The SimObject then stores this pointer-to-pointer in list.

When the SimObject is deleted, it goes through that list and clear the content of the pointers stored there to null.

So, a SimObject stores a pointer to a member of each SimObjectPtr that references it. If that SimObjectPtr is moved to a new memory location, the SimObject will not know about it and will keep a pointer to the SimObjectPtr's memory location, which will cause memory corruption when the SimObject is deleted and tries to nullfly it (probably overwriting unrelated data that got allocated to that memory address).

Boost's shared_ptr has the same limitation: you cannot move it to new memory locations without destructing and recreating it.
#8
08/04/2010 (5:40 am)
Hi Manoel,

you´re right with the things you wrote. Honestly I cannot see, where the reference to the SimObjectPtr has been used...

/deleteTheRestExplanation

Okay, *thumbup* you´re absolutely right. The argument of registerReference is the member of the SimObjectPtr. This will end in heap corruptness if you call realloc...

Only way to deal this is like you descriped.

Sorry for our blindness... ;-)

#9
08/04/2010 (10:34 am)
A quite funny idea I had in the last minutes, is to give the SimObjectPtr an additional reference, to self check if the memory area has changed. If reference changed give the pointer object the ability to update itsself.
This could be implemented for vector use and an empty statement otherwise. I will give an example code later on.
#10
08/04/2010 (1:34 pm)
The problem is: when would the self-move check be performed? The SimObjects don't know anything about who owns the pointers which reference it (it just nulls them), and thus cannot call methods on them (and they would break the same since the pointer owner got moved anyway).

The only way I see is defining a specific implementation for Vector<SimObjectPtr<T>>::resize(), and having that one construct the SimObjectPtrs at the new location and destruct them from the old location. I haven't got the chance to test it myself, and I don't know if you can create specialized implementations for template types.
#11
08/05/2010 (5:34 am)
You are right, the SimObjects don´t know anything about their pointer referencers. So the ObjectPtr has to do the work.
Your suggestion with the Vector<SimObjectPtr<T> > specialisation will not really help you (beside this would be the smallest problem for me), because like in the original post, the SimObjectPtr can be member of a struct/class and this will not caught. I was quite busy yesterday, will try to give the source snippet today!
#12
08/05/2010 (12:42 pm)
Yeah, I forgot about SimObjectPtr members. Those would go unnoticed.

The problem with the self check is that there's no guarantee it can be executed before the SimObject is deleted...

Wait! I have an idea: what about making the SimObject check the content of the reference pointers before it cleans them? Each pointer must always contain the address of the SimObject itself, and if it doesn't it means whoever owns that pointer got moved/erased/corrupted and it shouldn't be cleaned, just removed from the reference list.

If we combine this with having the SimObjectPtr verify its own location when being de-referenced, it becomes much easier to detect this situation.

However, the self-check should be done only in Debug (since it would affect SimObjectPtr de-referencing performance) and it should trigger a fatal assertion instead of trying to fix things (so the programmer can find out where the problem is and change the code to not move SimObjectPtrs around).
#13
08/05/2010 (12:52 pm)
Manoel, fantastic idea. Because if the memory is used by another pointer it has been illegal initialized or not been used. This could give you the chance to catch up the situation.

Another thing you have to take in account is, that we establish a check in SimObjectPtr, swap memory, delete the Pointer and then try to update the memory position.

You´re right, this has to lead in an fatal error and cannot easily corrected by code!

However my plan is to give a template where you can choose which profile you use, fast, not controlled or checked. You can decide to use it in Debug or Release then. It´s just compiler flags!
#14
08/26/2010 (4:58 pm)
Sorry, coming back to this thread. Do we still work for a fix, or do we take the new system from Beta2? I´m quite short in time currently and I haven't got back to this issue yet!?
#15
08/26/2010 (9:33 pm)
I believe the new SimObjectPtr in beta2 is actually safe to store in vectors, since it was re-worked to derive from WeakRefBase, which never gave me these kinds of problems. It is a bit more expensive to use, since it takes an extra jump to get to the actual object.